

club-3090 LLM Serving

Skill by ara.so — Daily 2026 Skills collection.

Community recipes for serving modern LLMs on RTX 3090 (24 GB) hardware. Supports the vLLM, llama.cpp, and SGLang engines, with validated Docker Compose configs that expose an OpenAI-compatible API on localhost:8020. Currently ships Qwen3.6-27B configs for single-card (1×) and dual-card (2×) setups.
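The validated Compose files ship in the repo itself; as a rough sketch of what a single-card vLLM service fronting the API on port 8020 could look like (image tag, model path, and flag values below are illustrative assumptions, not the shipped config):

```yaml
# Hypothetical sketch only -- the validated configs live in the skill repo.
services:
  vllm:
    image: vllm/vllm-openai:latest   # official vLLM OpenAI-compatible server image
    ports:
      - "8020:8000"                  # host port 8020 -> vLLM's default port 8000
    environment:
      - NVIDIA_VISIBLE_DEVICES=0     # pin to a single RTX 3090
    volumes:
      - ./models:/models             # assumed local weights directory
    command: >
      --model /models/qwen3.6-27b
      --max-model-len 32768
      --gpu-memory-utilization 0.92
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Once the container is up, any OpenAI-compatible client pointed at http://localhost:8020/v1 should work; a quick smoke test is listing models via GET /v1/models.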


Engine Decision Matrix

| Need | Engine | Why |
| --- | --- | --- |
| Max throughput (code/chat) | vLLM dual | 89–127 TPS, MTP n=3, vision, tools |
| Full 262K context, no crashes | llama.cpp single | No prefill cliffs, stable tool use |
| 4 concurrent streams @ 262K | vLLM dual turbo | Stream isolation, full feature stack |
| Single card, moderate ctx | vLLM default | ~89 TPS, easiest setup |

SGLang is currently blocked on Qwen3.6-27B — see models/qwen3.6-27b/sglang/README.md.
