

club-3090 LLM Serving

Skill by ara.so — Daily 2026 Skills collection.

Community recipes for serving modern LLMs on RTX 3090 (24 GB) hardware. Supports the vLLM, llama.cpp, and SGLang engines, with validated Docker Compose configs that expose an OpenAI-compatible API on localhost:8020. Currently ships Qwen3.6-27B configs for single-card (1×) and dual-card (2×) setups.
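The validated Compose files ship in the repo itself; as a rough sketch of what a single-card vLLM service fronting the API on port 8020 could look like (image tag, model path, and flag values below are illustrative assumptions, not the shipped config):

```yaml
# Hypothetical sketch only -- the validated configs live in the skill repo.
services:
  vllm:
    image: vllm/vllm-openai:latest   # official vLLM OpenAI-compatible server image
    ports:
      - "8020:8000"                  # host port 8020 -> vLLM's default port 8000
    environment:
      - NVIDIA_VISIBLE_DEVICES=0     # pin to a single RTX 3090
    volumes:
      - ./models:/models             # assumed local weights directory
    command: >
      --model /models/qwen3.6-27b
      --max-model-len 32768
      --gpu-memory-utilization 0.92
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Once the container is up, any OpenAI-compatible client pointed at http://localhost:8020/v1 should work; a quick smoke test is listing models via GET /v1/models.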


Engine Decision Matrix

| Need | Engine | Why |
| --- | --- | --- |
| Max throughput (code/chat) | vLLM dual | 89–127 TPS, MTP n=3, vision, tools |
| Full 262K context, no crashes | llama.cpp single | No prefill cliffs, stable tool use |
| 4 concurrent streams @ 262K | vLLM dual turbo | Stream isolation, full feature stack |
| Single card, moderate ctx | vLLM default | ~89 TPS, easiest setup |

SGLang is currently blocked on Qwen3.6-27B — see models/qwen3.6-27b/sglang/README.md.
