club-3090-llm-serving
club-3090 LLM Serving
Skill by ara.so — Daily 2026 Skills collection.
Community recipes for serving modern LLMs on RTX 3090 (24 GB) hardware. Supports vLLM, llama.cpp, and SGLang engines with validated Docker Compose configs exposing an OpenAI-compatible API on localhost:8020. Currently ships Qwen3.6-27B configs for 1× and 2× cards.
Engine Decision Matrix
| Need | Engine | Why |
|---|---|---|
| Max throughput (code/chat) | vLLM dual | 89–127 TPS, MTP n=3, vision, tools |
| Full 262K context, no crashes | llama.cpp single | No prefill cliffs, stable tool-use |
| 4 concurrent streams @ 262K | vLLM dual turbo | Stream isolation, full feature stack |
| Single card, moderate ctx | vLLM default | ~89 TPS, easiest setup |
SGLang is currently blocked on Qwen3.6-27B — see models/qwen3.6-27b/sglang/README.md.
More from aradotso/trending-skills
openclaw-control-center
Local-first, security-first control center for OpenClaw agents — visibility dashboard with readonly defaults, token attribution, collaboration tracing, and safe write operations.
3.9Kinkos-multi-agent-novel-writing
Multi-agent CLI system for autonomous novel writing, auditing, and revision with human review gates
1.8Keverything-claude-code-harness
Agent harness performance system for Claude Code and other AI coding agents — skills, instincts, memory, hooks, commands, and security scanning
1.6Kagency-agents-ai-specialists
A collection of specialized AI agent personalities for Claude Code, Cursor, Aider, Windsurf, and other AI coding tools — covering engineering, design, marketing, sales, and more.
1.6Kunderstand-anything-knowledge-graph
Turn any codebase into an interactive knowledge graph using Claude Code skills — explore, search, and ask questions about any project visually.
1.5Kui-ux-pro-max-skill
AI design intelligence skill for building professional UI/UX across multiple platforms with 161 reasoning rules, 67 styles, and automated design system generation
1.5K