LLM Serving Patterns

When to Use This Skill

Use this skill when:

  • Designing LLM inference infrastructure
  • Choosing between serving frameworks (vLLM, TGI, TensorRT-LLM)
  • Implementing quantization for production deployment
  • Optimizing batching and throughput
  • Building streaming response systems (see the streaming sketch after this list)
  • Scaling LLM deployments cost-effectively
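
For the streaming item above, here is a minimal sketch. It assumes an OpenAI-compatible endpoint such as the one `vllm serve` exposes, running locally on port 8000; the URL and model id below are placeholders to adjust for your deployment. With `stream: true`, the server returns Server-Sent Events that can be consumed line by line:

```python
# Minimal SSE streaming client for an OpenAI-compatible endpoint
# (vLLM's `vllm serve` exposes one). URL and model id are placeholders.
import json
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed local server

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",   # placeholder model id
    "messages": [{"role": "user", "content": "Explain KV caching in one line."}],
    "stream": True,                                # request an SSE token stream
}

with requests.post(URL, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue                               # skip keep-alives and blanks
        data = line[len("data: "):]
        if data == "[DONE]":                       # end-of-stream sentinel
            break
        choices = json.loads(data).get("choices", [])
        if choices:
            # Each chunk carries an incremental delta; print tokens as they arrive.
            print(choices[0]["delta"].get("content", ""), end="", flush=True)
print()
```

SSE is the common default for one-way token streaming; a WebSocket is the usual alternative when the client also needs to send data mid-generation.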

Keywords: LLM serving, inference, vLLM, TGI, TensorRT-LLM, quantization, INT8, INT4, FP16, batching, continuous batching, streaming, SSE, WebSocket, KV cache, PagedAttention, speculative decoding

LLM Serving Architecture Overview

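A unifying idea behind modern serving stacks such as vLLM and TGI is continuous batching: new requests join the running batch at every decode step, and finished sequences release their slot immediately instead of waiting for the whole batch to drain. The toy scheduler below sketches only that scheduling idea; `Request`, `MAX_BATCH`, and `decode_step` are illustrative stand-ins, not any framework's API:

```python
# Toy illustration of continuous batching: admit work every step,
# retire finished sequences immediately. All names are illustrative.
from collections import deque
from dataclasses import dataclass, field
import random

MAX_BATCH = 4  # illustrative slot budget; real servers bound this by KV-cache memory

@dataclass
class Request:
    rid: int
    remaining: int                       # tokens still to generate
    output: list = field(default_factory=list)

def decode_step(batch):
    """Stand-in for one batched forward pass: one new token per sequence."""
    for req in batch:
        req.output.append(f"tok{len(req.output)}")
        req.remaining -= 1

def serve(incoming: deque):
    batch = []
    step = 0
    while incoming or batch:
        # Continuous batching: fill free slots before every decode step,
        # instead of only between static batches.
        while incoming and len(batch) < MAX_BATCH:
            batch.append(incoming.popleft())
        decode_step(batch)
        # Retire finished sequences immediately so their slots free up.
        done = [r for r in batch if r.remaining == 0]
        batch = [r for r in batch if r.remaining > 0]
        for r in done:
            print(f"step {step}: request {r.rid} done after {len(r.output)} tokens")
        step += 1

serve(deque(Request(i, random.randint(2, 6)) for i in range(8)))
```

In real servers the slot budget is set by KV-cache memory, which PagedAttention manages in fixed-size blocks to reduce fragmentation.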