llm-serving-patterns

LLM Serving Patterns

When to Use This Skill

Use this skill when:

Designing LLM inference infrastructure
Choosing between serving frameworks (vLLM, TGI, TensorRT-LLM)
Implementing quantization for production deployment
Optimizing batching and throughput
Building streaming response systems
Scaling LLM deployments cost-effectively

Keywords: LLM serving, inference, vLLM, TGI, TensorRT-LLM, quantization, INT8, INT4, FP16, batching, continuous batching, streaming, SSE, WebSocket, KV cache, PagedAttention, speculative decoding
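As a rough illustration of why the quantization formats named above (FP16, INT8, INT4) matter for deployment sizing, the sketch below estimates the memory taken by model weights alone. The 7-billion-parameter model size and the helper name `weight_memory_gb` are assumptions made for illustration and do not come from the skill itself. The estimate ignores the KV cache, activations, and the per-group scale metadata that real quantization schemes add.

```python
# Bytes needed to store one weight at each precision (weights only).
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model, used only for illustration.
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is often what decides whether a model fits on a single accelerator. Actual serving memory will be higher once the KV cache and runtime overhead are included.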

LLM Serving Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐

Repository: melodic-software/claude-code-plugins

First Seen: Jan 24, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
