Model Serving

Purpose

Deploy LLM and ML models for production inference with optimized serving engines, streaming response patterns, and orchestration frameworks. Focuses on self-hosted model serving, GPU optimization, and integration with frontend applications.

When to Use

  • Deploying LLMs for production (self-hosted Llama, Mistral, Qwen)
  • Building AI APIs with streaming responses
  • Serving traditional ML models (scikit-learn, XGBoost, PyTorch)
  • Implementing RAG pipelines with vector databases
  • Optimizing inference throughput and latency
  • Integrating LLM serving with frontend chat interfaces
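The streaming pattern mentioned above usually rides on Server-Sent Events: the server emits `data: {...}` chunks in the OpenAI chunk format, and the client concatenates the deltas. A minimal self-contained sketch (the event shape mirrors what OpenAI-compatible engines such as vLLM emit, but the helper names here are illustrative, not from any library):

```python
import json

def stream_tokens(tokens):
    """Server side: yield tokens as Server-Sent Events in the
    OpenAI-style chunk format used by most compatible engines."""
    for tok in tokens:
        payload = json.dumps({"choices": [{"delta": {"content": tok}}]})
        yield f"data: {payload}\n\n"
    # Terminator event that tells the client the stream is complete.
    yield "data: [DONE]\n\n"

def collect_stream(events):
    """Client side: parse SSE lines back into the full completion text."""
    parts = []
    for event in events:
        data = event.removeprefix("data: ").strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

events = list(stream_tokens(["Hello", ", ", "world"]))
print(collect_stream(events))  # -> Hello, world
```

In a real deployment the generator would wrap the engine's token iterator and be returned as a streaming HTTP response; the frontend chat interface renders each delta as it arrives instead of waiting for the full completion.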

Model Serving Selection

LLM Serving Engines
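As one common option, vLLM exposes an OpenAI-compatible HTTP API out of the box, so existing OpenAI client code can point at a self-hosted model by changing the base URL. A sketch of launching it (model name, port, and context length here are example values, not recommendations):

```shell
# Serve a self-hosted model behind an OpenAI-compatible endpoint.
# Requires a GPU host with vLLM installed (pip install vllm).
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --max-model-len 8192

# Clients then target http://<host>:8000/v1 with any OpenAI SDK.
```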
