model-serving

Model Serving

Purpose

Deploy LLM and ML models for production inference with optimized serving engines, streaming response patterns, and orchestration frameworks. Focuses on self-hosted model serving, GPU optimization, and integration with frontend applications.
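As an illustration of the streaming response pattern mentioned above, the following minimal sketch wraps a token stream in Server-Sent Events (SSE) frames, a format that browser frontends can consume incrementally. It is not taken from the skill itself: `sse_events`, `fake_model`, and the `[DONE]` end-of-stream sentinel are hypothetical names chosen for this example, and `fake_model` merely stands in for a real serving engine.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each generated token in a Server-Sent Events frame."""
    for token in tokens:
        # json.dumps escapes newlines, so each payload fits on one "data:" line.
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Assumed end-of-stream sentinel; a common convention, not part of the SSE format.
    yield "data: [DONE]\n\n"


def fake_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a serving engine: echoes the prompt word by word."""
    for word in prompt.split():
        yield word + " "


frames = list(sse_events(fake_model("hello streaming world")))
for frame in frames:
    print(frame, end="")
```

In a real deployment the generator would typically be returned from a web framework's streaming response with the `text/event-stream` content type, so the client can render tokens as they arrive instead of waiting for the full completion.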

When to Use

Deploying LLMs for production (self-hosted Llama, Mistral, Qwen)
Building AI APIs with streaming responses
Serving traditional ML models (scikit-learn, XGBoost, PyTorch)
Implementing RAG pipelines with vector databases
Optimizing inference throughput and latency
Integrating LLM serving with frontend chat interfaces
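
For the throughput and latency use case in the list above, it helps to agree on how the two numbers are measured before tuning anything. The sketch below is one possible way to summarize a batch of request timings; `RequestSample` and `summarize` are hypothetical names for this example, the percentile uses the nearest-rank method, and the sample figures are made up for illustration rather than measured.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestSample:
    latency_s: float    # wall-clock time for the whole request, in seconds
    output_tokens: int  # tokens generated for that request


def summarize(samples: List[RequestSample], wall_time_s: float) -> Dict[str, float]:
    """Return p50/p95 latency and aggregate tokens per second for a test run."""
    latencies = sorted(s.latency_s for s in samples)

    def percentile(p: float) -> float:
        # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(latencies)))
        return latencies[rank - 1]

    total_tokens = sum(s.output_tokens for s in samples)
    return {
        "p50_s": percentile(50),
        "p95_s": percentile(95),
        # Throughput is measured over the whole run, so concurrent requests count together.
        "tokens_per_s": total_tokens / wall_time_s,
    }


# Illustrative (made-up) samples: five requests completed within a 2-second window.
samples = [RequestSample(latency_s=t, output_tokens=100) for t in (0.2, 0.4, 0.6, 0.8, 1.0)]
stats = summarize(samples, wall_time_s=2.0)
print(stats)
```

Reporting a tail percentile alongside the median matters because batching tends to raise aggregate tokens per second while lengthening the slowest requests.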

Model Serving Selection

LLM Serving Engines

Related skills

More from ancoleman/ai-design-components

creating-dashboards
Creates comprehensive dashboard and analytics interfaces that combine data visualization, KPI cards, real-time updates, and interactive layouts. Use this skill when building business intelligence dashboards, monitoring systems, executive reports, or any interface that requires multiple coordinated data displays with filters, metrics, and visualizations working together.
245
implementing-drag-drop
Implements drag-and-drop and sortable interfaces with React/TypeScript including kanban boards, sortable lists, file uploads, and reorderable grids. Use when building interactive UIs requiring direct manipulation, spatial organization, or touch-friendly reordering.
164
administering-linux
Manage Linux systems covering systemd services, process management, filesystems, networking, performance tuning, and troubleshooting. Use when deploying applications, optimizing server performance, diagnosing production issues, or managing users and security on Linux servers.
127
security-hardening
Reduces attack surface across OS, container, cloud, network, and database layers using CIS Benchmarks and zero-trust principles. Use when hardening production infrastructure, meeting compliance requirements, or implementing defense-in-depth security.
109
building-ai-chat
Builds AI chat interfaces and conversational UI with streaming responses, context management, and multi-modal support. Use when creating ChatGPT-style interfaces, AI assistants, code copilots, or conversational agents. Handles streaming text, token limits, regeneration, feedback loops, tool usage visualization, and AI-specific error patterns. Provides battle-tested components from leading AI products with accessibility and performance built in.
74
designing-distributed-systems
Use when designing distributed systems for scalability, reliability, and consistency. Covers CAP/PACELC theorems, consistency models (strong, eventual, causal), replication patterns (leader-follower, multi-leader, leaderless), partitioning strategies (hash, range, geographic), transaction patterns (saga, event sourcing, CQRS), resilience patterns (circuit breaker, bulkhead), service discovery, and caching strategies for building fault-tolerant distributed architectures.
52

Installs

Repository

ancoleman/ai-de…mponents

GitHub Stars

361

First Seen

Jan 25, 2026

Security Audits

Gen Agent Trust Hub: Fail

Socket: Pass

Snyk: Warn
