ai-llm-inference
LLMOps - Inference & Optimization - Production Skill Hub
Modern Best Practices (January 2026):
- Treat inference as a systems problem: SLOs, tail latency, retries, overload, and cache strategy.
- Use continuous batching / smart scheduling when serving many concurrent requests (Orca scheduling: https://www.usenix.org/conference/osdi22/presentation/yu).
- Use KV-cache aware serving (PagedAttention/vLLM: https://arxiv.org/abs/2309.06180) and efficient attention kernels (FlashAttention: https://arxiv.org/abs/2205.14135).
- Use speculative decoding when latency is critical and draft-model quality is acceptable (speculative decoding: https://arxiv.org/abs/2302.01318).
- Quantize only with measured quality impact and rollback plan (quantization must be validated on your eval set).
This skill provides production-ready operational patterns for optimizing LLM inference performance, cost, and reliability. It centralizes decision rules, optimization strategies, configuration templates, and operational checklists for inference workloads.
No theory. No narrative. Only what Codex can execute.
When to Use This Skill
Codex should activate this skill whenever the user asks for:
More from vasilyu1983/ai-agents-public
product-management
Founder-PM toolkit for discovery, roadmaps, prioritization, and PMF measurement. Use when planning product strategy, metrics, or roadmaps.
684software-architecture-design
Designs system structure across monolith/microservices/serverless. Use when structuring systems, scaling, decomposing monoliths, or choosing patterns.
519software-ui-ux-design
Designs and audits UI/UX with WCAG 2.2 accessibility. Use when designing flows, running heuristic reviews, or defining design systems.
383qa-testing-playwright
E2E web testing with Playwright. Use when writing tests, debugging flakes, or setting up CI with selectors, sharding, and network mocking.
371document-pdf
Extract text/tables from PDFs, create formatted PDFs, merge/split/rotate, and handle forms. Use for any PDF generation or parsing task.
320qa-testing-strategy
Risk-based test strategy for software delivery. Use when defining coverage, setting CI gates, managing flaky tests, or establishing release criteria.
316