ai-llm
LLM Development & Engineering — Complete Reference
Build, evaluate, and deploy LLM systems with modern production standards.
This skill covers the full LLM lifecycle:
- Development: Strategy selection, dataset design, instruction tuning, PEFT/LoRA fine-tuning
- Evaluation: Automated testing, LLM-as-judge, metrics, rollout gates
- Deployment: Serving handoff, latency/cost budgeting, reliability patterns (see ai-llm-inference)
- Operations: Quality monitoring, change management, incident response (see ai-mlops)
- Safety: Threat modeling, data governance, layered mitigations (NIST AI RMF: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf)
Modern Best Practices (2026):
- Treat the model as a component with contracts, budgets, and rollback plans (not "magic").
- Separate core concepts (tokenization, context, training vs adaptation) from implementation choices (providers, SDKs).
- Gate upgrades with repeatable evals and staged rollout; avoid blind model swaps.
- Cost-aware engineering: Measure cost per successful outcome, not just cost per token; design tiering/caching early.
- Security-by-design: Threat model prompt injection, data leakage, and tool abuse; treat guardrails as production code.
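The points above about gating upgrades with repeatable evals and measuring cost per successful outcome can be sketched as a small check. This is a minimal illustration, not part of the skill itself: the `EvalRun` structure, the function names, and the threshold defaults (`max_quality_drop`, `max_cost_increase`) are all hypothetical and would need to be tuned per product.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Aggregate results of one eval suite run (hypothetical structure)."""
    successes: int          # tasks that met the acceptance criteria
    total: int              # tasks attempted
    total_cost_usd: float   # spend across all attempts, including failures


def success_rate(run: EvalRun) -> float:
    return run.successes / run.total if run.total else 0.0


def cost_per_attempt(run: EvalRun) -> float:
    return run.total_cost_usd / run.total if run.total else 0.0


def cost_per_success(run: EvalRun) -> float:
    # Failed attempts still cost money, so divide total spend by successes only.
    return run.total_cost_usd / run.successes if run.successes else float("inf")


def passes_gate(baseline: EvalRun, candidate: EvalRun,
                max_quality_drop: float = 0.01,
                max_cost_increase: float = 0.10) -> bool:
    """Return True if the candidate may proceed to staged rollout.

    Thresholds are illustrative assumptions, not recommended values.
    """
    quality_ok = success_rate(candidate) >= success_rate(baseline) - max_quality_drop
    cost_ok = cost_per_success(candidate) <= cost_per_success(baseline) * (1 + max_cost_increase)
    return quality_ok and cost_ok


# Made-up numbers: the candidate is cheaper per attempt but fails more often.
baseline = EvalRun(successes=90, total=100, total_cost_usd=4.50)
candidate = EvalRun(successes=60, total=100, total_cost_usd=3.60)

if __name__ == "__main__":
    for name, run in (("baseline", baseline), ("candidate", candidate)):
        print(name, round(cost_per_attempt(run), 4), round(cost_per_success(run), 4))
    print("candidate passes gate:", passes_gate(baseline, candidate))
```

With these made-up numbers the candidate looks cheaper per attempt yet costs more per successful outcome, so the gate rejects it. A check like this would typically run on the same eval set for both models before any staged rollout begins.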