LLM Development & Engineering — Complete Reference

Build, evaluate, and deploy LLM systems to modern production standards.

This skill covers the full LLM lifecycle:

Development: Strategy selection, dataset design, instruction tuning, PEFT/LoRA fine-tuning
Evaluation: Automated testing, LLM-as-judge, metrics, rollout gates
Deployment: Serving handoff, latency/cost budgeting, reliability patterns (see ai-llm-inference)
Operations: Quality monitoring, change management, incident response (see ai-mlops)
Safety: Threat modeling, data governance, layered mitigations (NIST AI RMF: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf)
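
To make the "rollout gates" item under Evaluation concrete, here is a minimal sketch of a gate that compares a candidate model against a baseline on the same eval set. The `MetricResult` schema, the metric names, and the tolerance values are all illustrative assumptions, not part of any particular eval framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricResult:
    """One metric measured on the same eval set for both models (hypothetical schema)."""
    name: str
    baseline: float
    candidate: float
    higher_is_better: bool = True
    tolerance: float = 0.0  # allowed regression, in the metric's own units


def evaluate_gate(results):
    """Return (passed, regressions); any regression beyond tolerance blocks rollout."""
    regressions = []
    for r in results:
        # Normalize so a positive change always means "candidate is better".
        change = r.candidate - r.baseline
        if not r.higher_is_better:
            change = -change
        if change < -r.tolerance:
            regressions.append(r.name)
    return (len(regressions) == 0, regressions)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show one passing and one failing metric.
    results = [
        MetricResult("task_accuracy", baseline=0.50, candidate=0.75),
        MetricResult("p95_latency_s", baseline=1.0, candidate=1.5,
                     higher_is_better=False, tolerance=0.25),
    ]
    passed, regressions = evaluate_gate(results)
    print(passed, regressions)  # False ['p95_latency_s']
```

In this sketch the candidate improves accuracy but is blocked because latency regresses past its tolerance. A real gate would likely also account for eval-set size and noise before treating a difference as a regression.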

Modern Best Practices (2026):

Treat the model as a component with contracts, budgets, and rollback plans (not "magic").
Separate core concepts (tokenization, context, training vs adaptation) from implementation choices (providers, SDKs).
Gate upgrades with repeatable evals and staged rollout; avoid blind model swaps.
Cost-aware engineering: Measure cost per successful outcome, not just cost per token; design tiering/caching early.
Security-by-design: Threat-model prompt injection, data leakage, and tool abuse; treat guardrails as production code.
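
The cost-aware item above can be illustrated with a small calculation. This is a sketch under assumed numbers: the spend totals and success counts are hypothetical, and "success" stands for whatever outcome check the system defines (for example, a passing eval or a resolved ticket).

```python
import math


def cost_per_success(total_cost_usd: float, successes: int) -> float:
    """Total spend divided by the number of successful outcomes."""
    if successes <= 0:
        # No successes means the spend bought nothing; report infinite cost.
        return math.inf
    return total_cost_usd / successes


if __name__ == "__main__":
    # Hypothetical: 1000 requests sent to each of two model tiers.
    # Cheaper tier: $2.00 total, 600 successful outcomes.
    # Pricier tier: $3.00 total, 950 successful outcomes.
    cheap_tier = cost_per_success(2.00, 600)
    pricey_tier = cost_per_success(3.00, 950)
    print(f"cheap tier:  ${cheap_tier:.5f} per success")   # about $0.00333
    print(f"pricey tier: ${pricey_tier:.5f} per success")  # about $0.00316
```

With these assumed numbers the tier that costs more per request is cheaper per successful outcome, which is why per-token cost alone can mislead tiering decisions.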
