Agent Engineering

Concept of the skill

Agent engineering treats LLM calls as unreliable, tool-using components inside a larger workflow.

Coverage

How the discipline relates to, and differs from, prompt engineering, harness engineering, and traditional distributed systems
The four pillars: architecture and lifecycle management, task decomposition and context management, multi-agent coordination patterns, production reliability
The lifecycle state machine: claim → execute → verify → commit → release, with the extended research/plan/review variant for complex workflows
Context health states (ok / degraded / compact / exhausted) and their budget thresholds, plus the six observable signals of context rot
Multi-agent coordination patterns: orchestrator/worker, fan-out/merge, evaluator/optimiser, consensus/fusion, sequential chain, hybrid — and the cost/reliability trade-offs of each
The two-pass pattern (audit then fresh-context implement) for reliability-critical workflows
The eight named coordination failure modes (task stealing, context contamination, merge conflicts, silent stall, brief rot, result injection, context bloat, double-commit) with detection and mitigation
The six production reliability requirements: observability, cost budgets, idempotency, failure recovery, safety caps, claim locks — and what breaks when each is missing
The delegation decision framework: six gates, with overhead crossover analysis (≈1000-token minimum subagent overhead) and a batch crossover at four tasks for cheap-model fan-out
The most common anti-patterns (God Agent, prompt-as-architecture, memory-persisted state, runaway loop, telephone-game briefs, ghost claim) and corrective actions
The production readiness audit checklist and the staged-rollout verification workflow (10% → 50% → 100% budget)
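
As an illustration of the lifecycle state machine listed above (claim → execute → verify → commit → release), here is a minimal Python sketch. Only the five phase names and their order come from the text. The class name `TaskLifecycle`, the `advance` method, and the rule that a failed phase jumps straight to release are assumptions made for the example, not a description of any particular implementation.

```python
from enum import Enum


class Phase(Enum):
    CLAIM = "claim"
    EXECUTE = "execute"
    VERIFY = "verify"
    COMMIT = "commit"
    RELEASE = "release"


# Happy-path order, taken from the sequence named in the text.
NEXT_PHASE = {
    Phase.CLAIM: Phase.EXECUTE,
    Phase.EXECUTE: Phase.VERIFY,
    Phase.VERIFY: Phase.COMMIT,
    Phase.COMMIT: Phase.RELEASE,
}


class TaskLifecycle:
    """Hypothetical tracker for one task moving through the lifecycle."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.phase = Phase.CLAIM
        self.history = [Phase.CLAIM]
        self.committed = False

    def advance(self, ok: bool = True) -> Phase:
        """Report the outcome of the current phase and move to the next one.

        Assumption: a failed phase skips directly to RELEASE, so the claim
        is always given back and nothing unverified is committed.
        """
        if self.phase is Phase.RELEASE:
            raise RuntimeError(f"task {self.task_id} is already released")
        if ok and self.phase is Phase.COMMIT:
            self.committed = True
        self.phase = NEXT_PHASE[self.phase] if ok else Phase.RELEASE
        self.history.append(self.phase)
        return self.phase


if __name__ == "__main__":
    task = TaskLifecycle("demo-task")
    while task.phase is not Phase.RELEASE:
        task.advance()
    print(" -> ".join(p.value for p in task.history))
```

Keeping the transitions in an explicit table rather than in prompt text means an illegal jump, such as committing before verification, cannot be expressed at all. Under the assumed failure rule, every path ends in release.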
