agent-engineering
Agent Engineering
Coverage
- The discipline's relationship to and distinction from prompt engineering, harness engineering, and traditional distributed systems
- The four pillars: architecture and lifecycle management, task decomposition and context management, multi-agent coordination patterns, production reliability
- The lifecycle state machine: claim → execute → verify → commit → release, with the extended research/plan/review variant for complex workflows
- Context health states (ok / degraded / compact / exhausted) and their budget thresholds, plus the six observable signals of context rot
- Multi-agent coordination patterns: orchestrator/worker, fan-out/merge, evaluator/optimiser, consensus/fusion, sequential chain, hybrid — and the cost/reliability trade-offs of each
- The two-pass pattern (audit then fresh-context implement) for reliability-critical workflows
- The eight named coordination failure modes (task stealing, context contamination, merge conflicts, silent stall, brief rot, result injection, context bloat, double-commit) with detection and mitigation
- The six production reliability requirements: observability, cost budgets, idempotency, failure recovery, safety caps, claim locks — and what breaks when each is missing
- The delegation decision framework: six gates with overhead crossover analysis (≈1000-token minimum subagent overhead), batch crossover at four tasks for cheap-model fan-out
- The most common anti-patterns (God Agent, prompt-as-architecture, memory-persisted state, runaway loop, telephone-game briefs, ghost claim) and corrective actions
- The production readiness audit checklist and the staged-rollout verification workflow (10% → 50% → 100% budget)
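As an illustration of the lifecycle item above, the claim → execute → verify → commit → release sequence can be sketched as a small state machine that rejects out-of-order transitions. This is a minimal sketch, not the skill's own implementation: the names `Phase`, `TaskLifecycle`, and `IllegalTransition` are hypothetical, and the early-release rule (a claimed task may jump straight to release so its claim is not left dangling) is an assumption rather than something the source specifies.

```python
from enum import Enum


class Phase(Enum):
    CLAIM = "claim"
    EXECUTE = "execute"
    VERIFY = "verify"
    COMMIT = "commit"
    RELEASE = "release"


# Happy-path order, taken from the lifecycle named in the Coverage list.
ORDER = [Phase.CLAIM, Phase.EXECUTE, Phase.VERIFY, Phase.COMMIT, Phase.RELEASE]


class IllegalTransition(Exception):
    """Raised when a task tries to skip or repeat a lifecycle phase."""


class TaskLifecycle:
    """Hypothetical tracker for one task's position in the lifecycle."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.phase = None   # None means the task has not been claimed yet
        self.history = []   # phases entered so far, in order

    def allowed_next(self):
        if self.phase is None:
            return {Phase.CLAIM}
        if self.phase is Phase.RELEASE:
            return set()    # terminal: a released task cannot move again
        next_phase = ORDER[ORDER.index(self.phase) + 1]
        # Assumption: early release is always permitted after a claim,
        # so a failed task can give up its claim instead of holding it.
        return {next_phase, Phase.RELEASE}

    def advance(self, target):
        if target not in self.allowed_next():
            raise IllegalTransition(
                f"{self.task_id}: cannot move from {self.phase} to {target}"
            )
        self.phase = target
        self.history.append(target)


if __name__ == "__main__":
    task = TaskLifecycle("task-1")
    for phase in ORDER:
        task.advance(phase)
    print([p.value for p in task.history])
```

Keeping the legal transitions in one place means a worker that tries to commit before verifying fails loudly rather than silently writing unverified output. The extended research/plan/review variant could be modelled the same way by adding phases to `ORDER`.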
Philosophy
A single LLM prompt produces an answer. A system of LLMs produces a workflow that survives session boundaries, crashes, model variance, budget exhaustion, and adversarial input. Agent engineering is the discipline of building the second from the first.