Agentic Harness Design

Patterns for building multi-agent systems that produce high-quality outputs on long, complex tasks—covering generator/evaluator loops, context management, and task decomposition.

Core Architecture: Planner → Generator → Evaluator

A three-agent split addresses the two main failure modes of solo agents:

Context degradation — models lose coherence as the context window fills; some exhibit "context anxiety" and wrap up prematurely.
Self-evaluation bias — agents reliably over-praise their own output; separating producer from judge is the key lever.

Planner takes a short user prompt (1–4 sentences) and expands it into a full product spec. Keep the spec at the level of deliverables and high-level architecture, not granular implementation details: an error in an over-specified plan cascades into everything built downstream. Ask the planner to identify opportunities to weave AI-native features into the spec.
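
The planner step can be sketched roughly as below. This is a minimal illustration, not a prescribed implementation: the `ProductSpec` fields, the `PLANNER_INSTRUCTIONS` wording, and the `call_model` callable (any function that sends a prompt to a model and returns its text reply) are all assumptions introduced for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical instruction text; the wording is an assumption for illustration.
PLANNER_INSTRUCTIONS = (
    "Expand the user's request into a product spec. "
    "Describe deliverables and high-level architecture only; "
    "do not prescribe implementation details. "
    "Suggest AI-native features where they fit. "
    'Reply with JSON using the keys "summary", "deliverables", '
    '"architecture", and "ai_features".'
)


@dataclass
class ProductSpec:
    """Spec kept deliberately coarse: what to build, not how to build it."""
    summary: str
    deliverables: list[str] = field(default_factory=list)
    architecture: list[str] = field(default_factory=list)
    ai_features: list[str] = field(default_factory=list)


def plan(user_prompt: str, call_model: Callable[[str], str]) -> ProductSpec:
    """Ask the planner model for a spec and parse its JSON reply."""
    reply = call_model(f"{PLANNER_INSTRUCTIONS}\n\nUser request: {user_prompt}")
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    return ProductSpec(
        summary=data["summary"],
        deliverables=list(data.get("deliverables", [])),
        architecture=list(data.get("architecture", [])),
        ai_features=list(data.get("ai_features", [])),
    )
```

Keeping the spec as a small structured object, rather than free text, makes it easy to hand the same deliverables list to both the generator and the evaluator.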

Generator implements against the spec. Works in sprints (one feature at a time) when the model needs scaffolding. Stronger models can run as a single continuous session with SDK-level compaction handling context growth. Self-evaluates at the end of each sprint before handoff.
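
A sprint-based generator might look something like the following sketch. The `implement` and `self_evaluate` callables are hypothetical stand-ins for model calls, and `SprintResult` is an illustrative record type; none of these names come from a specific SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SprintResult:
    feature: str
    output: str
    self_check: str  # the generator's own notes, produced before handoff


def run_sprints(
    features: list[str],
    implement: Callable[[str, list[str]], str],
    self_evaluate: Callable[[str, str], str],
) -> list[SprintResult]:
    """Build one feature per sprint, self-checking each before moving on."""
    results: list[SprintResult] = []
    for feature in features:
        # Pass only the names of finished features, not their full output,
        # so each sprint starts from a small context (an assumed design choice).
        done = [r.feature for r in results]
        output = implement(feature, done)
        note = self_evaluate(feature, output)
        results.append(SprintResult(feature, output, note))
    return results
```

A stronger model running as one continuous session would skip this loop and rely on compaction instead; the sprint structure is scaffolding for models that need it.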

Evaluator grades the generator's output against agreed criteria. Uses a live browser tool (e.g. Playwright MCP) to interact with the running app rather than scoring static screenshots. Produces specific, actionable findings.
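
The generator/evaluator loop could be wired together along these lines. This is a sketch under stated assumptions: `generate` and `probe` are hypothetical callables (in a real harness `probe` would drive the running app through a browser tool), and the `max_rounds` cap is an illustrative safeguard rather than something the pattern specifies.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Finding:
    criterion: str
    passed: bool
    detail: str  # specific and actionable, e.g. what was clicked and what broke


def refine(
    generate: Callable[[list[Finding]], Any],
    probe: Callable[[Any, str], Finding],
    criteria: list[str],
    max_rounds: int = 3,
) -> tuple[Any, list[Finding], int]:
    """Alternate generation and independent grading until all criteria pass."""
    feedback: list[Finding] = []
    artifact: Any = None
    findings: list[Finding] = []
    for round_number in range(1, max_rounds + 1):
        artifact = generate(feedback)
        # The evaluator grades every agreed criterion on every round,
        # so a fix that breaks an earlier feature is still caught.
        findings = [probe(artifact, criterion) for criterion in criteria]
        feedback = [f for f in findings if not f.passed]
        if not feedback:
            return artifact, findings, round_number
    return artifact, findings, max_rounds
```

Only the failed findings are fed back to the generator, which keeps the handoff short and focused on what needs to change.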

