Stage 3: Evaluating guidance for a use case (Needs evals)

This is the third of three stages in creating guidance:

Stage 1: Identifying use cases for a feature
Stage 2: Authoring guidance for a use case
Stage 3: Evaluating guidance for a use case (you are here)

What the eval agent sees vs real-world agents

Real-world coding agents see only guide.md, which the RAG skills system retrieves automatically when a developer asks for help. Every other file in a use case directory is eval infrastructure.
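
The visibility rule above can be sketched as a simple partition over a use case directory listing. This is only an illustration, not the harness's own code: the helper `partitionUseCaseFiles`, the `Visibility` type, and the example listing are hypothetical, and the only file names taken from this document are guide.md, tasks/task.md, and grader.ts.

```typescript
// Hypothetical helper (not part of any real harness API): splits a use case
// directory listing by audience. Only guide.md reaches real-world agents;
// every other file is treated as eval infrastructure.
const REAL_WORLD_VISIBLE = "guide.md";

interface Visibility {
  realWorldAgents: string[];
  evalInfrastructure: string[];
}

function partitionUseCaseFiles(relativePaths: string[]): Visibility {
  const result: Visibility = { realWorldAgents: [], evalInfrastructure: [] };
  for (const relativePath of relativePaths) {
    if (relativePath === REAL_WORLD_VISIBLE) {
      result.realWorldAgents.push(relativePath);
    } else {
      result.evalInfrastructure.push(relativePath);
    }
  }
  return result;
}

// Example listing built only from the file names mentioned in this section.
const visibility = partitionUseCaseFiles(["guide.md", "tasks/task.md", "grader.ts"]);
console.log(visibility);
```

A useful consequence of this split is that anything a real-world agent needs in order to succeed has to live in guide.md itself, since nothing else in the directory will reach it.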

The eval harness runs a separate coding agent in a controlled environment to test whether the guidance works. This eval agent receives the first prompt from tasks/task.md and has access to guide.md via the same RAG system. The harness then runs grader.ts against the eval agent's output.
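
The order of operations described above can be sketched as follows. Every name here (`EvalAgent`, `Grader`, `runEval`, and the toy agent and grader) is a hypothetical stand-in chosen for illustration; the real harness interface and the actual shape of grader.ts are not described in this document.

```typescript
// All types and functions below are hypothetical stand-ins, not the harness API.

// The eval agent receives the first prompt and a way to retrieve the guide
// (a stand-in for access to guide.md through the RAG system).
type EvalAgent = (firstPrompt: string, retrieveGuide: () => string) => string;

// The grader is run against the eval agent's output.
type Grader = (agentOutput: string) => boolean;

interface EvalResult {
  agentOutput: string;
  passed: boolean;
}

function runEval(
  firstPrompt: string,
  guide: string,
  agent: EvalAgent,
  grader: Grader
): EvalResult {
  // Step 1: the eval agent works from the first prompt, with the guide retrievable on demand.
  const agentOutput = agent(firstPrompt, () => guide);
  // Step 2: the grader is applied to what the agent produced.
  const passed = grader(agentOutput);
  return { agentOutput, passed };
}

// Toy stand-ins so the sketch runs end to end.
const toyAgent: EvalAgent = (prompt, retrieveGuide) => `${prompt}\n---\n${retrieveGuide()}`;
const toyGrader: Grader = (output) => output.includes("use the documented API");

const evalResult = runEval(
  "Add the feature.",
  "Guidance: use the documented API.",
  toyAgent,
  toyGrader
);
console.log(evalResult.passed);
```

In this sketch the grader checks only the agent's output, which mirrors the description that the harness runs grader.ts against what the eval agent produced.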

None of the following are ever seen by real-world coding agents:
