project-evals
Stage 3: Evaluating guidance for a use case (Needs evals)
This is the third of three stages in creating guidance:
- Stage 1: Identifying use cases for a feature
- Stage 2: Authoring guidance for a use case
- Stage 3: Evaluating guidance for a use case (you are here)
What the eval agent sees vs. what real-world agents see
Real-world coding agents see only guide.md, which the RAG skills system retrieves automatically when a developer asks for help. Every other file in a use case directory is eval infrastructure.
The eval harness runs a separate coding agent in a controlled environment to test whether the guidance works. This eval agent receives the first prompt from tasks/task.md and has access to guide.md via the same RAG system. The harness then runs grader.ts against the eval agent's output.
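The flow in the paragraph above can be sketched roughly as follows. This is only an illustration of the data flow, not the harness's real API: `runEval`, `EvalAgent`, `Grader`, `AgentContext`, and `GradeResult` are hypothetical names, and the RAG lookup is reduced to a function that returns the guide text.

```typescript
// Hypothetical types: the real harness API is not shown in this document.
interface GradeResult {
  pass: boolean;
  notes: string[];
}

// Everything the eval agent is given: one prompt plus a way to retrieve guidance.
interface AgentContext {
  prompt: string; // the first prompt from tasks/task.md
  retrieveGuidance: (query: string) => string; // stands in for the RAG skills system
}

type EvalAgent = (ctx: AgentContext) => Promise<string>;
type Grader = (agentOutput: string) => Promise<GradeResult>; // role played by grader.ts

async function runEval(
  firstPrompt: string,
  guide: string,
  agent: EvalAgent,
  grader: Grader,
): Promise<GradeResult> {
  // The agent receives only the prompt and guide.md (via retrieval).
  // It is never handed the grader or any other file in the use case directory.
  const output = await agent({
    prompt: firstPrompt,
    retrieveGuidance: () => guide,
  });

  // Grading happens after the agent finishes, against its output only.
  return grader(output);
}
```

The point of the sketch is the asymmetry: the grader is passed to the harness-level function, not to the agent, so the eval agent cannot see how it will be scored.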
None of the following are ever seen by real-world coding agents: