project-evals


Stage 3: Evaluating guidance for a use case (Needs evals)

This is the third of three stages in creating guidance:

  1. Stage 1: Identifying use cases for a feature
  2. Stage 2: Authoring guidance for a use case
  3. Stage 3: Evaluating guidance for a use case (you are here)

What the eval agent sees vs real-world agents

Real-world coding agents see only guide.md — retrieved automatically via the RAG skills system when a developer asks for help. Every other file in a use case directory is eval infrastructure.

The eval harness runs a separate coding agent in a controlled environment to test whether the guidance works. This eval agent receives the first prompt from tasks/task.md and has access to guide.md via the same RAG system. The harness then runs grader.ts against the eval agent's output.
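The grading step described above can be pictured with a small sketch. The actual contract between the harness and grader.ts is not specified here, so every name below (`EvalOutput`, `GradeResult`, `Check`, `grade`, and the example checks) is a hypothetical stand-in rather than the project's real API. The sketch assumes the grader receives the files the eval agent produced and reports which checks failed.

```typescript
// Hypothetical shape of what a harness might hand to a grader.
// The real grader.ts interface is an assumption, not taken from the project.
interface EvalOutput {
  files: Record<string, string>; // path -> contents written by the eval agent
}

interface GradeResult {
  passed: boolean;
  failures: string[]; // names of the checks that did not pass
}

// Each check names one expectation and tests the agent's output for it.
interface Check {
  name: string;
  test: (output: EvalOutput) => boolean;
}

// Run every check and collect the names of the ones that fail.
function grade(output: EvalOutput, checks: Check[]): GradeResult {
  const failures = checks
    .filter((check) => !check.test(output))
    .map((check) => check.name);
  return { passed: failures.length === 0, failures };
}

// Illustrative checks only; real checks would mirror what guide.md recommends.
const exampleChecks: Check[] = [
  { name: "writes index.html", test: (o) => "index.html" in o.files },
  {
    name: "uses a <dialog> element",
    test: (o) => /<dialog[\s>]/.test(o.files["index.html"] ?? ""),
  },
];

const exampleResult = grade(
  { files: { "index.html": "<dialog open>Hi</dialog>" } },
  exampleChecks,
);
```

Returning the names of failed checks, rather than a bare boolean, makes it easier to tell which part of the guidance the eval agent did not follow.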

None of the following are ever seen by real-world coding agents:

First Seen
May 28, 2026
project-evals — googlechrome/modern-web-guidance-src