agent-evals
Maps AI integration points, scores missing loop mechanics, and scaffolds the smallest useful eval loop. A score of 6/6 means the loop is ready to improve; autonomy starts only when a controller repeats the loop.
When to Use
- User wants evals, prompt optimization, trace replay, production loops, benchmarks, or /agent-evals.
- Workspace has hardcoded prompts, raw rules files, unmonitored agent loops, no golden set, or trace data that does not feed improvement.
- Part of the agent-experience discipline (the instrument-the-loop arm); agent-experience routes here to build eval/optimization loops.
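
For a workspace with no golden set, a very small eval loop might look like the sketch below. It is an illustration under assumptions, not the scaffold this skill generates: `GoldenCase`, `run_eval`, `toy_agent`, and the exact-match scoring rule are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One hand-checked input with its expected output (hypothetical shape)."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], golden_set: list[GoldenCase]) -> float:
    """Return the fraction of golden cases the agent answers with an exact match."""
    if not golden_set:
        raise ValueError("golden set is empty; nothing to score")
    passed = sum(
        1 for case in golden_set if agent(case.prompt).strip() == case.expected
    )
    return passed / len(golden_set)


def toy_agent(prompt: str) -> str:
    """Stand-in for a real model call; assumed for illustration only."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


# Hypothetical golden set: three cases, one of which the toy agent cannot answer.
GOLDEN_SET = [
    GoldenCase("2 + 2", "4"),
    GoldenCase("capital of France", "Paris"),
    GoldenCase("largest planet", "Jupiter"),
]

if __name__ == "__main__":
    print(f"pass rate: {run_eval(toy_agent, GOLDEN_SET):.2f}")
```

Exact-match scoring is the simplest possible grader and is chosen here only to keep the sketch short; a real loop would likely swap in a task-specific scorer and rerun the eval after each prompt change.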