agent-evals

Maps AI integration points, scores which loop mechanics are missing, and scaffolds the smallest useful eval loop. A score of 6/6 means the loop is ready to improve; autonomy starts only when a controller repeats the loop.
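
As a rough illustration of what a "smallest useful eval loop" might look like, the sketch below scores a stand-in agent against a tiny golden set. It is not taken from the skill itself: `GoldenCase`, `run_eval`, `toy_agent`, and the exact-match scoring rule are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One hand-checked input and the output expected for it (hypothetical shape)."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], golden_set: list[GoldenCase]) -> float:
    """Return the fraction of golden cases the agent answers with an exact match."""
    if not golden_set:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in golden_set if agent(case.prompt).strip() == case.expected
    )
    return passed / len(golden_set)


def toy_agent(prompt: str) -> str:
    """Stand-in for a real model call; it simply upper-cases its input."""
    return prompt.upper()


# Assumed golden set; a real one would come from reviewed traces or fixtures.
GOLDEN_SET = [
    GoldenCase("ok", "OK"),
    GoldenCase("hello", "HELLO"),
    GoldenCase("Bye", "bye"),  # deliberately failing case
]

score = run_eval(toy_agent, GOLDEN_SET)
print(f"pass rate: {score:.2f}")
```

In this sketch a single call to `run_eval` is one pass of the loop. A controller, in the sense used above, would be whatever calls it repeatedly and acts on the score; that part is left out here.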

When to Use

  • The user asks for evals, prompt optimization, trace replay, production loops, or benchmarks, or invokes /agent-evals.
  • Workspace has hardcoded prompts, raw rules files, unmonitored agent loops, no golden set, or trace data that does not feed improvement.
  • The task falls under the agent-experience discipline: this skill is its instrument-the-loop arm, and agent-experience routes here to build eval and optimization loops.

The AI Optimization Staircase

Installs: 8
First Seen: Jun 4, 2026
agent-evals — thulr/informed-skills