agent-evals

Maps AI integration points, scores which loop mechanics are missing, and scaffolds the smallest useful eval loop. A score of 6/6 means the loop is ready to improve; autonomy starts only when a controller repeats the loop.
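As a rough illustration of what a "smallest useful eval loop" could look like, here is a minimal Python sketch. It is not taken from the skill itself: the names `GoldenCase`, `run_eval`, and `pick_best`, the exact-match scoring rule, and the toy agents are all hypothetical, chosen only to show a golden set being scored and a controller repeating that scoring across candidates.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type alias: an "agent" here is any function from prompt to answer.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set entry: an input and the answer we expect (assumed shape)."""
    prompt: str
    expected: str


def run_eval(agent: Agent, golden_set: list[GoldenCase]) -> float:
    """Return the fraction of golden cases the agent answers exactly.

    Exact string match is an assumption made for brevity; a real loop
    would likely use a task-specific scorer.
    """
    if not golden_set:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in golden_set if agent(case.prompt).strip() == case.expected
    )
    return passed / len(golden_set)


def pick_best(
    candidates: dict[str, Agent], golden_set: list[GoldenCase]
) -> tuple[str, float]:
    """Toy controller: repeat the eval for every candidate and keep the best."""
    if not candidates:
        raise ValueError("no candidates to evaluate")
    best_name, best_score = "", -1.0
    for name, agent in candidates.items():
        score = run_eval(agent, golden_set)
        if score > best_score:  # first candidate wins ties
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative data only; none of this comes from the skill.
GOLDEN_SET = [
    GoldenCase("2+2", "4"),
    GoldenCase("capital of France", "Paris"),
]


def baseline(prompt: str) -> str:
    """Stand-in for a hardcoded prompt: always gives the same answer."""
    return "4"


def lookup(prompt: str) -> str:
    """Stand-in for an improved variant."""
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


if __name__ == "__main__":
    print(pick_best({"baseline": baseline, "lookup": lookup}, GOLDEN_SET))
```

In this sketch a single call to `run_eval` is the loop being scaffolded, while `pick_best` plays the role of the controller that repeats it. How the skill itself scores the six mechanics is not specified here and is not modeled.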

When to Use

The user wants evals, prompt optimization, trace replay, production loops, or benchmarks, or invokes /agent-evals.
The workspace has hardcoded prompts, raw rules files, unmonitored agent loops, no golden set, or trace data that does not feed improvement.
This skill is the instrument-the-loop arm of the agent-experience discipline; agent-experience routes here to build eval/optimization loops.

The AI Optimization Staircase

Repository

thulr/informed-skills

First Seen

Jun 4, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Fail