agent-evaluation


Agent Evaluation

Use this skill when the task is deciding how an AI agent should be measured, not when the task is building the feature itself.

Read references/grader-selection.md when you need help picking grader types, benchmark families, or score dimensions for a specific agent surface.

Read references/ops-and-calibration.md when you need harness design, transcript review, CI gates, sampling policy, saturation checks, or production monitoring guidance.

Installs: 12
GitHub Stars: 2
First Seen: Mar 11, 2026