agent-evaluation
Agent Evaluation
Overview
Core principle: Agents are non-deterministic. Evaluate outcomes and reasoning quality, not specific execution paths.
Research shows three factors explain 95% of performance variance: token usage (80%), tool calls (10%), and model choice (5%).
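As a rough illustration of the core principle, the sketch below scores an agent on its final outcome across repeated runs and ignores the path it took. All names here (`pass_rate`, `toy_agent`, the `check` predicate) are hypothetical and not part of this skill; the toy agent is a stand-in for a real agent call.

```python
import random
from typing import Callable


def pass_rate(
    agent: Callable[[str], str],
    task: str,
    check: Callable[[str], bool],
    trials: int = 5,
) -> float:
    """Run the agent several times and score only the final outcome."""
    passed = sum(1 for _ in range(trials) if check(agent(task)))
    return passed / trials


# Hypothetical stand-in agent: the wording varies between runs,
# but the answer itself stays the same.
def toy_agent(task: str) -> str:
    prefix = random.choice(["The answer is", "Result:", "I computed"])
    return f"{prefix} 4"


# The check looks at the outcome only, not at the exact wording.
rate = pass_rate(toy_agent, "What is 2 + 2?", lambda out: out.strip().endswith("4"))
print(rate)
```

Because the check inspects only the result, runs that differ in phrasing or intermediate steps still count as passes. A string-equality check against one expected transcript would fail most of them.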
When to Use
- After creating a new skill
- Before deploying an agent to production
- When agent behavior is inconsistent
- For `/qa-review` of AI-assisted work
- Comparing approaches or models