agent-evaluation

Agent Evaluation

Overview

Core principle: Agents are non-deterministic. Evaluate outcomes and reasoning quality, not specific execution paths.
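One way to read the principle above is sketched below in Python: the same task is run several times and only the final outcome is scored, so two runs that take different steps can both pass. This is a minimal sketch under stated assumptions, not the skill's own harness. `RunResult`, `toy_agent`, `answer_is_correct`, and `pass_rate` are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    steps: List[str]   # the execution path taken (varies between runs)
    answer: str        # the final outcome (what is actually scored)


def toy_agent(task: str) -> RunResult:
    # Hypothetical stand-in for a real agent: the path is random,
    # but the final answer is the same either way.
    path = random.choice([["search", "compute"], ["compute"]])
    return RunResult(steps=path, answer="42")


def answer_is_correct(result: RunResult) -> bool:
    # Outcome check: inspects only the answer, never result.steps.
    return result.answer == "42"


def pass_rate(agent: Callable[[str], RunResult],
              task: str,
              check: Callable[[RunResult], bool],
              runs: int = 10) -> float:
    """Run the agent `runs` times and return the fraction of passing outcomes."""
    passed = sum(1 for _ in range(runs) if check(agent(task)))
    return passed / runs
```

Asserting on `result.steps` instead would make the check fail whenever the agent reaches a correct answer by a different route, which is the failure mode the principle warns against.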

Research shows that three factors explain 95% of performance variance: token usage (80%), number of tool calls (10%), and model choice (5%).

When to Use

  • After creating a new skill
  • Before deploying an agent to production
  • When agent behavior is inconsistent
  • For /qa-review of AI-assisted work
  • When comparing approaches or models
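For the comparison case in the list above, one possible sketch is to log token usage, tool calls, and the outcome for every run, then summarize per configuration. The record fields and the `summarize` helper are assumptions made for illustration; they are not part of the skill.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunRecord:
    config: str      # hypothetical label for a model or approach
    tokens: int      # total tokens used in the run
    tool_calls: int  # number of tool calls made in the run
    passed: bool     # whether the final outcome was acceptable


def summarize(records: List[RunRecord]) -> Dict[str, Dict[str, float]]:
    """Group runs by configuration and report pass rate and mean resource use."""
    by_config: Dict[str, List[RunRecord]] = defaultdict(list)
    for record in records:
        by_config[record.config].append(record)

    summary = {}
    for config, runs in by_config.items():
        summary[config] = {
            "runs": len(runs),
            "pass_rate": sum(r.passed for r in runs) / len(runs),
            "mean_tokens": mean([r.tokens for r in runs]),
            "mean_tool_calls": mean([r.tool_calls for r in runs]),
        }
    return summary
```

Reporting resource use next to pass rate makes it easier to see whether one configuration wins because it is better or only because it spends more tokens.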

Quick Reference: 5-Dimension Rubric

More from guia-matthieu/clawfu-skills

First Seen
Feb 13, 2026