agent-evaluation

Agent Evaluation

Overview

Core principle: Agents are non-deterministic. Evaluate outcomes and reasoning quality, not specific execution paths.

Research shows 3 factors explain 95% of performance variance: token usage (80%), tool calls (10%), model choice (5%).
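
The core principle above can be sketched as an outcome-based check: run the agent several times on the same task and grade only the final answer, leaving the tool-call path recorded but ungraded. This is a minimal illustration, not the skill's own harness. `RunResult`, `pass_rate`, `toy_agent`, and the `outcome_ok` predicate are hypothetical names introduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    answer: str            # final output the agent produced
    tool_calls: List[str]  # path taken; recorded for inspection, not graded


def pass_rate(agent: Callable[[str], RunResult],
              task: str,
              outcome_ok: Callable[[str], bool],
              trials: int = 10) -> float:
    """Return the fraction of trials whose final answer passes the outcome check."""
    passed = sum(outcome_ok(agent(task).answer) for _ in range(trials))
    return passed / trials


def toy_agent(task: str) -> RunResult:
    # Hypothetical stand-in for a real agent: the path varies from run to run,
    # but the final answer is the same.
    path = random.sample(["search", "calculator", "read_file"],
                         k=random.randint(1, 3))
    return RunResult(answer="42", tool_calls=path)


rate = pass_rate(toy_agent, "What is 6 * 7?",
                 lambda a: a.strip() == "42", trials=20)
print(rate)  # 1.0, because every run reaches the right answer by some path
```

Asserting on `tool_calls` instead would make the check fail whenever the agent takes a different but equally valid route, which is the failure mode the principle warns against.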

When to Use

After creating a new skill
Before deploying an agent to production
When agent behavior is inconsistent
For /qa-review of AI-assisted work
When comparing approaches or models

Quick Reference: 5-Dimension Rubric

Repository

guia-matthieu/c…u-skills

First Seen

Feb 13, 2026

agent-evaluation — guia-matthieu/clawfu-skills