agent-evaluation
Agent Evaluation
Overview
LLM-as-judge evaluation framework that scores AI-generated content on 5 dimensions using a 1-5 rubric. Agents evaluate outputs, compute a weighted composite score, and emit a structured verdict with evidence citations.
Core principle: Verify quality systematically before claiming completion. Agent-studio currently has no way to verify agent output quality; this skill fills that gap.
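The weighted composite described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the dimension names, the weights, the `Verdict` structure, and the 3.5 pass threshold are all assumptions chosen for the example, since the overview does not specify them.

```python
from dataclasses import dataclass

# Hypothetical dimensions and weights (assumed for illustration only;
# the skill's real rubric may name and weight them differently).
WEIGHTS = {
    "accuracy": 0.30,
    "completeness": 0.25,
    "clarity": 0.20,
    "relevance": 0.15,
    "safety": 0.10,
}


@dataclass
class Verdict:
    composite: float          # weighted mean on the 1-5 scale
    passed: bool              # True when composite meets the threshold
    evidence: dict[str, str]  # per-dimension citation supporting the score


def evaluate(scores: dict[str, int], evidence: dict[str, str],
             threshold: float = 3.5) -> Verdict:
    """Combine per-dimension 1-5 scores into a weighted composite verdict."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside the 1-5 rubric")
    # Normalize by the weight total so the result stays on the 1-5 scale
    # even if the weights do not sum to exactly 1.
    composite = sum(WEIGHTS[n] * s for n, s in scores.items()) / sum(WEIGHTS.values())
    return Verdict(round(composite, 2), composite >= threshold, evidence)


if __name__ == "__main__":
    verdict = evaluate(
        {"accuracy": 5, "completeness": 4, "clarity": 3, "relevance": 4, "safety": 2},
        {"accuracy": "Claims match the cited source section."},
    )
    print(verdict)
```

Dividing by the weight total keeps the composite comparable across rubric revisions, and rejecting missing or out-of-range scores prevents a partial evaluation from silently passing.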
When to Use
Always:
- Before marking a task complete (pair with verification-before-completion)
- After a plan is generated (evaluate plan quality)
- After code review outputs (evaluate review quality)
- During reflection cycles (evaluate agent responses)
- When comparing multiple agent outputs
Don't Use: