Agent Evaluation

Overview

LLM-as-judge evaluation framework that scores AI-generated content on 5 dimensions using a 1-5 rubric. Agents evaluate outputs, compute a weighted composite score, and emit a structured verdict with evidence citations.

Core principle: systematic quality verification before claiming completion. Agent-studio currently has no built-in way to verify agent output quality; this skill fills that gap.
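
A minimal sketch of the scoring step described above, in Python. The five dimension names, their weights, and the Verdict fields are illustrative assumptions; the skill defines a 1-5 rubric per dimension and a weighted composite, but does not publish the exact dimensions here.

```python
from dataclasses import dataclass

# Assumed dimension names and weights -- illustrative only,
# not the skill's published rubric.
WEIGHTS = {
    "accuracy": 0.30,
    "completeness": 0.25,
    "clarity": 0.20,
    "relevance": 0.15,
    "safety": 0.10,
}

@dataclass
class Verdict:
    scores: dict[str, int]      # dimension -> 1-5 rubric score
    composite: float            # weighted average, same 1-5 scale
    evidence: dict[str, str]    # dimension -> judge's supporting citation

def composite_score(scores: dict[str, int]) -> float:
    """Weighted composite of per-dimension rubric scores."""
    for dim, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{dim} score {score} is outside the 1-5 rubric")
    return sum(WEIGHTS[dim] * score for dim, score in scores.items())

# Example: scores a judge might assign to one agent output.
scores = {"accuracy": 4, "completeness": 5, "clarity": 4, "relevance": 3, "safety": 5}
evidence = {dim: f"(judge citation for {dim})" for dim in scores}
verdict = Verdict(scores, composite_score(scores), evidence)
print(f"composite: {verdict.composite:.2f}")  # composite: 4.20
```

Weights sum to 1.0 so the composite stays on the same 1-5 scale as the per-dimension scores, which makes a single pass/fail threshold straightforward to apply.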

When to Use

Always:

  • Before marking a task complete (pair with verification-before-completion)
  • After a plan is generated (evaluate plan quality)
  • After code review outputs (evaluate review quality)
  • During reflection cycles (evaluate agent responses)
  • When comparing multiple agent outputs

Don't Use:
