agent-evaluation

Agent Evaluation (AI Agent Evals)

Based on Anthropic's "Demystifying evals for AI agents"

When to use this skill

  • Designing evaluation systems for AI agents
  • Building benchmarks for coding, conversational, or research agents
  • Creating graders (code-based, model-based, human)
  • Implementing production monitoring for AI systems
  • Setting up CI/CD pipelines with automated evals
  • Debugging agent performance issues
  • Measuring agent improvement over time
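As a rough illustration of the code-based grader and automated-eval use cases listed above, the sketch below runs a stand-in agent over a small task set and computes a pass rate that a CI job could compare against a threshold. Everything named here (`EvalTask`, `exact_match_grader`, `run_eval`, `stub_agent`, and the sample tasks) is hypothetical and not taken from the skill or from Anthropic's article; a real agent call would replace the stub.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalTask:
    """One eval case: an input prompt and the reference answer (hypothetical schema)."""
    task_id: str
    prompt: str
    expected: str


def exact_match_grader(output: str, expected: str) -> bool:
    """Code-based grader: deterministic match after trimming whitespace and lowercasing."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(agent: Callable[[str], str], tasks: list[EvalTask]) -> float:
    """Run the agent on every task and return the fraction of tasks that pass."""
    if not tasks:
        raise ValueError("eval needs at least one task")
    passed = sum(exact_match_grader(agent(t.prompt), t.expected) for t in tasks)
    return passed / len(tasks)


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent; answers from a fixed lookup table."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


TASKS = [
    EvalTask("math-1", "2 + 2", "4"),
    EvalTask("geo-1", "capital of France", "paris"),
    EvalTask("geo-2", "capital of Peru", "Lima"),
]

pass_rate = run_eval(stub_agent, TASKS)
print(f"pass rate: {pass_rate:.2f}")
```

In a CI pipeline, the same `pass_rate` could be checked against a chosen threshold and the job failed when it drops. Model-based and human graders would slot in where `exact_match_grader` is called.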

Core Concepts

Eval Evolution: Single-turn → Multi-turn → Agentic
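
One way to read this progression is as a change in what gets graded: a single reply, then the outcome of a conversation, then the state an agent leaves behind after acting. The sketch below encodes that reading as three hypothetical case types with one grader each. The class names, fields, and grading rules are illustrative assumptions, not definitions from the source.

```python
from dataclasses import dataclass


@dataclass
class SingleTurnCase:
    """One prompt, one reply; the reply itself is graded."""
    prompt: str
    expected: str


@dataclass
class MultiTurnCase:
    """A scripted conversation; here only the final reply is graded."""
    user_turns: list[str]
    expected_final: str


@dataclass
class AgenticCase:
    """A goal plus an environment; the end state is graded, not the transcript."""
    goal: str
    initial_state: dict[str, str]
    expected_state: dict[str, str]


def grade_single_turn(case: SingleTurnCase, reply: str) -> bool:
    return reply == case.expected


def grade_multi_turn(case: MultiTurnCase, replies: list[str]) -> bool:
    # Require one reply per user turn, then check the last reply.
    return (
        bool(replies)
        and len(replies) == len(case.user_turns)
        and replies[-1] == case.expected_final
    )


def grade_agentic(case: AgenticCase, final_state: dict[str, str]) -> bool:
    # Outcome-based check: compare the environment after the run to the target state.
    return final_state == case.expected_state


case = AgenticCase(
    goal="rename config.txt to settings.txt",
    initial_state={"config.txt": "debug=true"},
    expected_state={"settings.txt": "debug=true"},
)
print(grade_agentic(case, {"settings.txt": "debug=true"}))
```

Grading the end state rather than the exact sequence of steps is one design option for agentic cases, since an agent may reach the same outcome by different routes.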

First seen: Mar 6, 2026