# Agent Evaluation (AI Agent Evals)

Based on Anthropic's "Demystifying evals for AI agents".
## When to use this skill
- Designing evaluation systems for AI agents
- Building benchmarks for coding, conversational, or research agents
- Creating graders (code-based, model-based, human)
- Implementing production monitoring for AI systems
- Setting up CI/CD pipelines with automated evals
- Debugging agent performance issues
- Measuring agent improvement over time
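As a concrete illustration of the "creating graders" use case above, here is a minimal sketch of two code-based graders and a tiny eval runner. The function names and schema are assumptions for illustration, not an API defined by this skill.

```python
# Code-based graders check agent output deterministically, in contrast to
# model-based graders (an LLM judges the output) or human graders.

def exact_match_grader(output: str, expected: str) -> bool:
    """Pass iff the agent's output matches the expected answer exactly
    (after trimming surrounding whitespace)."""
    return output.strip() == expected.strip()

def contains_grader(output: str, required: list[str]) -> bool:
    """Pass iff every required substring appears in the output."""
    return all(s in output for s in required)

def run_eval(cases: list[tuple[str, str]], grader) -> float:
    """Grade a batch of (output, expected) pairs and return the pass rate."""
    results = [grader(out, exp) for out, exp in cases]
    return sum(results) / len(results)

# Example: the second case fails on letter case, so the pass rate is 0.5.
cases = [("Paris", "Paris"), ("paris", "Paris")]
print(run_eval(cases, exact_match_grader))  # 0.5
```

Code-based graders are cheap and reproducible, which makes them the natural first choice for CI/CD pipelines; model-based graders are typically reserved for open-ended outputs that resist string matching.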
## Core Concepts

Eval Evolution: Single-turn → Multi-turn → Agentic

- Single-turn: one prompt, one response, graded in isolation.
- Multi-turn: a full conversation, graded against the intended outcome.
- Agentic: a task in an environment, graded on the end state and the trajectory (e.g. tool calls), not just the final text.
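The three stages of eval evolution can be sketched as data shapes. These schemas are assumptions for illustration only; the skill does not prescribe a format.

```python
# Single-turn: one prompt, one expected response, one grade.
single_turn = {"prompt": "What is 2+2?", "expected": "4"}

def grade_single_turn(response: str, case: dict) -> bool:
    """Grade a single response against a single-turn case."""
    return response.strip() == case["expected"]

# Multi-turn: a whole conversation transcript, graded against the
# outcome the conversation was supposed to reach.
multi_turn = {
    "turns": [
        {"role": "user", "content": "Book a table for two."},
        {"role": "assistant", "content": "What time would you like?"},
        {"role": "user", "content": "7pm."},
    ],
    "expected_outcome": "reservation confirmed for 7pm",
}

# Agentic: a task plus an environment; the grader inspects the end
# state and/or the trajectory (tool calls), not just generated text.
agentic = {
    "task": "Fix the failing test in the repo",
    "grader": "run the test suite; pass iff all tests are green",
}
```

Each step up the ladder shifts grading away from string comparison and toward verifying outcomes, which is why agentic evals usually need an executable environment rather than a static answer key.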