# agent:eval

Agent Evaluation System
Guides the user through building a comprehensive evaluation system for their AI agent. Applies patterns 10-17 from "Patterns for Building AI Agents" (Bhagwat & Gienow, 2025): failure mode taxonomy, business metrics, cross-referencing, iterating against evals, test suites, SME labeling, production datasets, and live evaluation.
## When to use
Use this skill when the user needs to:
- Define what "good" looks like for an AI agent
- Create a failure mode taxonomy
- Set up business metrics for agent performance
- Build an evaluation test suite
- Design SME labeling workflows
- Plan production data evaluation pipelines
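The patterns above can be sketched as a tiny eval harness: a failure mode taxonomy as an enum, labeled test cases, and a pass rate rolled up per failure mode. All names here (`FailureMode`, `EvalCase`, `run_agent`, `EVAL_CASES`) are illustrative assumptions, not part of the book's patterns or this skill's output.

```python
# Minimal sketch of an agent eval harness. The agent is stubbed out;
# the shape (taxonomy -> labeled cases -> per-mode pass rates) is the point.
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    HALLUCINATION = "hallucination"  # fabricated facts
    WRONG_TOOL = "wrong_tool"        # called an inappropriate tool
    INCOMPLETE = "incomplete"        # stopped before finishing the task
    FORMAT_ERROR = "format_error"    # output violates the expected schema


@dataclass
class EvalCase:
    prompt: str
    must_contain: str          # crude pass criterion, good enough for a sketch
    failure_mode: FailureMode  # which failure mode this case probes


def run_agent(prompt: str) -> str:
    # Stand-in for the real agent under test; it just echoes the prompt.
    return f"echo: {prompt}"


EVAL_CASES = [
    EvalCase("What is 2 + 2?", "4", FailureMode.HALLUCINATION),
    EvalCase("Summarize: the sky is blue.", "sky", FailureMode.INCOMPLETE),
]


def run_suite(cases):
    results = {}
    for case in cases:
        output = run_agent(case.prompt)
        passed = case.must_contain in output
        results.setdefault(case.failure_mode, []).append(passed)
    # Pass rate per failure mode; business metrics roll up from here.
    return {mode.value: sum(v) / len(v) for mode, v in results.items()}


print(run_suite(EVAL_CASES))
```

With the echo stub, the arithmetic case fails (the output never contains "4") while the summary case passes, so the report reads `{'hallucination': 0.0, 'incomplete': 1.0}`. In a real suite, SME-labeled production transcripts would replace the hand-written cases.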
## Instructions

### Step 1: Understand the Agent
Use the AskUserQuestion tool to gather context: