agent-evaluation

Installation
SKILL.md

Agent Evaluation (AI Agent Evals)

Based on Anthropic's "Demystifying evals for AI agents"

When to use this skill

  • Designing evaluation systems for AI agents
  • Building benchmarks for coding, conversational, or research agents
  • Creating graders (code-based, model-based, human)
  • Implementing production monitoring for AI systems
  • Setting up CI/CD pipelines with automated evals
  • Debugging agent performance issues
  • Measuring agent improvement over time

Core Concepts

Eval Evolution: Single-turn → Multi-turn → Agentic

Related skills

More from jeo-tech-ai/oh-my-gods

Installs
2
GitHub Stars
3
First Seen
Mar 11, 2026