eval-harness

Installation
SKILL.md

Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

When to Activate

  • Setting up eval-driven development (EDD) for AI-assisted workflows
  • Defining pass/fail criteria for Claude Code task completion
  • Measuring agent reliability with pass@k metrics
  • Creating regression test suites for prompt or agent changes
  • Benchmarking agent performance across model versions

Philosophy

Eval-Driven Development treats evals as the "unit tests of AI development":

  • Define expected behavior BEFORE implementation
  • Run evals continuously during development
  • Track regressions with each change
  • Use pass@k metrics for reliability measurement
Related skills
Installs
2
GitHub Stars
1
First Seen
Mar 4, 2026