eval-harness

Originally from affaan-m/everything-claude-code

Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

When to Activate

Setting up eval-driven development (EDD) for AI-assisted workflows
Defining pass/fail criteria for Claude Code task completion
Measuring agent reliability with pass@k metrics
Creating regression test suites for prompt or agent changes
Benchmarking agent performance across model versions
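As a rough illustration of the pass/fail criteria and regression suites mentioned above, the following Python sketch defines eval cases as named checks and compares two runs. The names `EvalCase`, `run_evals`, and `find_regressions` are hypothetical and are not part of the skill itself; agent outputs are assumed to be plain strings keyed by case name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """A single pass/fail criterion (hypothetical structure, for illustration)."""
    name: str
    check: Callable[[str], bool]  # returns True when the output passes


def run_evals(cases: List[EvalCase], outputs: Dict[str, str]) -> Dict[str, bool]:
    """Apply each case's check to the output recorded under the case's name.

    A case with no recorded output is checked against an empty string.
    """
    return {case.name: case.check(outputs.get(case.name, "")) for case in cases}


def find_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """List cases that passed in the baseline run but do not pass in the candidate run."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


# Example: two criteria, evaluated before and after a prompt change.
cases = [
    EvalCase("mentions_tests", lambda out: "test" in out.lower()),
    EvalCase("non_empty", lambda out: bool(out.strip())),
]
baseline = run_evals(cases, {"mentions_tests": "Added unit tests", "non_empty": "done"})
candidate = run_evals(cases, {"mentions_tests": "Refactored module", "non_empty": "done"})
regressions = find_regressions(baseline, candidate)
```

Here `regressions` names the cases that a change broke, which is the signal a regression suite is meant to surface. Real checks would usually be richer than substring tests, for example running a build or a test command.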

Philosophy

Eval-Driven Development treats evals as the "unit tests of AI development":

Define expected behavior BEFORE implementation
Run evals continuously during development
Track regressions with each change
Use pass@k metrics for reliability measurement
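The pass@k metric named above can be sketched in Python. This assumes one common definition: the probability that at least one of k attempts passes, estimated from c passing attempts observed in n total attempts as 1 - C(n-c, k) / C(n, k). The skill may define pass@k differently, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts passes.

    n: total attempts observed
    c: attempts that passed
    k: number of attempts drawn without replacement (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so any draw of k must include a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 3 passes out of 10 recorded attempts.
single_try = pass_at_k(10, 3, 1)   # chance that one attempt passes
five_tries = pass_at_k(10, 3, 5)   # chance that at least one of five passes
```

Comparing the k = 1 value with a larger k gives a sense of how much of the agent's success depends on retries rather than on first-attempt reliability.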

Installs: 1.3K
GitHub Stars: 232.5K
First Seen: May 19, 2026
Security Audits: Gen Agent Trust Hub, Pass
