eval-harness
Installation
SKILL.md
Eval Harness Skill
A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
Philosophy
Eval-Driven Development treats evals as the "unit tests of AI development":
- Define expected behavior BEFORE implementation
- Run evals continuously during development
- Track regressions with each change
- Use pass@k metrics for reliability measurement