Eval Harness

Design Philosophy

Evaluation-Driven Development

Evaluation-driven development (EDD) is a methodology in which evaluations are defined before or alongside implementation, so that success criteria are explicit, measurable, and testable from the start.

Core Principles:

  1. Define Success First: Before implementing a feature, define what "working correctly" means through explicit evaluations
  2. Measure Continuously: Run evals throughout development, not just at the end
  3. Automate Where Possible: Prefer automated graders for speed and consistency
  4. Human Review for Nuance: Use human graders when quality judgments require context or subjectivity
  5. Track Regressions: Every capability added should become a regression test
  6. Iterate on Failures: Failed evals provide specific, actionable feedback for improvement
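
As a rough illustration of the principles above, here is a minimal sketch in Python. It is not the harness's actual API: `EvalCase`, `run_evals`, `find_regressions`, and the two toy "drafts" are hypothetical names invented for this example. It assumes a system under test can be modeled as a function from a prompt string to an output string, and that each eval carries an automated pass/fail grader.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One explicit success criterion (hypothetical structure)."""
    name: str
    prompt: str
    grader: Callable[[str], bool]  # automated grader: output -> pass/fail


def run_evals(system: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against the system and record pass/fail by name."""
    return {case.name: case.grader(system(case.prompt)) for case in cases}


def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Names of evals that passed in the baseline run but do not pass now."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


# Define success first: the evals exist before the implementation is finished.
CASES = [
    EvalCase("adds_integers", "2+3", lambda out: out.strip() == "5"),
    EvalCase("rejects_empty_input", "", lambda out: "error" in out.lower()),
]


def first_draft(prompt: str) -> str:
    """Toy implementation: handles addition, ignores empty input."""
    if not prompt:
        return ""
    a, b = prompt.split("+")
    return str(int(a) + int(b))


def second_draft(prompt: str) -> str:
    """Toy implementation after iterating on the failed eval."""
    if not prompt.strip():
        return "Error: empty input"
    a, b = prompt.split("+")
    return str(int(a) + int(b))


if __name__ == "__main__":
    baseline = run_evals(first_draft, CASES)   # measure during development
    current = run_evals(second_draft, CASES)   # measure again after the fix
    print("baseline:", baseline)
    print("current:", current)
    print("regressions:", find_regressions(baseline, current))
```

The failed `rejects_empty_input` eval in the first run points at exactly what to fix, and comparing the two result sets shows whether the fix broke anything that previously passed. Cases that need subjective judgment would be routed to a human grader rather than a lambda; that path is omitted here.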

Benefits of EDD:
