eval-harness


Eval Harness

A formal evaluation framework that implements eval-driven development (EDD): evals are treated as the unit tests of AI development.

When to Activate

  • Setting up eval-driven development for AI workflows
  • Defining pass/fail criteria for task completion
  • Measuring agent reliability with pass@k metrics
  • Creating regression test suites for prompt/agent changes
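As a rough illustration of the pass/fail criteria and regression suites listed above, an eval can be written like a unit test: a prompt plus a predicate over the output. This is only a sketch, not the skill's actual format; `EvalCase`, `run_suite`, `regressions`, and the stub agents are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One eval: a prompt plus an explicit pass/fail criterion (hypothetical shape)."""
    name: str
    prompt: str
    passes: Callable[[str], bool]


def run_suite(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case once and record whether its criterion was met."""
    return {case.name: case.passes(agent(case.prompt)) for case in cases}


def regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names of cases that passed in the baseline run but fail now."""
    return sorted(
        name for name, ok in baseline.items()
        if ok and not current.get(name, False)
    )


# Stub agents standing in for two versions of a prompt or agent (assumptions).
def agent_v1(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"


def agent_v2(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I am not sure"


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("capital", "Capital of France?", lambda out: "paris" in out.lower()),
]

baseline = run_suite(agent_v1, CASES)
current = run_suite(agent_v2, CASES)
print(regressions(baseline, current))
```

Keeping the criterion as a plain predicate means the expected behavior can be written down before the agent exists, and the same suite can be rerun after every prompt or agent change.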

Philosophy

  • Define expected behavior BEFORE implementation
  • Run evals continuously during development
  • Track regressions with each change
  • Use pass@k metrics for reliability measurement
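The source does not define pass@k. One commonly used definition is the probability that at least one of k sampled attempts passes, estimated from n recorded attempts of which c passed as 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition; the function name `pass_at_k` and its parameters are illustrative, not part of the skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts passes.

    n: total attempts recorded for the task
    c: number of those attempts that passed
    k: number of attempts drawn without replacement (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failing attempts;
    # it is 0 when fewer than k attempts failed, which yields 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 attempts, 3 passed.
print(round(pass_at_k(10, 3, 1), 4))
print(round(pass_at_k(10, 3, 5), 4))
```

With k = 1 the estimate reduces to the plain pass rate c / n, so tracking pass@1 alongside a larger k shows both typical and best-of-k reliability.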

Eval Types

More from xbklairith/kisune

First Seen: Mar 23, 2026