eval-harness

Eval Harness

A formal evaluation framework that implements eval-driven development (EDD): evals are treated as the unit tests of AI development.

When to Activate

Setting up eval-driven development for AI workflows
Defining pass/fail criteria for task completion
Measuring agent reliability with pass@k metrics
Creating regression test suites for prompt/agent changes
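As a rough illustration of pass/fail criteria and a regression suite, the sketch below treats each eval as a named check on an agent's output and compares two runs. It is not the skill's own API: `EvalCase`, `run_evals`, `find_regressions`, and the stub agents are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One eval: a prompt plus an explicit pass/fail criterion (hypothetical type)."""
    name: str
    prompt: str
    passes: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the agent and record pass/fail by case name."""
    return {case.name: case.passes(agent(case.prompt)) for case in cases}


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return the cases that passed in the baseline run but fail (or are missing) now."""
    return sorted(
        name for name, ok in baseline.items() if ok and not current.get(name, False)
    )


# Criteria are written before the agent change, like unit tests.
CASES = [
    EvalCase("greets_user", "Say hello", lambda out: "hello" in out.lower()),
    EvalCase("returns_number", "What is 2 + 2?", lambda out: "4" in out),
]


def old_agent(prompt: str) -> str:
    # Stub standing in for the agent before a prompt change.
    return "Hello there" if "hello" in prompt.lower() else "The answer is 4"


def new_agent(prompt: str) -> str:
    # Stub standing in for the agent after a prompt change.
    return "Hello there" if "hello" in prompt.lower() else "I am not sure"


baseline = run_evals(old_agent, CASES)
current = run_evals(new_agent, CASES)
regressions = find_regressions(baseline, current)
print(regressions)
```

Keeping the baseline results alongside the cases is one simple way to make each prompt or agent change answerable with a single question: which previously passing evals now fail?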

Philosophy

Define expected behavior BEFORE implementation
Run evals continuously during development
Track regressions with each change
Use pass@k metrics for reliability measurement
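The source does not say how pass@k is computed. One commonly used definition is the unbiased estimator: given n attempts of which c passed, it is the probability that at least one of k attempts drawn without replacement passes, 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition; the function name `pass_at_k` is chosen for this example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts with c passes (assumed unbiased estimator).

    Probability that a random size-k subset of the n attempts
    contains at least one passing attempt.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 attempts at a task, 3 of which passed.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

With k = 1 the estimate reduces to the plain pass rate c / n; larger k answers the looser question of whether the agent succeeds at least once in k tries.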

Eval Types

Installs: 5
Repository: xbklairith/kisune
GitHub Stars: 2
First Seen: Mar 23, 2026
Security Audits: Gen Agent Trust Hub (Pass)
