ai-eval-regression-tester
AI Eval Regression Tester
When to invoke
- "Run the eval suite for the new prompt version."
- "Compare gpt-X vs the current baseline on our customer-support eval."
- "Block release if eval pass rate drops below 95%."
Inputs needed
- Eval YAML / JSONL — list of cases with input, expected outputs, graders, tags.
- Candidate runner — Python callable / HTTP endpoint that takes input and returns output.
- Baseline run — JSONL of prior outputs (optional, for diffing).
- Pass thresholds — overall and per-tag.
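As a rough illustration of how these inputs could fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example, not this skill's actual schema or API: the case fields (`id`, `input`, `expected`, `tags`), the exact-match grader, the `fake_runner` stand-in for a real candidate, and the threshold values.

```python
import json
from collections import defaultdict
from typing import Callable

# Hypothetical case schema: one JSON object per line with
# "id", "input", "expected" (list of acceptable outputs), and "tags".
CASES_JSONL = """
{"id": "c1", "input": "refund policy?", "expected": ["30 days"], "tags": ["billing"]}
{"id": "c2", "input": "reset password?", "expected": ["Use the reset link"], "tags": ["account"]}
{"id": "c3", "input": "cancel plan?", "expected": ["Go to Settings"], "tags": ["billing"]}
"""


def load_cases(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def exact_match(output: str, expected: list[str]) -> bool:
    """Simplest possible grader (assumed): output must equal an expected string."""
    return output.strip() in expected


def run_suite(cases: list[dict], runner: Callable[[str], str]) -> dict[str, dict]:
    """Call the candidate runner on every case and grade its output."""
    results = {}
    for case in cases:
        output = runner(case["input"])
        results[case["id"]] = {
            "output": output,
            "passed": exact_match(output, case["expected"]),
            "tags": case.get("tags", []),
        }
    return results


def pass_rates(results: dict[str, dict]) -> dict[str, float]:
    """Pass rate overall and per tag."""
    counts = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for result in results.values():
        for key in ["overall", *result["tags"]]:
            counts[key][0] += int(result["passed"])
            counts[key][1] += 1
    return {key: passed / total for key, (passed, total) in counts.items()}


def failed_thresholds(rates: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Keys whose pass rate is below the minimum; a tag with no cases counts as 0.0."""
    return sorted(k for k, minimum in thresholds.items() if rates.get(k, 0.0) < minimum)


def changed_outputs(results: dict[str, dict], baseline: dict[str, str]) -> list[str]:
    """Case ids whose output differs from the baseline run."""
    return sorted(
        cid for cid, r in results.items()
        if cid in baseline and baseline[cid] != r["output"]
    )


def fake_runner(text: str) -> str:
    """Stand-in for a real candidate (model call or HTTP endpoint)."""
    answers = {
        "refund policy?": "30 days",
        "reset password?": "Use the reset link",
        "cancel plan?": "Contact support",
    }
    return answers[text]


cases = load_cases(CASES_JSONL)
results = run_suite(cases, fake_runner)
rates = pass_rates(results)
# Illustrative thresholds only: overall and per-tag minimum pass rates.
thresholds = {"overall": 0.95, "billing": 0.5, "account": 1.0}
failed = failed_thresholds(rates, thresholds)
# Baseline run reduced to id -> prior output for diffing.
baseline_outputs = {"c1": "30 days", "c2": "Use the reset link", "c3": "Go to Settings"}
changed = changed_outputs(results, baseline_outputs)
print("failed thresholds:", failed, "changed vs baseline:", changed)
```

A release gate could then exit non-zero whenever `failed` is non-empty. Keeping the baseline diff separate from the threshold check lets a run report changed outputs even when every threshold still passes.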