ai-eval-regression-tester

Installation
SKILL.md

AI Eval Regression Tester

When to invoke

  • "Run the eval suite for the new prompt version."
  • "Compare gpt-X vs the current baseline on our customer-support eval."
  • "Block release if eval pass rate drops below 95%."

Inputs needed

  1. Eval YAML / JSONL — list of cases with input, expected outputs, graders, tags.
  2. Candidate runner — Python callable / HTTP endpoint that takes input and returns output.
  3. Baseline run — JSONL of prior outputs (optional, for diffing).
  4. Pass thresholds — overall and per-tag.
Installs
15
First Seen
May 7, 2026
ai-eval-regression-tester — sisodiabhumca/agent-skills