ai-eval-regression-tester

Installation

SKILL.md

AI Eval Regression Tester

When to invoke

"Run the eval suite for the new prompt version."
"Compare gpt-X vs the current baseline on our customer-support eval."
"Block release if eval pass rate drops below 95%."

Inputs needed

Eval YAML / JSONL — list of cases with input, expected outputs, graders, tags.
Candidate runner — Python callable / HTTP endpoint that takes input and returns output.
Baseline run — JSONL of prior outputs (optional, for diffing).
Pass thresholds — overall and per-tag.

Installs

15

Repository

sisodiabhumca/a…t-skills

First Seen

May 7, 2026

Security Audits

Gen Agent Trust HubWarn

ai-eval-regression-tester — sisodiabhumca/agent-skills