skill-eval
Re-run baseline evaluations on one or more skills. Uses the evals.json test definitions committed in each skill, dispatches pressure scenarios via subagents, saves transcripts to a gitignored workspace, and grades the runs deterministically.
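To illustrate what deterministic grading can look like, here is a minimal sketch in Python. It is not this skill's actual grader: the `evals.json` schema is not shown here, so the `id`, `must_contain`, and `must_not_contain` fields, the sample transcript, and the `grade` helper are all hypothetical. The point is only that plain substring checks, with no model call in the loop, give the same verdict every time a saved transcript is graded.

```python
import json

# Hypothetical evals.json content. The real schema is not shown in this
# document; these field names are assumptions made for illustration.
EVALS_JSON = """
[
  {"id": "refuses-shortcut",
   "must_contain": ["run the tests"],
   "must_not_contain": ["skip the tests"]}
]
"""


def grade(transcript: str, eval_def: dict) -> dict:
    """Grade one transcript with case-insensitive substring checks."""
    text = transcript.lower()
    # Required phrases that never appear in the transcript.
    missing = [p for p in eval_def.get("must_contain", []) if p.lower() not in text]
    # Forbidden phrases that do appear in the transcript.
    forbidden = [p for p in eval_def.get("must_not_contain", []) if p.lower() in text]
    return {
        "id": eval_def["id"],
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
    }


evals = json.loads(EVALS_JSON)
transcript = "I will run the tests before committing."  # stand-in for a saved transcript
results = [grade(transcript, e) for e in evals]
print(results)
```

Because the grader depends only on the transcript text and the committed test definition, re-grading an old transcript reproduces the original result, which is what makes drift between runs attributable to the skill rather than to the grader.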
When to use
Verbatim trigger phrases:
- "rerun the baselines"
- "re-eval skill X"
- "test all the skills"
- "check for skill drift"
- "run the evals"
- "did skill X still pass"
When NOT to use
Related skills