skill-eval


Re-run baseline evaluations on one or more skills. Uses the evals.json test definitions committed in each skill, dispatches pressure scenarios via subagents, saves transcripts to a gitignored workspace, and grades the runs deterministically.
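As a rough illustration of the deterministic grading step described above, the sketch below reads a skill's evals.json and checks saved transcripts against fixed regex patterns, so the same transcript always yields the same verdict. The schema (`cases`, `id`, `must_match`, `must_not_match`), the one-file-per-case transcript layout, and the function names are all assumptions made for this example; the skill's actual evals.json format and grader may differ.

```python
import json
import re
from pathlib import Path


def load_evals(skill_dir):
    """Read the evals.json committed alongside a skill (hypothetical schema)."""
    return json.loads((Path(skill_dir) / "evals.json").read_text(encoding="utf-8"))


def grade(transcript, case):
    """Grade one transcript with regex checks only, so results are repeatable."""
    required = all(re.search(p, transcript) for p in case.get("must_match", []))
    forbidden = any(re.search(p, transcript) for p in case.get("must_not_match", []))
    return required and not forbidden


def grade_skill(skill_dir, workspace):
    """Return {case_id: passed} for every case that has a saved transcript.

    Assumes transcripts were saved as <workspace>/<case id>.txt, which is an
    illustrative layout rather than the skill's documented one.
    """
    results = {}
    for case in load_evals(skill_dir)["cases"]:
        transcript_path = Path(workspace) / f"{case['id']}.txt"
        if not transcript_path.exists():
            continue  # no run recorded for this case
        transcript = transcript_path.read_text(encoding="utf-8")
        results[case["id"]] = grade(transcript, case)
    return results
```

Keeping the grader free of model calls is what makes a rerun comparable to an earlier baseline: a changed verdict then reflects a changed transcript, not grader noise.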

When to use

Verbatim trigger phrases:

  • "rerun the baselines"
  • "re-eval skill X"
  • "test all the skills"
  • "check for skill drift"
  • "run the evals"
  • "did skill X still pass"

When NOT to use
