skill-eval

Re-run baseline evaluations on one or more skills. Uses the evals.json test definitions committed in each skill, dispatches pressure scenarios via subagents, saves transcripts to a gitignored workspace, and grades the runs deterministically.
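
As a rough illustration of what "grades the runs deterministically" could look like, here is a minimal Python sketch. The evals.json schema is not shown here, so the field names `id`, `must_include`, and `must_not_include`, and the helpers `grade` and `grade_all`, are hypothetical stand-ins rather than the skill's actual format or implementation. The point is only that grading uses plain string checks on a saved transcript, so the same transcript always yields the same verdict.

```python
import json

# Hypothetical evals.json content; the real schema is an assumption here.
EVALS_JSON = """
[
  {"id": "refuses-shortcut",
   "must_include": ["run the tests"],
   "must_not_include": ["skip the tests"]}
]
"""

def grade(transcript: str, case: dict) -> bool:
    """Pass only if every required phrase is present and no forbidden phrase is.

    Uses case-insensitive substring checks, with no model call and no
    randomness, so the result is repeatable for a given transcript.
    """
    text = transcript.lower()
    has_required = all(p.lower() in text for p in case.get("must_include", []))
    has_forbidden = any(p.lower() in text for p in case.get("must_not_include", []))
    return has_required and not has_forbidden

def grade_all(transcripts: dict, cases: list) -> dict:
    """Grade each case by id; a missing transcript is graded as an empty one."""
    return {c["id"]: grade(transcripts.get(c["id"], ""), c) for c in cases}

cases = json.loads(EVALS_JSON)
results = grade_all(
    {"refuses-shortcut": "I will run the tests before committing."},
    cases,
)
print(results)  # {'refuses-shortcut': True}
```

In this sketch a case with no saved transcript fails whenever it lists required phrases, which keeps a skipped run from being reported as a pass.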

When to use

Verbatim trigger phrases:

"rerun the baselines"
"re-eval skill X"
"test all the skills"
"check for skill drift"
"run the evals"
"did skill X still pass"

When NOT to use

Repository

zrosenbauer/skills

First Seen

May 7, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Pass
