skill-ab-test
Skill A/B Test
Compare a modified skill against its baseline to measure whether changes actually improve agent behavior. Runs controlled tests with both versions, grades outputs against assertions, and produces a benchmark report.
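The grading step can be pictured with a small sketch. This is illustrative only and not the skill's actual grader: the `Assertion` shape (a plain substring check), the function names, and the pass-rate summary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """Hypothetical assertion: passes when `must_contain` appears in the output."""
    name: str
    must_contain: str


def grade(output, assertions):
    """Map each assertion name to True/False for a single agent output."""
    return {a.name: a.must_contain in output for a in assertions}


def pass_rate(outputs, assertions):
    """Fraction of (output, assertion) pairs that passed; 0.0 if there are none."""
    results = [ok for out in outputs for ok in grade(out, assertions).values()]
    return sum(results) / len(results) if results else 0.0


def compare(baseline_outputs, modified_outputs, assertions):
    """Summarize both versions and the change in pass rate between them."""
    baseline = pass_rate(baseline_outputs, assertions)
    modified = pass_rate(modified_outputs, assertions)
    return {"baseline": baseline, "modified": modified, "delta": modified - baseline}


if __name__ == "__main__":
    checks = [Assertion("mentions-pytest", "pytest"), Assertion("uses-fixtures", "fixtures")]
    print(compare(["uses pytest"], ["uses pytest with fixtures"], checks))
```

A positive `delta` suggests the modified skill satisfies more assertions than the baseline; a negative one points to a regression.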
When to Use
- After modifying a SKILL.md, references, or scripts
- Before committing skill changes, to verify that they are an improvement
- When unsure whether added instructions actually change behavior
- To identify regressions from refactoring
Prerequisites
- The skill must be in a git repo (baseline comes from the last commit)
- Node.js available (for the report generator)
- Changes should already be applied to the skill files (working tree = new version, git HEAD = baseline)
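The prerequisites above can be checked before a run with a short script. This is a sketch and not part of the skill itself: `check_prerequisites` and `run_git` are hypothetical helper names, and the check assumes that "changes applied" means `git status` reports modified or untracked files under the skill directory.

```python
import shutil
import subprocess
from pathlib import Path


def run_git(args, cwd):
    """Run a git command in cwd and return the completed process without raising."""
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
    )


def check_prerequisites(skill_dir, which=shutil.which, git=run_git):
    """Return a list of problems; an empty list means the skill is ready to test."""
    skill_dir = Path(skill_dir)
    problems = []

    if which("node") is None:
        problems.append("Node.js not found on PATH (needed for the report generator)")

    if which("git") is None:
        problems.append("git not found on PATH")
        return problems

    inside = git(["rev-parse", "--is-inside-work-tree"], skill_dir)
    if inside.returncode != 0 or inside.stdout.strip() != "true":
        problems.append("skill directory is not inside a git repo")
        return problems

    # HEAD must resolve to a commit, otherwise there is no baseline to compare against.
    if git(["rev-parse", "--verify", "HEAD"], skill_dir).returncode != 0:
        problems.append("repo has no commits yet, so there is no baseline")
        return problems

    # Porcelain output is empty when nothing under skill_dir differs from HEAD.
    status = git(["status", "--porcelain", "--", "."], skill_dir)
    if not status.stdout.strip():
        problems.append("no uncommitted changes: working tree matches the baseline")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites("."):
        print("problem:", problem)
```

The `which` and `git` parameters exist only so the checks can be exercised without a real repository; in normal use the defaults are enough.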