skill-ab-test

SKILL.md

Skill A/B Test

Compare a modified skill against its baseline to measure whether changes actually improve agent behavior. Runs controlled tests with both versions, grades outputs against assertions, and produces a benchmark report.
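The grading and reporting step can be pictured roughly as follows. This is a minimal sketch, not the skill's actual implementation: it assumes each assertion can be modeled as a Python predicate over an agent's output text, and the names `Assertion`, `grade`, `benchmark`, and `report` are hypothetical. Both versions are scored against the same assertions, and the report shows the per-assertion change so that regressions stand out as well as improvements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


# Hypothetical assertion model: a readable name plus a predicate over output text.
@dataclass(frozen=True)
class Assertion:
    name: str
    check: Callable[[str], bool]


def grade(outputs: list[str], assertions: list[Assertion]) -> float:
    """Fraction of (output, assertion) pairs that pass, in [0, 1]."""
    total = len(outputs) * len(assertions)
    if total == 0:
        return 0.0
    passed = sum(a.check(out) for out in outputs for a in assertions)
    return passed / total


def benchmark(baseline: list[str], candidate: list[str],
              assertions: list[Assertion]) -> dict[str, float]:
    """Overall pass rate for each version and the difference (candidate - baseline)."""
    base = grade(baseline, assertions)
    cand = grade(candidate, assertions)
    return {"baseline": base, "candidate": cand, "delta": cand - base}


def per_assertion(outputs: list[str], assertions: list[Assertion]) -> dict[str, float]:
    """Pass rate of each assertion across all outputs of one version."""
    if not outputs:
        return {a.name: 0.0 for a in assertions}
    return {a.name: sum(a.check(o) for o in outputs) / len(outputs) for a in assertions}


def report(baseline: list[str], candidate: list[str],
           assertions: list[Assertion]) -> str:
    """Render a small Markdown table comparing the two versions per assertion."""
    b = per_assertion(baseline, assertions)
    c = per_assertion(candidate, assertions)
    lines = ["| assertion | baseline | candidate | delta |", "|---|---|---|---|"]
    for a in assertions:
        delta = c[a.name] - b[a.name]
        lines.append(f"| {a.name} | {b[a.name]:.0%} | {c[a.name]:.0%} | {delta:+.0%} |")
    return "\n".join(lines)
```

A negative value in the `delta` column flags an assertion that the modified skill passes less often than the baseline, which is the regression case the skill is meant to catch.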

When to Use

  • After modifying a SKILL.md, references, or scripts
  • Before committing skill changes to verify improvement
  • When unsure if added instructions actually change behavior
  • To identify regressions from refactoring

Prerequisites

  • The skill must be in a git repo (baseline comes from the last commit)
  • Node.js available (for the report generator)
  • Changes should already be applied to the skill files (working tree = new version, git HEAD = baseline)
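The baseline/new-version split described in the prerequisites maps onto two ordinary git reads: the committed file at HEAD is the baseline, and the file on disk is the modified version. The sketch below assumes the `git` CLI is on `PATH` and that paths are given relative to the repository root; the helper names `read_baseline`, `read_candidate`, and `has_uncommitted_changes` are hypothetical, not part of the skill.

```python
import subprocess
from pathlib import Path


def read_baseline(repo: Path, rel_path: str) -> str:
    """Return the committed (HEAD) version of a file: the A/B baseline."""
    # `git show HEAD:<path>` prints the file as it exists in the last commit.
    # <path> is resolved relative to the repository root.
    result = subprocess.run(
        ["git", "show", f"HEAD:{rel_path}"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def read_candidate(repo: Path, rel_path: str) -> str:
    """Return the working-tree version of a file: the modified skill."""
    return (repo / rel_path).read_text(encoding="utf-8")


def has_uncommitted_changes(repo: Path, rel_path: str) -> bool:
    """True if the working tree differs from HEAD for this tracked path."""
    # `git diff --quiet` implies --exit-code: exit 1 if there are differences, 0 if none.
    result = subprocess.run(
        ["git", "diff", "--quiet", "HEAD", "--", rel_path],
        cwd=repo,
    )
    return result.returncode == 1
```

Checking `has_uncommitted_changes` first is a cheap guard: if it returns `False`, both versions are identical and an A/B run would only measure noise.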

Installs: 1
Repository: vltansky/skills
GitHub Stars: 9
First Seen: Mar 14, 2026