bmad-eval-runner

Skill Eval Runner

Overview

Run a skill's evals in an isolated environment that keeps out the user's global config, auto-memory, and ancestor CLAUDE.md files, so the result reflects the skill itself, not the bench it was tested on. Preserve every run's artifacts so the user can inspect what happened, not just whether it passed.
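The isolation and artifact-preservation idea can be sketched roughly as follows. This is a minimal illustration, not the runner's actual implementation: the helper names (`make_isolated_run`, `ancestor_claude_files`), the directory layout, and the assumption that global config lives under the home directory are all hypothetical.

```python
import os
from pathlib import Path


def make_isolated_run(runs_root: Path, run_name: str) -> tuple[Path, dict[str, str]]:
    """Create a preserved run directory plus an environment that hides user config.

    Assumption: the user's global config is found via HOME / XDG_CONFIG_HOME,
    so pointing both at an empty directory keeps it out of the run.
    """
    run_dir = runs_root / run_name
    home = run_dir / "home"            # empty stand-in for the user's home
    workspace = run_dir / "workspace"  # where the skill would execute
    artifacts = run_dir / "artifacts"  # kept after the run for inspection
    for d in (home, workspace, artifacts):
        # exist_ok=False: refuse to overwrite an earlier run's artifacts
        d.mkdir(parents=True, exist_ok=False)

    env = {
        "PATH": os.environ.get("PATH", ""),
        "HOME": str(home),
        "XDG_CONFIG_HOME": str(home / ".config"),
    }
    return run_dir, env


def ancestor_claude_files(start: Path) -> list[Path]:
    """List CLAUDE.md files in the ancestors of `start` that could leak into a run."""
    return [
        parent / "CLAUDE.md"
        for parent in start.resolve().parents
        if (parent / "CLAUDE.md").is_file()
    ]
```

A caller would pass `env` and the `workspace` path to whatever process executes the skill, and could treat a non-empty result from `ancestor_claude_files(workspace)` as a reason to move the run elsewhere.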

Two eval shapes are supported and run independently:

  • Artifact evals (evals.json): execute the skill against a prompt, capture the run's outputs, and grade each output against the eval's expectations.
  • Trigger evals (triggers.json): measure whether the skill's description actually causes Claude to invoke the skill on a given query, and whether Claude stays clear when it shouldn't.
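To make the trigger-eval measurement concrete, here is a small sketch of how results might be scored. The `TriggerResult` record and its field names are assumptions for illustration; the actual triggers.json schema is not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerResult:
    query: str
    should_trigger: bool  # expectation recorded in the eval (hypothetical field)
    did_trigger: bool     # what was observed in the run (hypothetical field)


def score_triggers(results: list[TriggerResult]) -> dict:
    """Summarise hits, misses, and false triggers, keeping the offending queries."""
    misses = [r.query for r in results if r.should_trigger and not r.did_trigger]
    false_triggers = [r.query for r in results if not r.should_trigger and r.did_trigger]
    positives = sum(r.should_trigger for r in results)
    negatives = len(results) - positives
    return {
        # None rather than 0.0 when a rate is undefined, so an empty class
        # is never reported as a perfect score.
        "hit_rate": (positives - len(misses)) / positives if positives else None,
        "false_trigger_rate": len(false_triggers) / negatives if negatives else None,
        "misses": misses,
        "false_triggers": false_triggers,
    }
```

Reporting the specific queries alongside the rates keeps the summary inspectable: a reader can see which query was missed, not only that one was.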

You are an experienced eval engineer. The user wants signal, not theatre. Cite specific findings, surface evals that pass for trivial reasons, and never silently widen tolerances to make a run "succeed."
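One possible way to surface evals that pass for trivial reasons is to run each expectation against a baseline output, such as an empty string, and flag any that still pass. The sketch below assumes expectations can be expressed as callables; the `Grader` type and `trivially_passing` helper are hypothetical names, not part of the skill.

```python
from typing import Callable

# Hypothetical grader shape: takes the run's output text, returns pass/fail.
Grader = Callable[[str], bool]


def trivially_passing(graders: dict[str, Grader], baseline_output: str = "") -> list[str]:
    """Return names of graders that pass even on a baseline (e.g. empty) output.

    A grader that accepts the baseline cannot distinguish a working skill
    from one that produced nothing, so its pass carries little signal.
    """
    return sorted(name for name, grade in graders.items() if grade(baseline_output))
```

For example, an expectation such as "output contains no error message" passes on empty output and would be flagged, while "output mentions a table" would not.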

Args

Installs: 1
GitHub Stars: 148
First Seen: May 14, 2026
bmad-eval-runner — bmad-code-org/bmad-builder