Cost Benchmark

Runs scripts/bench.mjs against the structural and adversarial corpus and writes per-case and summary results to docs/benchmarks/runs/. This is the verification gate that backs every measurable claim in cost-booster-edit and cost-booster-route.

When to use

Before publishing a release — verify booster win rate didn't regress.
After expanding bench/booster-corpus.json — confirm new cases route correctly.
When auditing a "claimed upstream" tag — flip it to "verified" once the bench supports it.
On a cost question ("is Sonnet 4.6 cheaper than Opus 4.7 for these tasks?") — re-run with BENCH_ANTHROPIC=1.

Steps

Run the bench from v3/ (where agent-booster resolves):
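
A minimal sketch of the invocation, assuming the script is run directly with node and that scripts/bench.mjs is resolved relative to v3/ (both are assumptions; adjust to the actual repository layout). BENCH_DIR, BENCH_SCRIPT, and run_bench are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Assumption: invoked from the repository root, with v3/ as the directory
# where agent-booster resolves.
BENCH_DIR="${BENCH_DIR:-v3}"                        # hypothetical override
BENCH_SCRIPT="${BENCH_SCRIPT:-scripts/bench.mjs}"   # assumed relative to BENCH_DIR

run_bench() {
  # Fail early with a clear message if the script is not where we expect it.
  if [ ! -f "$BENCH_DIR/$BENCH_SCRIPT" ]; then
    echo "bench script not found at $BENCH_DIR/$BENCH_SCRIPT" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd "$BENCH_DIR" && node "$BENCH_SCRIPT")
}

run_bench || echo "bench did not run" >&2
```

For the cost comparison described under "When to use", export BENCH_ANTHROPIC=1 before calling run_bench so the variable reaches the node process. Results should then appear under docs/benchmarks/runs/.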
