cost-benchmark

Originally from ruvnet/ruflo

Cost Benchmark

Runs scripts/bench.mjs against the structural and adversarial corpus and writes per-case and summary results to docs/benchmarks/runs/. This is the verification gate behind every measurable claim in cost-booster-edit and cost-booster-route.

When to use

  • Before publishing a release — verify booster win rate didn't regress.
  • After expanding bench/booster-corpus.json — confirm new cases route correctly.
  • When auditing a "claimed upstream" tag — flip it to "verified" once the bench supports it.
  • On a cost question ("is Sonnet 4.6 cheaper than Opus 4.7 for these tasks?") — re-run with BENCH_ANTHROPIC=1.
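
The release check in the first bullet can be pictured as a comparison of win rates between two runs. The sketch below is illustrative only: the per-case shape (`id`, `winner`), the helper names `winRate` and `passesGate`, and the tolerance value are all assumptions, not the actual output format or API of scripts/bench.mjs.

```javascript
// Hypothetical per-case result shape: { id: string, winner: "booster" | "baseline" }.
// The real files under docs/benchmarks/runs/ may use different field names.

// Fraction of cases won by the booster; an empty run counts as 0.
function winRate(cases) {
  if (cases.length === 0) return 0;
  const wins = cases.filter((c) => c.winner === "booster").length;
  return wins / cases.length;
}

// True when the current run's win rate has not dropped more than
// `tolerance` below the previous run's win rate.
function passesGate(previousCases, currentCases, tolerance = 0.01) {
  return winRate(currentCases) >= winRate(previousCases) - tolerance;
}

// Made-up sample data, for illustration only.
const previous = [
  { id: "case-1", winner: "booster" },
  { id: "case-2", winner: "booster" },
  { id: "case-3", winner: "baseline" },
  { id: "case-4", winner: "booster" },
];
const current = [
  { id: "case-1", winner: "booster" },
  { id: "case-2", winner: "baseline" },
  { id: "case-3", winner: "baseline" },
  { id: "case-4", winner: "booster" },
];

console.log(winRate(previous)); // 0.75
console.log(passesGate(previous, current)); // false: 0.5 is below 0.75 - 0.01
```

A small tolerance keeps the gate from failing on noise when the corpus is small. The right threshold depends on corpus size and is left open here.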

Steps

  1. Run the bench from v3/ (where agent-booster resolves):