cost-benchmark
Cost Benchmark
Runs scripts/bench.mjs against the structural+adversarial corpus and writes per-case + summary results to docs/benchmarks/runs/. This is the verification gate that backs every measurable claim in cost-booster-edit / cost-booster-route.
When to use
- Before publishing a release: verify booster win rate didn't regress.
- After expanding bench/booster-corpus.json: confirm new cases route correctly.
- When auditing a "claimed upstream" tag: flip it to "verified" once the bench supports it.
- On a cost question ("is Sonnet 4.6 cheaper than Opus 4.7 for these tasks?"): re-run with BENCH_ANTHROPIC=1.
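The release check in the first bullet amounts to comparing a new run against an earlier one. A minimal sketch of that comparison follows. It assumes a summary object with a `winRate` field between 0 and 1; that shape, the function name `checkWinRate`, and the 0.01 tolerance are illustrative assumptions, not the documented output format of scripts/bench.mjs.

```javascript
// Hypothetical summary shape: { winRate: number }, with winRate in [0, 1].
// The real files under docs/benchmarks/runs/ may use different field names.
function checkWinRate(baseline, current, tolerance = 0.01) {
  // Positive delta means the new run wins more often than the baseline.
  const delta = current.winRate - baseline.winRate;
  return {
    delta,
    // Only flag a regression when the drop exceeds the allowed tolerance.
    regressed: delta < -tolerance,
  };
}

// Example with made-up numbers, not real benchmark results.
const previousRun = { winRate: 0.9 };
const latestRun = { winRate: 0.85 };
const verdict = checkWinRate(previousRun, latestRun);
console.log(verdict.regressed ? "win rate regressed" : "win rate held");
```

The tolerance keeps small run-to-run noise from blocking a release; set it to 0 for a strict gate.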
Steps
- Run the bench from v3/ (where agent-booster resolves):
    ( cd v3 && node ../plugins/ruflo-cost-tracker/scripts/bench.mjs )                       # booster only: free, ~85 ms
    ( cd v3 && BENCH_LLM_BASELINE=1 node ../plugins/ruflo-cost-tracker/scripts/bench.mjs )  # + Gemini 2.0 Flash (cheap)
    ( cd v3 && BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1 \
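The invocations above differ only in which environment variables are set. A minimal sketch of that mapping is below. The mode labels (`booster`, `gemini`, `anthropic`) and the function name `benchInvocation` are hypothetical; the script path and variable names come from the commands above. The third command is cut off in the source, so any further variables or arguments it takes are not represented here.

```javascript
// Script path as used in the commands above, relative to v3/.
const BENCH_SCRIPT = "../plugins/ruflo-cost-tracker/scripts/bench.mjs";

// Mode labels are hypothetical names for the three invocations.
const BENCH_MODES = ["booster", "gemini", "anthropic"];

function benchInvocation(mode) {
  if (!BENCH_MODES.includes(mode)) {
    throw new Error(`unknown bench mode: ${mode}`);
  }
  const env = {};
  // Both LLM modes enable the baseline comparison.
  if (mode === "gemini" || mode === "anthropic") env.BENCH_LLM_BASELINE = "1";
  // The Anthropic mode adds its flag on top of the baseline flag.
  if (mode === "anthropic") env.BENCH_ANTHROPIC = "1";
  // Describe the command only; nothing is executed here.
  return { cwd: "v3", command: "node", args: [BENCH_SCRIPT], env };
}

console.log(benchInvocation("gemini"));
```

To actually run a mode, the returned object can be handed to Node's `child_process.spawnSync(command, args, { cwd, env: { ...process.env, ...env } })`.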