cost-benchmark


Cost Benchmark

Runs scripts/bench.mjs against the structural and adversarial corpus and writes per-case and summary results to docs/benchmarks/runs/. It is the verification gate behind every measurable claim in cost-booster-edit and cost-booster-route.

When to use

  • Before publishing a release, to verify that the booster win rate has not regressed.
  • After expanding bench/booster-corpus.json, to confirm that the new cases route correctly.
  • When auditing a "claimed upstream" tag: flip it to "verified" once the bench supports it.
  • On a cost question ("is Sonnet 4.6 cheaper than Opus 4.7 for these tasks?"): re-run with BENCH_ANTHROPIC=1.
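
The release check in the first bullet comes down to comparing two win rates. A minimal sketch of that comparison follows, assuming the caller already has both numbers in hand. The function name `win_rate_ok` and the optional tolerance argument are hypothetical, and how to read the win rate out of the files under docs/benchmarks/runs/ is not shown because their format is not described here.

```shell
# Hypothetical helper, not part of bench.mjs.
# Usage: win_rate_ok PREVIOUS CURRENT [TOLERANCE]
# Exits 0 when CURRENT + TOLERANCE >= PREVIOUS, non-zero otherwise.
# All values are plain numbers supplied by the caller.
win_rate_ok() {
  awk -v prev="$1" -v curr="$2" -v tol="${3:-0}" \
    'BEGIN { exit !((curr + tol) >= (prev + 0)) }'
}
```

For example, `win_rate_ok 0.92 0.90 || echo "booster win rate regressed"` would print the warning, while passing a tolerance of `0.05` as a third argument would let the same pair through.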

Steps

  1. Run the bench from v3/ (where agent-booster resolves):

    ( cd v3 && node ../plugins/ruflo-cost-tracker/scripts/bench.mjs )                  # booster only — free, ~85 ms
    ( cd v3 && BENCH_LLM_BASELINE=1 node ../plugins/ruflo-cost-tracker/scripts/bench.mjs ) # + Gemini 2.0 Flash (cheap)
    ( cd v3 && BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1 \
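
The invocations above differ only in which BENCH_* variables are set, so they can be wrapped in one small helper. This is a sketch under assumptions: the tier names `booster`, `gemini`, and `anthropic` and the function names are invented for illustration, and the `anthropic` tier assumes the same script is re-run with both variables set, which the truncated command above does not confirm.

```shell
# Hypothetical helper: print the environment assignments for a bench tier.
# Only the BENCH_* variable names come from the commands above; the tier
# names are made up for this sketch.
bench_env() {
  case "$1" in
    booster)   printf '' ;;                                        # booster only
    gemini)    printf 'BENCH_LLM_BASELINE=1' ;;                    # + LLM baseline
    anthropic) printf 'BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1' ;;  # assumption: same script
    *)         printf 'unknown tier: %s\n' "$1" >&2; return 2 ;;
  esac
}

# Hypothetical wrapper: run the bench for a tier from v3/, as in step 1.
run_bench() {
  bench_assignments="$(bench_env "$1")" || return
  # $bench_assignments is intentionally unquoted so that it splits into
  # separate VAR=1 words; with no assignments, env just runs the command.
  ( cd v3 && env $bench_assignments node ../plugins/ruflo-cost-tracker/scripts/bench.mjs )
}
```

With this in place, `run_bench booster` would correspond to the free run and `run_bench gemini` to the run with the LLM baseline. An unknown tier prints an error and returns a non-zero status without starting the bench.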
    
Repository
ruvnet/ruflo