dec-bench-evals
DEC Bench Evals
Use this skill when the user wants to create or extend a DEC Bench scenario. The goal is a deterministic, runnable evaluation, not a vague benchmark idea.
Quick Start
Default authoring loop:
dec-bench create --name <id> --domain <domain> --tier <tier>
dec-bench validate --scenario <id>
dec-bench build --scenario <id> --harness <harness> --agent <agent> --model <model> --version <version>
dec-bench run --scenario <id> --harness <harness> --persona naive --mode no-plan
dec-bench results --latest --scenario <id>
dec-bench audit open --scenario <id> --run-id <run-id>
dec-bench registry add --scenario scenarios/<id>
dec-bench registry publish --id <id>
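While iterating on a scenario, the validate, run, and results steps above tend to be repeated together. The following is a minimal sketch of a wrapper for that inner loop, not part of DEC Bench itself. It assumes `dec-bench` is on your `PATH`. The `step` and `authoring_loop` helpers, the `DRY_RUN` switch, and the default values `my-scenario` and `my-harness` are hypothetical placeholders. Only the subcommands and flags come from the list above.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace with your own scenario id and harness.
SCENARIO_ID="${SCENARIO_ID:-my-scenario}"
HARNESS="${HARNESS:-my-harness}"
# DRY_RUN=1 prints each command instead of executing it (an assumption of this sketch).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Inner loop of the authoring flow: validate, run, then inspect the latest results.
authoring_loop() {
  step dec-bench validate --scenario "$SCENARIO_ID"
  step dec-bench run --scenario "$SCENARIO_ID" --harness "$HARNESS" --persona naive --mode no-plan
  step dec-bench results --latest --scenario "$SCENARIO_ID"
}

authoring_loop
```

By default the script only prints the commands it would run. Setting `DRY_RUN=0` executes them, and because of `set -e` the loop stops at the first command that fails, so a scenario that does not validate is never run.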