AgentBench for OpenClaw
Benchmark your OpenClaw agent's general capabilities across 40 real-world tasks spanning 7 domains.
Commands
When the user says any of these, follow the corresponding instructions:
- /benchmark: Run the full benchmark suite (all 40 tasks)
- /benchmark --fast: Run only easy+medium tasks (19 tasks)
- /benchmark --suite <name>: Run one domain only
- /benchmark --task <id>: Run a single task
- /benchmark --strict: Tag results as externally verified scoring
- /benchmark-list: List all tasks grouped by domain
- /benchmark-results: Show results from previous runs
- /benchmark-compare: Compare two runs side-by-side
Flags are combinable: /benchmark --fast --suite research
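To show how a combined invocation breaks down into individual options, here is a minimal Python sketch of a parser for the /benchmark flags listed above. The names `BenchmarkRequest` and `parse_benchmark_command` are hypothetical and chosen only for illustration. The source does not say how AgentBench itself interprets the command, so treat this as one possible reading and not the actual implementation.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkRequest:
    """Hypothetical container for the options of one /benchmark invocation."""
    fast: bool = False           # --fast: easy+medium tasks only
    strict: bool = False         # --strict: tag results as externally verified scoring
    suite: Optional[str] = None  # --suite <name>: restrict the run to one domain
    task: Optional[str] = None   # --task <id>: run a single task


def parse_benchmark_command(text: str) -> BenchmarkRequest:
    """Parse a '/benchmark ...' string into a BenchmarkRequest (illustrative only)."""
    tokens = shlex.split(text)
    if not tokens or tokens[0] != "/benchmark":
        raise ValueError("not a /benchmark command")

    request = BenchmarkRequest()
    i = 1
    while i < len(tokens):
        flag = tokens[i]
        if flag == "--fast":
            request.fast = True
        elif flag == "--strict":
            request.strict = True
        elif flag in ("--suite", "--task"):
            # These two flags take a value in the next token.
            if i + 1 >= len(tokens) or tokens[i + 1].startswith("--"):
                raise ValueError(f"{flag} requires a value")
            i += 1
            setattr(request, flag[2:], tokens[i])
        else:
            raise ValueError(f"unknown flag: {flag}")
        i += 1
    return request
```

With this sketch, `parse_benchmark_command("/benchmark --fast --suite research")` yields a request with `fast=True` and `suite="research"`, leaving `strict` and `task` at their defaults.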