agentbench

Installation
SKILL.md

AgentBench for OpenClaw

Benchmark your OpenClaw agent's general capabilities across 40 real-world tasks spanning 7 domains.

Commands

When the user says any of these, follow the corresponding instructions:

  • /benchmark — Run the full benchmark suite (all 40 tasks)
  • /benchmark --fast — Run only easy+medium tasks (19 tasks)
  • /benchmark --suite <name> — Run one domain only
  • /benchmark --task <id> — Run a single task
  • /benchmark --strict — Tag results as externally verified scoring
  • /benchmark-list — List all tasks grouped by domain
  • /benchmark-results — Show results from previous runs
  • /benchmark-compare — Compare two runs side-by-side

Flags are combinable: /benchmark --fast --suite research

Installs
1
Repository
openclaw/skills
GitHub Stars
4.5K
First Seen
Mar 15, 2026
agentbench — openclaw/skills