agentbench

AgentBench for OpenClaw

Benchmark your OpenClaw agent's general capabilities across 40 real-world tasks spanning 7 domains.

Commands

When the user says any of these, follow the corresponding instructions:

/benchmark — Run the full benchmark suite (all 40 tasks)
/benchmark --fast — Run only easy+medium tasks (19 tasks)
/benchmark --suite <name> — Run one domain only
/benchmark --task <id> — Run a single task
/benchmark --strict — Tag results as scored with external verification
/benchmark-list — List all tasks grouped by domain
/benchmark-results — Show results from previous runs
/benchmark-compare — Compare two runs side-by-side

Flags are combinable: /benchmark --fast --suite research
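
The command list above implies a small grammar: two boolean flags (--fast, --strict) and two flags that take a value (--suite, --task), all of which may appear together. As a rough illustration only, the sketch below parses such a command string into a structured request. The names BenchmarkRequest and parse_benchmark are hypothetical and do not come from the skill itself; how AgentBench actually interprets these flags is not shown in this listing.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkRequest:
    """Hypothetical container for the options of one /benchmark invocation."""
    fast: bool = False            # --fast: easy+medium tasks only
    strict: bool = False          # --strict: tag results as externally verified
    suite: Optional[str] = None   # --suite <name>: restrict to one domain
    task: Optional[str] = None    # --task <id>: run a single task


def parse_benchmark(command: str) -> BenchmarkRequest:
    """Parse a '/benchmark ...' string; an illustrative sketch, not the skill's own logic."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/benchmark":
        raise ValueError("not a /benchmark command")

    request = BenchmarkRequest()
    i = 1
    while i < len(tokens):
        flag = tokens[i]
        if flag == "--fast":
            request.fast = True
        elif flag == "--strict":
            request.strict = True
        elif flag in ("--suite", "--task"):
            # These flags consume the following token as their value.
            if i + 1 >= len(tokens):
                raise ValueError(f"{flag} needs a value")
            setattr(request, flag[2:], tokens[i + 1])
            i += 1
        else:
            raise ValueError(f"unknown flag: {flag}")
        i += 1
    return request


if __name__ == "__main__":
    print(parse_benchmark("/benchmark --fast --suite research"))
```

Because flags are accumulated into one object regardless of order, /benchmark --suite research --fast yields the same request as the example above. How conflicting combinations (for instance --task together with --suite) should be resolved is left open here, since the listing does not specify it.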

Installs: 1
Repository: openclaw/skills
GitHub Stars: 4.5K
First Seen: Mar 15, 2026
Security Audits: Gen Agent Trust Hub (Pass)
