Terminal-Bench Integration
This directory contains the mux agent adapter for Terminal-Bench 2.0, using Harbor as the evaluation harness.
Quick Start
When a user asks to run tbench, assume by default that they mean a CI run triggered via workflow_dispatch.
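As a rough illustration of the CI path, a workflow_dispatch run can be started from the GitHub CLI with `gh workflow run`. The sketch below is not taken from this repository: the workflow file name `terminal-bench.yml` and the input name `task_names` are hypothetical placeholders, so check the actual workflow definition before using it. By default it only prints the command it would run.

```shell
# Sketch only: the workflow file name and the input name are hypothetical.
WORKFLOW="${WORKFLOW:-terminal-bench.yml}"   # hypothetical workflow file

dispatch_tbench() {
  ref="$1"      # branch or tag to run the workflow on
  tasks="$2"    # optional space-separated task names

  # `gh workflow run` fires a workflow_dispatch event; -f sets a workflow input.
  set -- gh workflow run "$WORKFLOW" --ref "$ref"
  if [ -n "$tasks" ]; then
    set -- "$@" -f "task_names=$tasks"       # hypothetical input name
  fi

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"                       # print instead of dispatching
  else
    "$@"
  fi
}

dispatch_tbench main "hello-world"
```

Setting `DRY_RUN=0` would actually dispatch the run, which requires an authenticated `gh` and a workflow that declares a `workflow_dispatch` trigger.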
# Run full benchmark suite
make benchmark-terminal
# Run specific tasks
make benchmark-terminal TB_TASK_NAMES="hello-world chess-best-move"
# Run with a specific model and xhigh thinking
MUX_RUN_ARGS="--thinking xhigh" make benchmark-terminal TB_ARGS="--agent-kwarg model_name=anthropic/claude-opus-4-7"
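The invocations above all follow one pattern: the `benchmark-terminal` target plus optional `TB_TASK_NAMES` and `TB_ARGS` variables. A minimal sketch of a helper that assembles such a command line might look like the following. The function name `tbench_cmd` is hypothetical and not part of this repository; it only prints the command and runs nothing.

```shell
# Hypothetical helper: print a `make benchmark-terminal` command line.
# Only the target and variable names shown above are taken from this README.
tbench_cmd() {
  model="$1"    # e.g. anthropic/claude-opus-4-7, or "" to omit
  tasks="$2"    # space-separated task names, or "" for the full suite

  cmd="make benchmark-terminal"
  if [ -n "$tasks" ]; then
    cmd="$cmd TB_TASK_NAMES=\"$tasks\""
  fi
  if [ -n "$model" ]; then
    cmd="$cmd TB_ARGS=\"--agent-kwarg model_name=$model\""
  fi
  printf '%s\n' "$cmd"
}

tbench_cmd "" "hello-world chess-best-move"
```

Printing rather than executing keeps the helper safe to experiment with; the output can be reviewed and then pasted into a shell.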