tbench
Terminal-Bench Integration
This directory contains the mux agent adapter for Terminal-Bench 2.0, using Harbor as the evaluation harness.
Quick Start
When a user asks to run tbench, assume by default that they mean running it in CI via `workflow_dispatch`.
```bash
# Run full benchmark suite
make benchmark-terminal

# Run specific tasks
make benchmark-terminal TB_TASK_NAMES="hello-world chess-best-move"

# Run with specific model
make benchmark-terminal TB_ARGS="--agent-kwarg model_name=anthropic/claude-opus-4-5"

# Run on Daytona cloud (high parallelism)
```
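The Make variables above can presumably be combined in a single invocation. The sketch below is a hypothetical `tb_command` shell function (not part of mux). It only prints the command it would run. It assumes that `TB_TASK_NAMES` accepts a space-separated task list and that `TB_ARGS` is passed through to the harness unchanged; neither assumption has been verified against the Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of mux: print a Terminal-Bench make command.
# Usage: tb_command MODEL [TASK...]
tb_command() {
  local model="$1"
  shift
  local cmd="make benchmark-terminal"
  # Assumption: TB_TASK_NAMES takes a space-separated list of task names.
  if [ "$#" -gt 0 ]; then
    cmd="$cmd TB_TASK_NAMES=\"$*\""
  fi
  # Assumption: TB_ARGS is forwarded unchanged to the harness run.
  cmd="$cmd TB_ARGS=\"--agent-kwarg model_name=$model\""
  printf '%s\n' "$cmd"
}

# Prints the command only; nothing is executed.
tb_command anthropic/claude-opus-4-5 hello-world chess-best-move
```

Printing the command instead of running it makes it easy to review before a long local run. `eval "$(tb_command ...)"` would execute it.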