tbench


Terminal-Bench Integration

This directory contains the mux agent adapter for Terminal-Bench 2.0, using Harbor as the evaluation harness.

Quick Start

When the user asks to run tbench, assume by default that they mean a CI run triggered via workflow_dispatch.
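The following is a minimal sketch of dispatching that CI run from a terminal with the GitHub CLI (gh workflow run). The workflow file name terminal-bench.yml, the default branch main, and the input names task_names and model_name are hypothetical placeholders, not taken from the mux repository. Check .github/workflows/ for the real workflow and its workflow_dispatch inputs. By default the script only prints the command; set DRY_RUN=0 to actually dispatch.

```shell
#!/usr/bin/env bash
# Sketch: dispatch a Terminal-Bench run in CI via workflow_dispatch using the
# GitHub CLI. ASSUMPTION: the workflow file name and the input names
# (task_names, model_name) are hypothetical placeholders.
set -euo pipefail

# Print the gh command for the given workflow, ref, task list and model,
# one argument per line so the caller can rebuild it as an array.
tbench_dispatch_args() {
  local workflow="$1" ref="$2" tasks="$3" model="$4"
  local args=(gh workflow run "$workflow" --ref "$ref")
  # -f (--raw-field) passes a string-valued workflow_dispatch input.
  if [[ -n "$tasks" ]]; then args+=(-f "task_names=$tasks"); fi
  if [[ -n "$model" ]]; then args+=(-f "model_name=$model"); fi
  printf '%s\n' "${args[@]}"
}

WORKFLOW="${WORKFLOW:-terminal-bench.yml}"  # hypothetical workflow file
REF="${REF:-main}"                          # hypothetical branch to run on

mapfile -t cmd < <(tbench_dispatch_args "$WORKFLOW" "$REF" \
  "${TB_TASK_NAMES:-}" "${MODEL_NAME:-}")

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "${cmd[*]}"   # default: show the command without dispatching
else
  "${cmd[@]}"        # DRY_RUN=0: actually trigger the workflow
fi
```

For example, TB_TASK_NAMES="hello-world" MODEL_NAME=anthropic/claude-opus-4-5 DRY_RUN=0 bash dispatch.sh (script name hypothetical) would trigger a single-task run, provided the workflow really accepts those inputs.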

```shell
# Run full benchmark suite
make benchmark-terminal

# Run specific tasks
make benchmark-terminal TB_TASK_NAMES="hello-world chess-best-move"

# Run with specific model
make benchmark-terminal TB_ARGS="--agent-kwarg model_name=anthropic/claude-opus-4-5"

# Run on Daytona cloud (high parallelism)
```
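The documented variables can presumably be combined in one invocation. This sketch runs the benchmark-terminal target for the same tasks against each model in a space-separated list. It assumes the target honors TB_TASK_NAMES and TB_ARGS together, and TASKS, MODELS, and DRY_RUN are hypothetical script inputs. As written it only prints each command; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch: run the same Terminal-Bench tasks against several models by
# combining the documented TB_TASK_NAMES and TB_ARGS make variables.
# ASSUMPTION: benchmark-terminal honors both variables in one invocation.
set -euo pipefail

TASKS="${TASKS:-hello-world}"
# Space-separated model list; defaults to the model named in the Quick Start.
read -r -a models <<< "${MODELS:-anthropic/claude-opus-4-5}"

for model in "${models[@]}"; do
  cmd=(make benchmark-terminal
       "TB_TASK_NAMES=$TASKS"
       "TB_ARGS=--agent-kwarg model_name=$model")
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%q ' "${cmd[@]}"; echo   # print a copy-pasteable command
  else
    "${cmd[@]}"                      # DRY_RUN=0: run the benchmark
  fi
done
```

Passing TB_ARGS=... as a single argument is equivalent to the quoted form above, because make treats any command-line argument containing = as a variable assignment.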
Repository: coder/mux
First seen: Feb 28, 2026