tbench

Terminal-Bench Integration

This directory contains the mux agent adapter for Terminal-Bench 2.0, using Harbor as the evaluation harness.

Quick Start

When a user asks to run tbench, assume by default that they mean a CI run triggered via workflow_dispatch, not a local run.

# Run full benchmark suite
make benchmark-terminal

# Run specific tasks
make benchmark-terminal TB_TASK_NAMES="hello-world chess-best-move"

# Run with specific model and xhigh thinking
MUX_RUN_ARGS="--thinking xhigh" make benchmark-terminal TB_ARGS="--agent-kwarg model_name=anthropic/claude-opus-4-7"
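
The three invocations above can be folded into one small dry-run helper. This is only a sketch: `tbench_cmd` is a hypothetical name, not part of the repository, and it assumes that `TB_TASK_NAMES`, `TB_ARGS`, and `MUX_RUN_ARGS` may be combined in a single `make` call, which the examples above do not confirm. It prints the command instead of running it, so the result can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of coder/mux): prints a make invocation
# built from the variables shown in the Quick Start examples.
# Assumption: TB_TASK_NAMES, TB_ARGS, and MUX_RUN_ARGS can be combined.
tbench_cmd() {
  local tasks="$1"     # space-separated task names
  local model="$2"     # value passed as model_name via --agent-kwarg
  local thinking="$3"  # thinking level passed to --thinking
  printf 'MUX_RUN_ARGS="--thinking %s" make benchmark-terminal TB_TASK_NAMES="%s" TB_ARGS="--agent-kwarg model_name=%s"\n' \
    "$thinking" "$tasks" "$model"
}

# Dry run: print the command for two tasks with a specific model.
tbench_cmd "hello-world chess-best-move" "anthropic/claude-opus-4-7" "xhigh"
```

Because the helper only prints, its output can be checked and then pasted into a shell once it matches the intended run.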
Repository
coder/mux
First Seen
Feb 28, 2026