agent-eval-harness

Installation
SKILL.md

Agent Eval Harness

Purpose

CLI tool for capturing trajectories from headless CLI agents, optimized for TypeScript/JavaScript projects using Bun.

The harness captures. You score.

Harness Provides You Provide
Prompt execution via headless adapters Scoring logic (Braintrust, custom scripts)
Full trajectory capture (thoughts, tools, plans) Pass/fail determination via graders
Structured JSONL output LLM-as-judge prompts
Reproducible execution environment CI integration, golden file comparison
Installs
35
GitHub Stars
36
First Seen
Feb 28, 2026
agent-eval-harness — youdotcom-oss/agent-skills