agent-eval-harness
Originally from plaited/agent-eval-harness
Agent Eval Harness
Purpose
A CLI tool for capturing trajectories from headless CLI agents, optimized for TypeScript/JavaScript projects that use Bun.
The harness captures. You score.
| Harness Provides | You Provide |
|---|---|
| Prompt execution via headless adapters | Scoring logic (Braintrust, custom scripts) |
| Full trajectory capture (thoughts, tools, plans) | Pass/fail determination via graders |
| Structured JSONL output | LLM-as-judge prompts |
| Reproducible execution environment | CI integration, golden file comparison |
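To illustrate the "you score" half of the split, here is a minimal sketch of a user-supplied grader that reads JSONL output and makes a pass/fail decision per prompt. It assumes each line is one JSON object with hypothetical fields `id`, `input`, `output`, and an optional `trajectory`. The harness's real record schema may differ, so check its output before relying on these names. The helpers `parseJsonl` and `gradeByExpected` are also hypothetical illustrations, not part of the harness.

```typescript
// Hypothetical shape of one captured result. The real schema emitted by
// the harness may differ; adjust these fields to match its actual output.
interface CaptureResult {
  id: string;
  input: string;
  output: string;
  trajectory?: unknown[];
}

interface GradeResult {
  id: string;
  pass: boolean;
  reason: string;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): CaptureResult[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as CaptureResult);
}

// Hypothetical grader: a result passes if its final output contains the
// expected string registered for its prompt id.
function gradeByExpected(
  results: CaptureResult[],
  expected: Record<string, string>,
): GradeResult[] {
  return results.map((r) => {
    const want = expected[r.id];
    if (want === undefined) {
      return { id: r.id, pass: false, reason: "no expected value" };
    }
    const pass = r.output.includes(want);
    return { id: r.id, pass, reason: pass ? "matched" : `missing "${want}"` };
  });
}

// Example usage with inline JSONL standing in for a captured results file.
const sample = [
  '{"id":"add","input":"What is 2+2?","output":"The answer is 4."}',
  '{"id":"cap","input":"Capital of France?","output":"Lyon"}',
].join("\n");

const grades = gradeByExpected(parseJsonl(sample), { add: "4", cap: "Paris" });
for (const g of grades) {
  console.log(`${g.id}: ${g.pass ? "PASS" : "FAIL"} (${g.reason})`);
}
```

A substring check is the simplest possible grader. The same loop could instead send each `output` (and, where useful, the `trajectory`) to an LLM-as-judge prompt or compare it against a golden file, which keeps scoring logic entirely on your side of the boundary.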