eval-harness


Eval Harness

Overview

Eval Harness is a systematic framework for evaluating agent performance. It measures accuracy, efficiency, and reliability across a defined set of test scenarios, so that decisions about agent quality and improvement can be driven by data.
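A minimal sketch of the core loop follows. It runs an agent over a list of test scenarios and reports the three dimensions named above. The `Scenario` and `Summary` types, the `run_harness` function, and the specific metric definitions are all hypothetical. The sketch assumes the agent is a plain function from prompt to text, takes accuracy as a trimmed exact match, efficiency as mean wall-clock latency, and reliability as the fraction of runs that finish without raising.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """Hypothetical test scenario: an input prompt and the expected output."""
    name: str
    prompt: str
    expected: str


@dataclass
class Summary:
    accuracy: float        # fraction of all scenarios answered correctly
    mean_latency_s: float  # mean wall-clock seconds per completed run (efficiency)
    reliability: float     # fraction of scenarios that finished without raising


def run_harness(agent: Callable[[str], str], scenarios: list[Scenario]) -> Summary:
    correct = completed = 0
    latencies: list[float] = []
    for s in scenarios:
        start = time.perf_counter()
        try:
            output = agent(s.prompt)
        except Exception:
            # A crashed run counts against both reliability and accuracy.
            continue
        latencies.append(time.perf_counter() - start)
        completed += 1
        # Simplest possible correctness check: whitespace-trimmed exact match.
        if output.strip() == s.expected.strip():
            correct += 1
    n = len(scenarios)
    return Summary(
        accuracy=correct / n if n else 0.0,
        mean_latency_s=sum(latencies) / len(latencies) if latencies else 0.0,
        reliability=completed / n if n else 0.0,
    )


# Usage with a toy agent that upper-cases its input and fails on one prompt.
def toy_agent(prompt: str) -> str:
    if prompt == "boom":
        raise RuntimeError("simulated tool failure")
    return prompt.upper()


scenarios = [
    Scenario("echo", "hi", "HI"),
    Scenario("wrong-answer", "abc", "xyz"),
    Scenario("crash", "boom", "BOOM"),
]
summary = run_harness(toy_agent, scenarios)
print(summary)  # accuracy is 1/3 and reliability is 2/3; latency varies by machine
```

Keeping the agent behind a plain callable keeps the harness independent of any particular agent framework. A real setup would likely swap the exact-match check for a task-appropriate grader.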

When to Use

  • Before deploying agent changes to production
  • Comparing different agent configurations
  • Identifying weaknesses in agent behavior
  • Tracking agent quality over time
  • Validating prompt improvements
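For the first two cases, pre-deployment checks and configuration comparisons, one way to act on harness results is a regression gate. The gate compares a candidate's metrics against a baseline and flags any that got worse. The sketch below is a hypothetical illustration. The metric names, the 2% relative tolerance, and the rule that a missing metric counts as a regression are all assumptions, not part of the skill.

```python
# Hypothetical metric names. Higher is better unless listed here.
LOWER_IS_BETTER = {"mean_latency_s"}


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return metrics where `candidate` is worse than `baseline` by more than
    `tolerance`, measured relative to the baseline value."""
    regressions = []
    for metric, base in baseline.items():
        if metric not in candidate:
            # A metric that disappeared is treated as a regression.
            regressions.append(metric)
            continue
        cand = candidate[metric]
        if metric in LOWER_IS_BETTER:
            worse = cand > base * (1 + tolerance)
        else:
            worse = cand < base * (1 - tolerance)
        if worse:
            regressions.append(metric)
    return regressions


baseline = {"accuracy": 0.90, "reliability": 0.98, "mean_latency_s": 2.0}
candidate = {"accuracy": 0.93, "reliability": 0.95, "mean_latency_s": 2.5}
print(find_regressions(baseline, candidate))  # ['reliability', 'mean_latency_s']
```

In this example the candidate improves accuracy but still fails the gate, because it is less reliable and slower. Storing each run's metrics and comparing against the previous run gives the same check for tracking quality over time.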

Evaluation Dimensions

1. Accuracy

Does the agent produce correct outputs?
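The skill does not say how correctness is judged. For tasks with short, checkable answers, one common choice is normalized matching against one or more accepted answers, sketched below. The `normalize`, `is_correct`, and `accuracy` helpers and the normalization rules (lowercase, drop punctuation, collapse whitespace) are hypothetical. Open-ended outputs would need a different grader.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    formatting noise is not scored as a wrong answer."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def is_correct(output: str, accepted: list[str]) -> bool:
    """Correct if the output matches any accepted answer after normalization."""
    norm = normalize(output)
    return any(norm == normalize(a) for a in accepted)


def accuracy(outputs: list[str], references: list[list[str]]) -> float:
    """Fraction of outputs that match one of their accepted references."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    hits = sum(is_correct(o, refs) for o, refs in zip(outputs, references))
    return hits / len(outputs)


outputs = ["Paris.", "  42 ", "blue"]
references = [["paris"], ["42", "forty-two"], ["red"]]
print(accuracy(outputs, references))  # 2 of 3 correct
```

Allowing several accepted answers per scenario avoids penalizing valid paraphrases such as "42" versus "forty-two". Normalization also keeps trailing punctuation or stray spaces from counting as errors.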
