Fitness Evaluation Framework

This skill implements HyperAgents' domain-agnostic evaluation pattern: a pluggable harness system that scores any generated code against configurable fitness criteria.

Evaluation Harness Interface

Every domain evaluation must implement three operations:

1. Harness (Run)

Execute the agent on a set of tasks and collect predictions.

Interface:

harness(task_list, agent_path, output_dir, num_samples, num_workers) -> predictions

Output: predictions.csv with columns question_id, prediction
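
A minimal Python sketch of a harness with this signature may help make the contract concrete. Only the parameter names and the predictions.csv columns come from the interface above. The task dictionary shape (`question_id`, `question`), the `run_agent` helper, the reading of `num_samples` as a cap on the number of tasks, and the use of a thread pool are all assumptions made for illustration.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def run_agent(agent_path, task):
    """Hypothetical stand-in for invoking the agent at agent_path on one task.

    A real harness would load or launch the agent; this one echoes the
    question so the sketch stays runnable.
    """
    return f"answer to {task['question']}"


def harness(task_list, agent_path, output_dir, num_samples=None, num_workers=1):
    """Run the agent on the tasks and write predictions.csv into output_dir."""
    # Assumption: num_samples limits how many tasks are evaluated.
    tasks = task_list if num_samples is None else task_list[:num_samples]

    # Executor.map preserves input order, so outputs line up with tasks.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        outputs = list(pool.map(lambda t: run_agent(agent_path, t), tasks))

    predictions = [
        {"question_id": t["question_id"], "prediction": out}
        for t, out in zip(tasks, outputs)
    ]

    os.makedirs(output_dir, exist_ok=True)
    csv_path = os.path.join(output_dir, "predictions.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["question_id", "prediction"])
        writer.writeheader()
        writer.writerows(predictions)
    return predictions
```

Returning the predictions as well as writing them to disk lets a scoring step consume either form; a domain-specific harness would replace `run_agent` with its own agent invocation.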

2. Report (Score)
