Fitness Evaluation Framework

This skill implements HyperAgents' domain-agnostic evaluation pattern: a pluggable harness system that scores any code generation against configurable fitness criteria.

Evaluation Harness Interface

Every domain evaluation must implement three operations:

1. Harness (Run)

Execute the agent on a set of tasks and collect predictions.

Interface:

harness(task_list, agent_path, output_dir, num_samples, num_workers) -> predictions

Output: predictions.csv with columns question_id, prediction
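
As a rough illustration of the interface above, here is a minimal Python sketch of a harness. Only the function signature and the predictions.csv columns come from this skill. Everything else is an assumption made for the example: that each task is a dict with "question_id" and "question" keys, that the agent file exposes a hypothetical solve(question) function, that num_samples caps the number of tasks (with values of 0 or less meaning "all"), and that num_workers sizes a thread pool.

```python
import csv
import importlib.util
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_agent(agent_path):
    """Import the agent file and return its `solve` callable.

    Assumption: agents expose `solve(question) -> str`. This convention is
    hypothetical and not specified by the skill.
    """
    spec = importlib.util.spec_from_file_location("agent_under_test", agent_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.solve


def harness(task_list, agent_path, output_dir, num_samples=-1, num_workers=1):
    """Run the agent on each task, write predictions.csv, return the rows."""
    solve = load_agent(agent_path)
    # Assumption: num_samples <= 0 means "use every task".
    tasks = task_list if num_samples <= 0 else task_list[:num_samples]

    def run_one(task):
        try:
            prediction = solve(task["question"])
        except Exception as exc:
            # A crashing agent is recorded as a failed prediction
            # instead of aborting the whole run.
            prediction = f"ERROR: {exc}"
        return {"question_id": task["question_id"], "prediction": str(prediction)}

    with ThreadPoolExecutor(max_workers=max(1, num_workers)) as pool:
        # pool.map yields results in task order, so the CSV is deterministic.
        predictions = list(pool.map(run_one, tasks))

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "predictions.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["question_id", "prediction"])
        writer.writeheader()
        writer.writerows(predictions)
    return predictions
```

Recording agent exceptions as predictions, rather than raising, is a design choice of this sketch: it keeps one broken task from discarding the results of the others, and leaves the decision about how to penalize failures to the scoring step.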

2. Report (Score)

Related skills

More from zpankz/hyperagents

staged evaluation
Two-phase evaluation strategy from HyperAgents: run a quick staged check on small samples first, and proceed to full evaluation only if the staged eval passes. Saves 90%+ compute on broken mutations. Triggers when evaluating generations, running benchmarks, or optimizing evaluation cost.

parent selection strategies
Evolutionary parent selection algorithms for choosing which generation to mutate next. Implements random, best, score-proportional, and novelty-aware selection. Triggers when selecting parents, managing exploration/exploitation tradeoffs, or configuring evolution strategy.

domain evaluation harness
Create and configure domain-specific evaluation harnesses for the HyperAgents evolution loop. Defines how tasks are loaded, agents are invoked, predictions are collected, and scores are computed. Triggers when setting up evaluation domains or creating custom fitness functions.

self-referential self-improvement
Apply HyperAgents' self-referential improvement pattern to any code artifact. Triggers when Claude is asked to 'improve', 'optimize', 'evolve', or 'self-improve' code, agents, skills, or prompts. Also triggers on repeated failures as an automatic recovery strategy.

evolutionary archive management
Manage the HyperAgents evolutionary archive, an append-only log of all code generations with fitness scores, lineage tracking, and diff storage. Triggers when working with the .hyperagents/ directory, archive.jsonl files, or generation metadata.

Repository

zpankz/hyperagents
