Domain Evaluation Harness
The harness is the bridge between the HyperAgents evolution loop and domain-specific evaluation. It defines how to load tasks, run the agent, collect predictions, and compute fitness scores.
Harness Architecture
┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│  Task List   │────>│   Harness   │────>│ Predictions  │
│   (input)    │     │  (executor) │     │   (output)   │
└──────────────┘     └──────┬──────┘     └──────┬───────┘
                            │                   │
                     ┌──────▼──────┐     ┌──────▼───────┐
                     │ Task Agent  │     │   Reporter   │
                     │ (modified)  │     │   (scorer)   │
                     └─────────────┘     └──────┬───────┘
                                                │
                                         ┌──────▼───────┐
                                         │ report.json  │
                                         └──────────────┘
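The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the HyperAgents implementation. The names `Task`, `run_harness`, and `write_report`, the exact-match scoring rule, and the fields written to `report.json` are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """One evaluation item (hypothetical schema)."""
    task_id: str
    prompt: str
    expected: str


def run_harness(
    tasks: List[Task], agent: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Run the (possibly mutated) task agent on each task and collect predictions."""
    predictions: Dict[str, Optional[str]] = {}
    for task in tasks:
        try:
            predictions[task.task_id] = agent(task.prompt)
        except Exception:
            # A crashing agent yields a missing prediction for this task
            # instead of aborting the whole evaluation run.
            predictions[task.task_id] = None
    return predictions


def write_report(
    tasks: List[Task], predictions: Dict[str, Optional[str]], path: str
) -> dict:
    """Score predictions by exact match (an assumed metric) and write the report."""
    correct = sum(1 for t in tasks if predictions.get(t.task_id) == t.expected)
    report = {
        "total": len(tasks),
        "correct": correct,
        "score": correct / len(tasks) if tasks else 0.0,
    }
    Path(path).write_text(json.dumps(report, indent=2))
    return report
```

A caller would build a list of `Task` objects, pass the agent as a plain callable to `run_harness`, and hand the resulting predictions to `write_report`. Keeping the scorer separate from the executor mirrors the Harness/Reporter split in the diagram, so a domain can swap in its own metric without touching the run loop.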