Staged Evaluation

A key optimization from HyperAgents: don't waste compute evaluating obviously broken mutations. Run a cheap quick check first, and only invest in full evaluation for promising candidates.
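The idea can be sketched in a few lines of Python. This is a minimal illustration, not HyperAgents' actual implementation: the names `staged_evaluate`, `compiles`, and `expensive_eval` are hypothetical, the quick check is assumed to be a simple syntax check, and the full evaluation is a stand-in that returns a fixed score.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalResult:
    passed_quick_check: bool
    score: Optional[float]  # None when the full evaluation was skipped


def staged_evaluate(
    candidate: str,
    quick_check: Callable[[str], bool],
    full_eval: Callable[[str], float],
) -> EvalResult:
    """Run the cheap check first; pay for full evaluation only if it passes."""
    if not quick_check(candidate):
        return EvalResult(passed_quick_check=False, score=None)
    return EvalResult(passed_quick_check=True, score=full_eval(candidate))


def compiles(source: str) -> bool:
    """Hypothetical quick check: does the mutated source even parse?"""
    try:
        compile(source, "<mutation>", "exec")
        return True
    except SyntaxError:
        return False


full_eval_calls = 0  # counts how often the expensive path actually runs


def expensive_eval(source: str) -> float:
    """Stand-in for a test suite, LLM judge, or benchmark run (assumed)."""
    global full_eval_calls
    full_eval_calls += 1
    return 1.0


# One broken mutation (syntax error) and one valid mutation.
mutations = ["def f(:\n    pass", "def f():\n    return 1"]
results = [staged_evaluate(m, compiles, expensive_eval) for m in mutations]
```

In this sketch only the second mutation reaches `expensive_eval`, so `full_eval_calls` ends at 1; the broken mutation is rejected by the quick check and gets no score.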

The Problem

Full evaluation is expensive:

  • Running a full test suite takes minutes
  • LLM-as-judge evaluations cost tokens
  • Benchmark suites can take hours
  • Most mutations (especially early ones) produce broken or worse code

The Solution: Two-Phase Evaluation

Phase 1: Staged Evaluation (Quick Check)

Staged Evaluation — zpankz/hyperagents