Staged Evaluation

A key optimization from HyperAgents: don't waste compute fully evaluating obviously broken mutations. Run a cheap quick check first, and invest in full evaluation only for promising candidates.

The Problem

Full evaluation is expensive:

Running a full test suite takes minutes
LLM-as-judge evaluations cost tokens
Benchmark suites can take hours
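
Meanwhile, most mutations (especially early ones) produce code that is broken or worse than what they started from, so most of that expensive evaluation is spent on candidates that will be discarded anyway.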
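To see why a quick check pays off, consider a simple cost model. This is an illustrative assumption, not something taken from HyperAgents: c_q is the cost of the quick check, c_f the cost of one full evaluation, and p the fraction of candidates that pass the quick check.

```latex
% Illustrative cost model (assumed): c_q = quick-check cost,
% c_f = full-evaluation cost, p = fraction passing the quick check.
\begin{aligned}
\mathbb{E}[C_{\text{staged}}] &= c_q + p\,c_f, \qquad C_{\text{full}} = c_f, \\
\mathbb{E}[C_{\text{staged}}] < C_{\text{full}} &\iff c_q < (1 - p)\,c_f .
\end{aligned}
```

With made-up numbers c_q = 10 s, c_f = 300 s and p = 0.2, the staged approach costs 10 + 0.2 × 300 = 70 s per candidate on average instead of 300 s. The smaller p is (the more mutations are broken), the larger the saving.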

The Solution: Two-Phase Evaluation
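A minimal sketch of the two-phase flow in Python. It assumes each candidate is a source string, that the quick check returns a score in [0, 1], and that a fixed threshold decides which candidates advance. The names `staged_evaluate`, `quick_check`, `full_eval`, and `threshold` are hypothetical and chosen for illustration; they are not the HyperAgents API. The toy quick check only tests whether the code compiles, and the toy full evaluation runs a handful of test cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class StagedResult:
    candidate: str
    quick_score: float
    full_score: Optional[float]  # None means the candidate never reached phase 2


def staged_evaluate(
    candidates: Sequence[str],
    quick_check: Callable[[str], float],
    full_eval: Callable[[str], float],
    threshold: float,
) -> List[StagedResult]:
    """Hypothetical helper: phase 1 on every candidate, phase 2 only if score >= threshold."""
    results: List[StagedResult] = []
    for cand in candidates:
        q = quick_check(cand)  # phase 1: cheap, always runs
        full = full_eval(cand) if q >= threshold else None  # phase 2: expensive, gated
        results.append(StagedResult(cand, q, full))
    return results


# --- Toy usage with hypothetical evaluators ---
full_calls = 0  # counts how often the expensive phase actually runs


def quick_check(code: str) -> float:
    """Cheap smoke test: 1.0 if the candidate compiles, else 0.0."""
    try:
        compile(code, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def full_eval(code: str) -> float:
    """Expensive check: fraction of test cases the candidate's f() passes.
    exec() runs arbitrary code; a real harness should sandbox it."""
    global full_calls
    full_calls += 1
    namespace: dict = {}
    exec(code, namespace)
    f = namespace["f"]
    cases = [(0, 0), (1, 2), (5, 10)]  # target behaviour: f(x) == 2 * x
    return sum(f(x) == y for x, y in cases) / len(cases)


candidates = [
    "def f(x): return 2 * x",  # correct
    "def f(x) return x",       # syntax error: rejected in phase 1
    "def f(x): return x + 1",  # runs, but mostly wrong
]
results = staged_evaluate(candidates, quick_check, full_eval, threshold=0.5)
for r in results:
    print(r.quick_score, r.full_score)
print("full evaluations run:", full_calls)
```

Only two of the three candidates reach the full evaluation; the one with a syntax error is discarded after the compile check. Keeping the quick-check score next to the full score makes it possible to revisit the threshold later, for example by spot-checking whether rejected candidates would have scored well.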

Phase 1: Staged Evaluation (Quick Check)

Repository: zpankz/hyperagents
