Autoresearch: Autonomous Experiment Loop

An autonomous optimization loop in which Claude edits code, runs a benchmark, measures a metric, and either keeps the improvement or reverts, repeating until stopped.

Core Concept

The loop is simple: edit → commit → run → measure → keep or discard → repeat.
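
As a rough illustration, the loop can be sketched in Python with the side-effecting steps passed in as callables. The names `run_loop`, `propose_edit`, `measure`, and `revert` are hypothetical and not part of any autoresearch tooling, and the iteration count is bounded here only so the sketch terminates; the loop described above runs until stopped.

```python
from typing import Callable


def run_loop(
    propose_edit: Callable[[], None],  # hypothetical hook: edit the code and commit
    measure: Callable[[], float],      # hypothetical hook: run the benchmark, return the metric
    revert: Callable[[], None],        # hypothetical hook: undo the last commit
    baseline: float,
    iterations: int,
    lower_is_better: bool = True,
) -> float:
    """Run a bounded edit/measure/keep-or-discard loop and return the best metric."""
    best = baseline
    for _ in range(iterations):
        propose_edit()
        value = measure()
        # Strict comparison: an equal result counts as "no improvement".
        improved = value < best if lower_is_better else value > best
        if improved:
            best = value   # keep the commit
        else:
            revert()       # discard the change
    return best
```

Injecting the steps as callables keeps the decision logic separate from git and the benchmark, so the keep-or-discard rule can be exercised with stand-in functions.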

Primary metric is king. Each session fixes a direction for it: either lower is better or higher is better. Improved → keep the commit. Equal or worse → git revert.
State survives context resets via autoresearch.jsonl (append-only log) and autoresearch.md (living session document).
Domain-agnostic. Works for any measurable target: test speed, bundle size, LLM training loss, Lighthouse scores, build times, etc.
Do not overfit to the benchmark, and do not cheat on it. Optimize the real workload, not the measurement harness.
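
One way the append-only log could work is sketched below. The source does not specify the record schema, so the field names (`commit`, `metric`, `status`) and the helper names are assumptions made for illustration only.

```python
import json
from pathlib import Path
from typing import Optional


def append_record(log_path: Path, record: dict) -> None:
    """Append one experiment result as a single JSON line; never rewrite old lines."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_records(log_path: Path) -> list[dict]:
    """Reload the full history, e.g. after a context reset."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_kept(records: list[dict], lower_is_better: bool = True) -> Optional[float]:
    """Return the best metric among kept experiments, or None if nothing was kept."""
    kept = [r["metric"] for r in records if r.get("status") == "keep"]
    if not kept:
        return None
    return min(kept) if lower_is_better else max(kept)
```

Because each line is a complete JSON object, a resumed session can rebuild the current best from the file alone, without relying on anything held in context.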

Setup Phase

When the user triggers autoresearch, gather the following (ask if not provided):