craft-autoresearch

Purpose

Run an eval-driven autonomous optimization loop on a prompt or skill.

Many prompts and skills feel "mostly fine" until the last 20-30% of failures show up. Re-reading rarely finds those gaps. You find them by running the artifact many times, scoring outputs against a rubric, mutating one thing at a time, and keeping only the changes that measurably move the score.

Unlike craft-tune (a single, human-driven diagnose-and-edit pass), autoresearch measures. Gains come from the loop, not from one clever rewrite.
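The loop described above (run many times, score, mutate one thing, keep only measurable gains) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `run`, `score`, the `mutations` list, and the `trials` and `min_gain` parameters are all hypothetical names chosen for the example.

```python
from typing import Callable, List, Tuple

Runner = Callable[[str], str]      # hypothetical: executes the prompt, returns an output
Scorer = Callable[[str], float]    # hypothetical: scores one output against a rubric
Mutation = Callable[[str], str]    # hypothetical: applies exactly one change to the prompt


def mean_score(artifact: str, run: Runner, score: Scorer, trials: int) -> float:
    """Run the artifact `trials` times and average the rubric scores."""
    return sum(score(run(artifact)) for _ in range(trials)) / trials


def optimize(
    artifact: str,
    run: Runner,
    score: Scorer,
    mutations: List[Mutation],
    trials: int = 20,
    min_gain: float = 0.02,
) -> Tuple[str, float]:
    """Try one mutation at a time; keep it only if the mean score improves."""
    best = artifact
    best_score = mean_score(best, run, score, trials)
    for mutate in mutations:
        candidate = mutate(best)
        candidate_score = mean_score(candidate, run, score, trials)
        # Keep the change only when the gain clears the (assumed) threshold.
        if candidate_score - best_score >= min_gain:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Averaging over several trials is what separates a real improvement from run-to-run noise; the threshold value here is an arbitrary placeholder and would need tuning for a real artifact.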

Use this when

a prompt or skill works "sometimes" and needs to work reliably
measurable quality criteria exist or can be drafted (format rules, pass/fail checks, comparative quality)
human-driven tuning has hit a plateau
a skill is about to be shipped and should be benchmarked first
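As an illustration of what "measurable quality criteria" might look like, here is a small sketch of a rubric built from format rules and pass/fail checks. The specific checks (`is_valid_json`, `under_word_limit`) and the `RUBRIC` mapping are assumptions made up for this example; comparative quality judgments are not covered.

```python
import json
from typing import Callable, Dict


def is_valid_json(output: str) -> bool:
    """Format rule: the output must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def under_word_limit(output: str, limit: int = 50) -> bool:
    """Pass/fail check: the output stays within a word budget."""
    return len(output.split()) <= limit


# Hypothetical rubric: check name -> pass/fail function.
RUBRIC: Dict[str, Callable[[str], bool]] = {
    "valid_json": is_valid_json,
    "under_50_words": under_word_limit,
}


def score_output(output: str, rubric: Dict[str, Callable[[str], bool]] = RUBRIC) -> float:
    """Return the fraction of rubric checks the output passes (0.0 to 1.0)."""
    results = [check(output) for check in rubric.values()]
    return sum(results) / len(results)
```

Binary checks like these are easy to automate and compare across runs, which is why they make a reasonable starting point when drafting criteria.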

Do not use this when:
