craft-autoresearch

Purpose

Run an eval-driven autonomous optimization loop on a prompt or skill.

Many prompts and skills feel "mostly fine" until the last 20-30% of failures show up. Re-reading rarely finds those gaps. You find them by running the artifact many times, scoring outputs against a rubric, mutating one thing at a time, and keeping only the changes that measurably move the score.

Unlike craft-tune, which is a single human-driven diagnose-and-edit pass, autoresearch measures. Gains come from the loop, not from one clever rewrite.
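The loop described above (run many times, score, mutate one thing, keep only measurable gains) can be sketched as a simple greedy search. This is a minimal illustration, not the skill's actual implementation: `evaluate`, `autoresearch`, `n_runs`, `min_gain`, and the stubbed `fake_run` model are all hypothetical names chosen for the example.

```python
from statistics import mean
from typing import Callable, List, Tuple

def evaluate(prompt: str, run: Callable[[str], str],
             score: Callable[[str], float], n_runs: int) -> float:
    """Run the artifact n_runs times and return the mean rubric score."""
    return mean(score(run(prompt)) for _ in range(n_runs))

def autoresearch(prompt: str, run: Callable[[str], str],
                 score: Callable[[str], float],
                 mutations: List[Callable[[str], str]],
                 n_runs: int = 20, min_gain: float = 0.02) -> Tuple[str, float]:
    """Try one mutation at a time; keep it only if the mean score rises by min_gain."""
    best_prompt = prompt
    best_score = evaluate(prompt, run, score, n_runs)
    for mutate in mutations:
        candidate = mutate(best_prompt)
        candidate_score = evaluate(candidate, run, score, n_runs)
        if candidate_score - best_score >= min_gain:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score

# Hypothetical stand-ins: a deterministic fake model and a one-rule rubric.
def fake_run(prompt: str) -> str:
    return '{"ok": true}' if "JSON" in prompt else "ok"

def fake_score(output: str) -> float:
    return 1.0 if output.startswith("{") else 0.0

mutations = [
    lambda p: p + "\nReply in JSON.",  # helps the rubric, so it is kept
    lambda p: p + "\nBe concise.",     # no measurable gain, so it is discarded
]

best_prompt, best_score = autoresearch("Summarize the text.", fake_run,
                                       fake_score, mutations, n_runs=5)
```

In this toy run the first mutation is kept and the second is rejected, because it does not move the score. With a real, non-deterministic model, `n_runs` and `min_gain` would need to be large enough that a kept change reflects more than sampling noise.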

Use this when:

  • a prompt or skill works "sometimes" and needs to work reliably
  • measurable quality criteria exist or can be drafted (format rules, pass/fail checks, comparative quality)
  • human-driven tuning has hit a plateau
  • a skill is about to be shipped and should be benchmarked first
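As an illustration of what "measurable quality criteria" might look like, the sketch below drafts a rubric as a set of named pass/fail checks and reports the fraction that pass. The check names, the 50-word limit, and the `score` function are assumptions made for this example; comparative quality judgments are not covered here.

```python
import json
from typing import Callable, Dict

def is_valid_json(output: str) -> bool:
    """Format rule: the output must parse as JSON."""
    try:
        json.loads(output)
        return True
    except ValueError:
        return False

def under_word_limit(output: str, limit: int = 50) -> bool:
    """Pass/fail check: the output stays within a word budget."""
    return len(output.split()) <= limit

# Hypothetical rubric: each entry is an independent pass/fail check.
RUBRIC: Dict[str, Callable[[str], bool]] = {
    "valid_json": is_valid_json,
    "under_50_words": under_word_limit,
}

def score(output: str, rubric: Dict[str, Callable[[str], bool]] = RUBRIC) -> float:
    """Return the fraction of rubric checks the output passes."""
    results = [check(output) for check in rubric.values()]
    return sum(results) / len(results)

good = score('{"summary": "short"}')  # passes both checks
bad = score("not json")               # fails the JSON check only
```

Keeping each check small and binary makes it easy to see which rule a given mutation improved or broke.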

Do not use this when:

More from sungjunlee/craftkit

First Seen
Apr 15, 2026