craft-autoresearch
craft-autoresearch
Purpose
Run an eval-driven autonomous optimization loop on a prompt or skill.
Many prompts and skills feel "mostly fine" until the last 20-30% of failures show up. Re-reading rarely finds those gaps. You find them by running the artifact many times, scoring outputs against a rubric, mutating one thing at a time, and keeping only the changes that measurably move the score.
Unlike craft-tune (single human-driven diagnose-and-edit), autoresearch measures. Gains come from the loop, not from one clever rewrite.
Use this when
- a prompt or skill works "sometimes" and needs to work reliably
- measurable quality criteria exist or can be drafted (format rules, pass/fail checks, comparative quality)
- human-driven tuning has hit a plateau
- a skill is about to be shipped and should be benchmarked first
Do not use this when:
More from sungjunlee/craftkit
craft-scaffold
Turn a rough idea into a structured prompt or skill scaffold with explicit objective, inputs, workflow, outputs, and a concrete file plan. Use this whenever the user wants to design a new prompt or skill, scaffold a skill-like workflow, mentions "scaffold," "blueprint," "structure," or "plan" for a prompt, or arrives with a vague request that needs to be shaped before implementation — even if they don't explicitly ask to scaffold.
9craft-survey
One-shot prior-art survey — study comparable prompts, skills, or repo assets, extract recurring patterns worth adopting, flag patterns to avoid, and synthesize actionable improvements for the current artifact. Use this whenever the user wants to ground a prompt or skill in proven patterns, references older assets to learn from, asks "how do others do this," mentions "survey," "prior art," or "research," or is designing something new and wants a literature-review pass before committing — even if they don't say "survey." Distinct from craft-autoresearch (which runs an eval-driven optimization loop, not a prior-art survey).
9craft-tune
Diagnose and improve an existing prompt or skill in one pass — surface the issues that are actually blocking the goal, then apply targeted minimal-diff edits that preserve the original intent. Use this whenever the user wants to refine, sharpen, tighten, review, audit, or upgrade an existing prompt or skill, says it "feels off" or "behaves inconsistently," asks "what's wrong with this," or wants to make it better. Defaults to diagnose-and-edit; switch to diagnose-only mode when the user wants the findings before any rewrite (asks to "review only," "diagnose only," or "don't edit yet").
9craft-prompt
Craft well-structured, copy-paste-ready prompts for any LLM — Claude, GPT, Gemini, Perplexity, or any other. Use this whenever the user wants a prompt built from scratch, asks to "write/make/build a prompt," needs a session handoff prompt to carry work into a new Claude Code or Codex session, shares scattered notes that need to be shaped into a usable prompt, or requests a reusable prompt template — even if they don't explicitly say "prompt." Also triggers on Korean equivalents like "프롬프트 만들어," "프롬프트 작성," "프롬프트 빌드.
9craft-handoff
>-
8craft-critique
Critique a prompt or skill and surface ambiguity, hidden assumptions, weak structure, portability issues, and likely failure modes before any rewrite happens. Use this whenever the user asks to review, audit, critique, or diagnose a prompt or skill, mentions a prompt that "feels off" or behaves inconsistently, or is about to start a large rewrite and should stop to diagnose first — even if they don't explicitly say "critique" or "review.
7