Skillforge — write and optimize Claude Code skills the right way

For Anthropic's full guidance (frontmatter fields, the 500-line budget, dynamic context injection, the testing framework, anti-patterns, the 9 skill types, and the 5 workflow patterns), load references/anthropic-skill-best-practices.md.

Two modes

Mode	Use for	Output
forge (default)	A new skill that doesn't exist yet	A new skill dir, drafted per the process below
optimize	An existing skill that works but should be better	A V2 of that skill — measurably better at its outcome

forge follows the process + checklist in the rest of this file.

optimize <skill> runs a metric-driven loop: define the outcome + metric → set gates (incl. a no-cheating audit) → quality audit → research the domain for outcome-improving techniques → synthesize V2 with a changelog → verify that V2 beats V1 on a held-out benchmark rather than a single example. Any candidate that fails a gate is discarded. This is "optimize," not "tidy": a cleanup that doesn't move the outcome is not a V2, and a score that jumped by gaming the rubric is a regression. The loop is self-contained; for a heavy run (many hypotheses, parallel experiments, hours) you can escalate to an external optimizer if you have one (ce-optimize plugin, evo, or Microsoft's SkillOpt). Full playbook: references/optimize-mode.md.
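The final verification step of the loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure in references/optimize-mode.md: the names Candidate, gamed_rubric, no_cheating_gate, select_v2, and min_gain are all hypothetical, and the metric is assumed to be a mean score over held-out benchmark tasks.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A skill version plus its results (hypothetical structure)."""
    name: str
    scores: Sequence[float]     # one score per held-out benchmark task
    gamed_rubric: bool = False  # assumed to be set by the no-cheating audit


# A gate is any predicate a candidate must satisfy to stay in the running.
Gate = Callable[[Candidate], bool]


def no_cheating_gate(candidate: Candidate) -> bool:
    """Reject candidates whose score came from gaming the rubric."""
    return not candidate.gamed_rubric


def select_v2(
    baseline: Candidate,
    candidates: Sequence[Candidate],
    gates: Sequence[Gate],
    min_gain: float = 0.0,
) -> Optional[Candidate]:
    """Return the best candidate that passes every gate and beats the
    baseline's mean held-out score by more than min_gain, else None."""
    best: Optional[Candidate] = None
    best_mean = mean(baseline.scores) + min_gain
    for candidate in candidates:
        # Discard any candidate that fails a gate, whatever its score.
        if not all(gate(candidate) for gate in gates):
            continue
        candidate_mean = mean(candidate.scores)
        if candidate_mean > best_mean:
            best, best_mean = candidate, candidate_mean
    return best


# Illustrative data only; these scores are made up for the example.
v1 = Candidate("v1", [0.6, 0.5, 0.7])
tidy = Candidate("tidy-only", [0.6, 0.5, 0.7])  # cleanup, no outcome change
gamed = Candidate("gamed", [0.9, 0.9, 0.9], gamed_rubric=True)
real = Candidate("real-gain", [0.7, 0.6, 0.8])

winner = select_v2(v1, [tidy, gamed, real], [no_cheating_gate])
```

In this sketch the tidy-only candidate is rejected because it does not move the mean, and the gamed candidate is rejected by the gate despite its higher score. When no candidate qualifies, select_v2 returns None and V1 stays in place.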

Meta-process: iterate first, extract second

Anthropic's recommended creation flow: iterate on a single challenging task until Claude succeeds, then extract the winning approach into a skill. Don't write skills for hypothetical future needs. Solve the real problem in conversation, find the prompt + context shape that works, freeze it.
