Empirical Prompt Tuning
The author of a prompt cannot judge its quality: the clearer the writer believes an instruction to be, the more likely another agent will stumble over it. The core of this skill is to have a bias-free executor actually run the instruction, evaluate the result from both sides (what worked and what failed), and iterate. Do not stop until improvements plateau.
When to use:
- Right after creating or substantially revising a skill / slash command / task prompt
- When an agent does not behave as expected and you want to attribute the cause to ambiguity on the instruction side
- When hardening high-importance instructions (frequently used skills, automation-core prompts)
When not to use:
- One-off throwaway prompts (evaluation cost does not pay off)
- When the goal is not to improve success rate but merely to reflect the writer's subjective preferences
Workflow
- Iteration 0 — description / body consistency check (static, no dispatch needed)
- Read the triggers / use cases claimed by the frontmatter description
- Read the scope the body actually covers
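The iterate-until-plateau loop described above can be sketched as follows. This is a minimal illustration, not an implementation: `run_executor`, `score`, and `revise` are hypothetical stand-ins for dispatching a bias-free executor, the two-sided evaluation, and applying the fixes that evaluation suggests.

```python
def tune(prompt, run_executor, score, revise, min_gain=0.02, max_rounds=10):
    """Iterate on a prompt until score improvements plateau.

    run_executor: runs the instruction with a bias-free executor (hypothetical)
    score:        two-sided evaluation of the transcript, higher is better (hypothetical)
    revise:       rewrites the prompt based on the evaluation (hypothetical)
    """
    best_prompt = prompt
    best_score = score(run_executor(prompt))
    for _ in range(max_rounds):
        candidate = revise(best_prompt)
        candidate_score = score(run_executor(candidate))
        if candidate_score - best_score < min_gain:
            break  # improvements have plateaued; stop iterating
        best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score
```

The `min_gain` threshold makes "plateau" concrete: a revision that improves the score by less than that margin ends the loop rather than chasing noise.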