skill-eval-improve
Skill eval & improve
Improve skills measurably: baseline → measure → bounded edit → re-validate. Combine local tooling, Codex plugin-eval (when installed), and research-backed loops (SkillOpt).
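As a rough illustration of the baseline → measure → bounded edit → re-validate loop, the sketch below scores a skill before and after an edit and accepts the edit only if it is small and does not regress. Everything here is hypothetical and not part of Guild, plugin-eval, or SkillOpt: the `Metrics` shape, the `route` callback (a stand-in for whatever decides whether a skill triggers), the four-characters-per-token estimate, and the positional line diff are all assumptions made for the example.

```typescript
// Hypothetical sketch: none of these names come from the skill or its tooling.
type TriggerCase = { prompt: string; shouldTrigger: boolean };
type Metrics = { triggerAccuracy: number; approxTokens: number };
type Route = (skillText: string, prompt: string) => boolean;

// Rough token estimate (about 4 characters per token); an assumption, not a real tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Measure: fraction of cases routed correctly, plus the approximate token cost.
function measure(skillText: string, cases: TriggerCase[], route: Route): Metrics {
  const correct = cases.filter((c) => route(skillText, c.prompt) === c.shouldTrigger).length;
  return {
    triggerAccuracy: cases.length === 0 ? 0 : correct / cases.length,
    approxTokens: approxTokens(skillText),
  };
}

// Crude positional diff: counts line positions whose content differs.
function changedLines(before: string, after: string): number {
  const a = before.split("\n");
  const b = after.split("\n");
  const n = Math.max(a.length, b.length);
  let changed = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) changed++;
  }
  return changed;
}

// Re-validate: keep the edit only if it is bounded, does not regress,
// and improves either trigger accuracy or token cost.
function acceptEdit(
  baseline: Metrics,
  candidate: Metrics,
  changed: number,
  maxChangedLines: number
): boolean {
  if (changed > maxChangedLines) return false; // edit is not bounded
  if (candidate.triggerAccuracy < baseline.triggerAccuracy) return false; // regression
  return (
    candidate.triggerAccuracy > baseline.triggerAccuracy ||
    candidate.approxTokens < baseline.approxTokens
  );
}
```

In practice the `route` callback would wrap a real routing check and the cases would come from recorded prompts; the point of the sketch is only that each edit is compared against a stored baseline rather than judged by eye.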
When to use
- Skill triggers wrong or never loads (description routing)
- Bloated SKILL.md, high token cost, weak outcomes
- After adding a new procedure, when regression checks are needed
- Porting patterns from product MCP / plugin-eval research into Guild skills
When not to use
- Bulk repo validation, e.g. “validate every skill in this repo” → pnpm run validate only (skill-spec-review for audit); do not start benchmark or SkillOpt loops.
- Automated SkillOpt / cluster training: Guild documents a manual bounded-edit loop; there is no overnight optimizer pipeline.
- Creating a new skill: use create-skill first; eval-improve applies after a skill exists.
Cursor scope (optional): activate when editing under skills/** or scripts/validate-skills.mjs.