eval-skills
Eval Skills
Treat a skill like a function under test. Feed it example inputs in a clean room, check the artifacts against what good looks like, and let the failures drive the edits. The eval is only honest if the run is blind: the agent executing the skill must carry none of this conversation's context and must never see the expected output. Leak either and you are teaching to the test.
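One way to keep the run blind is to make the leak structurally impossible: the payload handed to the executing agent is built from the skill and the case input only, and the expected output is read in exactly one place, the grader. A minimal sketch, assuming a case is a small record; `EvalCase`, `build_agent_payload`, and `grade` are hypothetical names, not part of this skill, and the exact-match check stands in for whatever "good" means for the skill under test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One eval case. All field names are hypothetical."""
    case_id: str
    input_text: str  # the only case data the blind agent may see
    expected: str    # grader-only; must never reach the agent


def build_agent_payload(skill_text: str, case: EvalCase) -> dict:
    """Build what the blind agent receives: the skill and the input, nothing else."""
    return {"skill": skill_text, "input": case.input_text}


def grade(case: EvalCase, artifact: str) -> bool:
    """Grader side: the only function that reads case.expected.

    Exact match after stripping whitespace is a placeholder criterion.
    """
    return artifact.strip() == case.expected.strip()


case = EvalCase("case-1", "Summarize: the cat sat.", "The cat sat.")
payload = build_agent_payload("(skill text here)", case)

# The expected output is absent from the payload by construction.
assert set(payload) == {"skill", "input"}
assert case.expected not in payload.values()
```

Because the payload is assembled from an allowlist of fields rather than by deleting fields from a larger record, adding a new grader-only field later cannot leak it by accident.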
Inputs you need — refuse without them
Confirm all three before spawning anything. If any is missing or unresolvable, stop and tell the user exactly which one and what a good version looks like. Do not invent cases, guess intent, or eval against a fuzzy wish.