Eval Skills

Treat a skill like a function under test. Feed it example inputs in a clean room, check the artifacts against what good looks like, and let the failures drive the edits. The eval is only honest if the run is blind: the agent executing the skill must carry none of this conversation's context and must never see the expected output. Leak either and you are teaching to the test.
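The blind-run rule above can be sketched as a small harness. This is an illustrative sketch only, not the skill's actual implementation: `EvalCase`, `run_blind_eval`, `run_skill`, and `check` are hypothetical names introduced here. The point it shows is the boundary: the runner receives the input alone, and the expected output is read only by the checker.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class EvalCase:
    """One example for the skill under test (hypothetical structure)."""
    name: str
    task_input: str  # the only field the blind runner is allowed to see
    expected: str    # held back; read only by the checker


def run_blind_eval(
    cases: Iterable[EvalCase],
    run_skill: Callable[[str], str],
    check: Callable[[str, str], bool],
) -> List[str]:
    """Run every case blind and return the names of the cases that failed.

    `run_skill` stands in for spawning a fresh agent with no prior context;
    it is passed the input string and nothing else, so the expected output
    cannot leak into the run.
    """
    failures = []
    for case in cases:
        artifact = run_skill(case.task_input)  # only the input crosses this boundary
        if not check(artifact, case.expected):  # comparison happens outside the run
            failures.append(case.name)
    return failures
```

In a real eval, `run_skill` would start a clean agent session and return the artifact it produced, and `check` would encode what good looks like. The returned failure names are what drive the next round of edits to the skill.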

Inputs you need — refuse without them

Confirm all three before spawning anything. If any is missing or unresolvable, stop and tell the user exactly which one and what a good version looks like. Do not invent cases, guess intent, or eval against a fuzzy wish.

Repository

dzhng/skills
