skill-selection-evals
Skill-Selection Evals
This is not an executable skill. It contains evaluation data for measuring the accuracy of skill selection (routing) decisions.
Purpose
Crucible's 49 execution evals measure quality once a skill is invoked. Selection evals measure whether the right skill gets invoked in the first place.
Eval Types
- Direct selection: Given a prompt, does the agent pick the correct skill?
- Negative selection: Given a prompt that sounds like it calls for skill X but does not, does the agent avoid the false positive?
- Context-dependent: When the same verb appears in different contexts that call for different skills, does the agent pick the right skill for each context?
- Cascade ordering: For a task that needs several skills, does the agent invoke them in the correct order?
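
The four eval types above can be sketched as a single case record plus a pass/fail check. This is a minimal illustration only, not the format of the actual evaluation data: the `SelectionCase` fields, the `passes` and `accuracy` helpers, and the skill names (`code-review`, `debugging`, `planning`, `build`) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SelectionCase:
    """One selection eval case (hypothetical schema, for illustration)."""
    kind: str                     # "direct", "negative", "context", or "cascade"
    prompt: str
    expected: list[str] = field(default_factory=list)   # skills, in invocation order
    forbidden: list[str] = field(default_factory=list)  # skills that must not be picked


def passes(case: SelectionCase, chosen: list[str]) -> bool:
    """Return True if the agent's chosen skills satisfy the case."""
    # Any forbidden skill is a false positive, whatever the case kind.
    if any(skill in case.forbidden for skill in chosen):
        return False
    # A negative case only requires that the look-alike skill was avoided.
    if case.kind == "negative":
        return True
    # Direct, context-dependent, and cascade cases need an exact, ordered match.
    return chosen == case.expected


def accuracy(cases: list[SelectionCase], picks: list[list[str]]) -> float:
    """Fraction of cases passed; picks[i] is the agent's choice for cases[i]."""
    if not cases:
        return 0.0
    return sum(passes(c, p) for c, p in zip(cases, picks)) / len(cases)


# Hypothetical example cases, one per eval type.
CASES = [
    SelectionCase("direct", "Review this pull request for bugs",
                  expected=["code-review"]),
    SelectionCase("negative", "Write a review of this restaurant",
                  forbidden=["code-review"]),
    SelectionCase("context", "Review the failing test output",
                  expected=["debugging"]),
    SelectionCase("cascade", "Plan the feature, then implement it",
                  expected=["planning", "build"]),
]
```

Comparing ordered lists keeps the cascade check and the single-skill checks on one code path: a direct case is simply a cascade of length one. A stricter harness might also require a specific alternative skill in negative cases; that is left out here to keep the sketch small.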