evaluate-skill
Evaluate Skill
Orchestrate a cross-tier evaluation of an AI skill to determine its clarity and robustness.
Procedure
- Load inputs
  - Read the skill file at `{{ skill-path }}`
  - Read the test cases file at `{{ test-cases-path }}`
  - Validate that the test cases are a JSON array of objects with `input` and `expectedOutcome` fields
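The validation in this step could be sketched as follows. This is a minimal illustration, not part of the skill itself: the function name `load_test_cases` and the sample data are hypothetical, and the check covers only the shape described above (a JSON array of objects that each have `input` and `expectedOutcome`).

```python
import json


def load_test_cases(text: str) -> list:
    """Parse test-case JSON and check the shape described in the step above.

    Hypothetical helper: the name and error messages are assumptions.
    """
    cases = json.loads(text)
    if not isinstance(cases, list):
        raise ValueError("test cases must be a JSON array")
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            raise ValueError(f"test case {i} is not an object")
        # Collect every required field that is absent from this case.
        missing = [k for k in ("input", "expectedOutcome") if k not in case]
        if missing:
            raise ValueError(f"test case {i} is missing: {', '.join(missing)}")
    return cases


# Illustrative sample data, not taken from a real test cases file.
sample = '[{"input": "Summarize this file", "expectedOutcome": "A short summary"}]'
cases = load_test_cases(sample)
```

Failing early with the index of the offending case keeps a malformed file from surfacing later as a confusing error in the middle of a test run.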
- Set up evaluation matrix
  - Model tiers to test: `opus`, `sonnet`, `haiku`
  - For each tier, for each test case: plan one blind test run
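One way to picture the evaluation matrix is as the cross product of tiers and test cases. The sketch below is an assumption about how the plan might be represented: `plan_runs` and the dictionary keys are hypothetical, and "blind" is interpreted here as leaving `expectedOutcome` out of what each planned run carries.

```python
from itertools import product

# Tier names as listed in the step above.
TIERS = ["opus", "sonnet", "haiku"]


def plan_runs(tiers, test_cases):
    """Return one planned run per (tier, test case) pair.

    Hypothetical helper: each plan holds only the tier, the case index,
    and the input, so the expected outcome is not exposed to the run.
    """
    return [
        {"tier": tier, "case_index": i, "input": case["input"]}
        for tier, (i, case) in product(tiers, enumerate(test_cases))
    ]


# Illustrative data: 3 tiers x 2 test cases gives 6 planned runs.
example_cases = [
    {"input": "a", "expectedOutcome": "x"},
    {"input": "b", "expectedOutcome": "y"},
]
runs = plan_runs(TIERS, example_cases)
```

Keeping the case index in each plan lets results be joined back to the expected outcome afterwards without passing it to the run itself.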