eval-recipes-runner
# eval-recipes Runner Skill
## Purpose
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents.
## When to Use
- User asks to "test with eval-recipes"
- User says "run the evals" or "benchmark this change"
- User wants to validate improvements against baseline agents such as codex or claude_code
- User is testing a PR branch to show that it improves scores
## Capabilities
I can run eval-recipes benchmarks to: