eval-recipes-runner
eval-recipes Runner Skill
Purpose
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents.
When to Use
- User asks to "test with eval-recipes"
- User says "run the evals" or "benchmark this change"
- User wants to validate improvements against codex/claude_code
- User is testing a PR branch to prove it improves scores
Capabilities
I can run eval-recipes benchmarks to:
- Test specific amplihack branches
- Compare against baseline agents (codex, claude_code)
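The comparison against baseline agents can be sketched roughly as follows. This is a minimal illustration, not the eval-recipes API: the `Comparison` class, the `compare_to_baselines` function, and the idea that each agent yields a single numeric score are all assumptions made for the example, and the numbers in the usage section are placeholders rather than real benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical record pairing one baseline's score with the candidate's."""

    baseline: str
    baseline_score: float
    candidate_score: float

    @property
    def delta(self) -> float:
        # Positive means the candidate branch scored higher than the baseline.
        return self.candidate_score - self.baseline_score

    @property
    def improved(self) -> bool:
        return self.delta > 0


def compare_to_baselines(
    candidate_score: float, baseline_scores: dict[str, float]
) -> list[Comparison]:
    """Compare one candidate score with each baseline, sorted by baseline name."""
    return [
        Comparison(name, score, candidate_score)
        for name, score in sorted(baseline_scores.items())
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only; not measured results.
    placeholder_baselines = {"codex": 0.25, "claude_code": 0.75}
    for result in compare_to_baselines(0.5, placeholder_baselines):
        status = "improved" if result.improved else "not improved"
        print(f"{result.baseline}: delta={result.delta:+.2f} ({status})")
```

In practice the scores would come from whatever eval-recipes reports for each run; only the comparison step is shown here.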