eval-recipes-runner

eval-recipes Runner Skill

Purpose

Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents.

When to Use

  • User asks to "test with eval-recipes"
  • User says "run the evals" or "benchmark this change"
  • User wants to validate improvements against codex/claude_code
  • User is testing a PR branch to show that it improves scores

Capabilities

I can run eval-recipes benchmarks to:

Installs: 124
GitHub Stars: 67
First Seen: Jan 23, 2026
eval-recipes-runner — rysweet/amplihack