eval-suite-planner

Purpose

This skill takes a plain-English description of an agent and produces a structured eval suite plan. It is the first step in the eval lifecycle — use it before generating test cases or running any evals. The output tells you exactly what scenarios to build, which evaluation methods to use, and how to know when you're done.

This skill covers Stage 1 (Define) of the MS Learn 4-stage evaluation framework. After planning, use /eval-generator for Stage 2 (Set Baseline & Iterate), then expand coverage (Stage 3) and operationalize into CI/CD (Stage 4).
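For orientation, the four stages and the step this document attaches to each can be sketched as a small lookup. The stage names are the ones this document gives for the framework. The `Stage` enum, the `STAGE_ACTIONS` table, and the `next_stage` helper are hypothetical names used only for illustration and are not part of the skill.

```python
from enum import Enum


class Stage(Enum):
    """The four stages of the evaluation framework, as named in this document."""
    DEFINE = 1
    SET_BASELINE_AND_ITERATE = 2
    SYSTEMATIC_EXPANSION = 3
    OPERATIONALIZE = 4


# What this document says to do at each stage (hypothetical lookup table).
STAGE_ACTIONS = {
    Stage.DEFINE: "/eval-suite-planner",
    Stage.SET_BASELINE_AND_ITERATE: "/eval-generator",
    Stage.SYSTEMATIC_EXPANSION: "expand coverage",
    Stage.OPERATIONALIZE: "operationalize into CI/CD",
}


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    if stage is Stage.OPERATIONALIZE:
        return None
    return Stage(stage.value + 1)
```

Under these assumptions, `STAGE_ACTIONS[next_stage(Stage.DEFINE)]` gives the skill to run once planning is done.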

Knowledge sources: This skill's guidance is grounded in three Microsoft sources:

  • Eval Scenario Library (github.com/microsoft/ai-agent-eval-scenario-library) — 5 business-problem scenario types with 29 sub-scenarios, 9 capability scenario types with 49 sub-scenarios, quality signals, and evaluation method selection
  • MS Learn agent evaluation documentation — the 4-stage iterative evaluation framework (Define, Set Baseline & Iterate, Systematic Expansion, Operationalize), 7 test methods, acceptance criteria design, and evaluation categories
  • MS Learn evaluation checklist (guidance/evaluation-checklist) — a 4-stage checklist template with a downloadable editable version. The checklist defines Stage 3 expansion categories (Foundational core, Agent robustness, Architecture test, Edge cases) and introduces acceptance criteria design
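The counts quoted in the list above can be tallied in a few lines, which gives a feel for how much ground the Eval Scenario Library covers. This is a minimal sketch: the numbers and category names are copied from this document, while `ScenarioGroup`, `LIBRARY`, and `STAGE_3_CATEGORIES` are hypothetical names and do not reflect how the library itself is organized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioGroup:
    """One family of scenarios, with counts as quoted in this document."""
    name: str
    scenario_types: int
    sub_scenarios: int


# Counts taken from the Eval Scenario Library bullet above.
LIBRARY = [
    ScenarioGroup("business-problem", scenario_types=5, sub_scenarios=29),
    ScenarioGroup("capability", scenario_types=9, sub_scenarios=49),
]

# Stage 3 expansion categories named in the evaluation checklist bullet.
STAGE_3_CATEGORIES = [
    "Foundational core",
    "Agent robustness",
    "Architecture test",
    "Edge cases",
]

total_types = sum(group.scenario_types for group in LIBRARY)
total_sub_scenarios = sum(group.sub_scenarios for group in LIBRARY)

print(f"{total_types} scenario types, {total_sub_scenarios} sub-scenarios")
# prints "14 scenario types, 78 sub-scenarios"
```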

Instructions

When invoked as /eval-suite-planner <agent description>, read the description, infer the agent's primary task, key capabilities, and failure modes, then produce the following output in this exact order. Do not ask clarifying questions, pad the response, or hedge.
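The invocation format can be illustrated with a small parser that separates the command from the free-text description. This is only a sketch of the input shape. The skill runtime does its own command handling, and `parse_invocation` and the sample description are hypothetical.

```python
COMMAND = "/eval-suite-planner"


def parse_invocation(text):
    """Return the agent description from an invocation, or None if it does not match."""
    stripped = text.strip()
    if not stripped.startswith(COMMAND):
        return None
    rest = stripped[len(COMMAND):]
    # Require whitespace after the command so a longer command name is not accepted.
    if rest and not rest[0].isspace():
        return None
    description = rest.strip()
    # An invocation with no description gives the planner nothing to work from.
    return description or None


# Hypothetical example description, not taken from the skill's documentation.
example = "/eval-suite-planner A support bot that answers billing questions"
description = parse_invocation(example)
```

Here `description` holds the text after the command, which is everything the planner has to infer the primary task, capabilities, and failure modes from.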


Step 0 — Match the agent to scenario types

First Seen
Apr 9, 2026