Purpose

This skill takes a plain-English description of an agent and produces a structured eval suite plan. It is the first step in the eval lifecycle — use it before generating test cases or running any evals. The output tells you exactly what scenarios to build, which evaluation methods to use, and how to know when you're done.

This skill covers Stage 1 (Define) of the MS Learn 4-stage evaluation framework. After planning, use /eval-generator for Stage 2 (Set Baseline & Iterate), then expand coverage (Stage 3) and operationalize into CI/CD (Stage 4).

Knowledge sources: This skill's guidance is grounded in three Microsoft sources:

1. Eval Scenario Library (github.com/microsoft/ai-agent-eval-scenario-library): 5 business-problem scenario types with 29 sub-scenarios, 9 capability scenario types with 49 sub-scenarios, quality signals, and evaluation method selection.
2. MS Learn agent evaluation documentation: the 4-stage iterative evaluation framework (Define, Set Baseline & Iterate, Systematic Expansion, Operationalize), 7 test methods, acceptance criteria design, and evaluation categories.
3. MS Learn evaluation checklist (guidance/evaluation-checklist): a 4-stage checklist template with a downloadable editable version. The checklist defines the Stage 3 expansion categories (Foundational core, Agent robustness, Architecture test, Edge cases) and introduces acceptance criteria design.
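
As a rough illustration of how the four stages and the Stage 3 expansion categories named above fit together, here is a minimal Python sketch. The names `Stage`, `STAGE_SKILL`, `STAGE_3_CATEGORIES`, and `next_stage` are hypothetical and are not part of any skill or Microsoft API. Only the stage names, the category names, and the two slash commands come from the text.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The four stages of the evaluation framework, in order."""
    DEFINE = 1
    SET_BASELINE_AND_ITERATE = 2
    SYSTEMATIC_EXPANSION = 3
    OPERATIONALIZE = 4


# Skills the text associates with a stage (hypothetical mapping structure).
# Stages 3 and 4 have no skill named in this section, so they are omitted.
STAGE_SKILL = {
    Stage.DEFINE: "/eval-suite-planner",
    Stage.SET_BASELINE_AND_ITERATE: "/eval-generator",
}

# Stage 3 expansion categories as listed for the evaluation checklist.
STAGE_3_CATEGORIES = (
    "Foundational core",
    "Agent robustness",
    "Architecture test",
    "Edge cases",
)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    if stage.value < len(Stage):
        return Stage(stage.value + 1)
    return None
```

For example, `next_stage(Stage.DEFINE)` yields `Stage.SET_BASELINE_AND_ITERATE`, which mirrors the hand-off from planning to /eval-generator described earlier.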

Instructions

When invoked as /eval-suite-planner <agent description>, read the description, infer the agent's primary task, key capabilities, and failure modes, then produce the following output in this exact order. Do not ask clarifying questions, do not pad responses, do not hedge.
