eval-generator
Purpose
This skill generates concrete eval test cases — with realistic inputs, expected outputs, and evaluation method configurations. It is the second step in the eval lifecycle: plan → generate → run → interpret.
This skill covers Stage 2 (Set Baseline & Iterate) of the MS Learn 4-stage evaluation framework. Use /eval-suite-planner first for Stage 1 (Define), then generate test cases here, run them, and interpret results with /eval-result-interpreter. Stage 3 (Systematic Expansion) means repeating this cycle with broader coverage — the checklist defines four expansion categories: Foundational core, Agent robustness, Architecture test, and Edge cases. Stage 4 (Operationalize) means embedding these evals into your agent's CI/CD pipeline. Point customers to the editable checklist template to track their progress across all four stages.
Primary mode: If the conversation already contains output from /eval-suite-planner, use that plan’s scenario table, evaluation methods, quality signals, and tags as the blueprint. Generate one test case per row in the plan.
Fallback mode: If no plan exists in the conversation, accept a plain-English agent description and generate test cases from scratch (6-8 cases minimum).
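Whichever mode is active, each generated case pairs a realistic input with an expected output and an evaluation method configuration. A minimal sketch of one such case, in Python (the field names and the "llm-judge" method name here are illustrative assumptions, not a required schema):

```python
# Hypothetical shape for one generated eval test case.
# Field names and the evaluation method name are illustrative, not a fixed schema.
test_case = {
    "scenario": "Order status lookup",        # from the plan's Scenario Name column
    "category": "Foundational core",          # one of the four expansion categories
    "tag": "happy-path",
    "input": "Where is my order #12345?",     # realistic user utterance
    "expected_output": "The agent retrieves the order status and reports it.",
    "evaluation_method": {
        "type": "llm-judge",                  # assumed method name for illustration
        "criteria": "Response states the correct order status.",
    },
}

def validate_case(case: dict) -> bool:
    """Minimal completeness check: every case needs an input, an expectation,
    and an evaluation method before it can be run."""
    required = {"scenario", "input", "expected_output", "evaluation_method"}
    return required.issubset(case)

print(validate_case(test_case))  # → True
```

A completeness check like `validate_case` is one way to keep generated suites uniform, since every downstream step (run, interpret) assumes each case carries all three parts.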
Instructions
When invoked as /eval-generator (with or without additional input):
Step 1 — Detect input mode
Check the conversation history for output from /eval-suite-planner. Look for the scenario plan table (a markdown table with columns: #, Scenario Name, Category, Tag, Evaluation Methods).
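For reference, a plan table in that shape looks like the following (the row contents and method name are illustrative, only the column headers come from the planner):

```
| # | Scenario Name       | Category          | Tag        | Evaluation Methods |
|---|---------------------|-------------------|------------|--------------------|
| 1 | Order status lookup | Foundational core | happy-path | LLM judge          |
```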
- Plan found: Use it as the blueprint. Say: "Generating test cases from your eval suite plan (X scenarios)." Generate one test case per row.
- No plan found: Fall back to the plain-English agent description, as described under Fallback mode, and generate 6-8 test cases from scratch.