eval-generator
Purpose
This skill generates concrete eval test cases — with realistic inputs, expected outputs, and evaluation method configurations. It is the second step in the eval lifecycle: plan → generate → run → interpret.
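To make "test case" concrete, here is a minimal sketch of the shape one generated case might take: a realistic input, an expected output, and one or more evaluation-method configurations. The class names, field names, and method labels (`EvalTestCase`, `EvaluationMethod`, `keyword_match`) are hypothetical and illustrative only. They are not a schema defined by this skill or by any evaluation library.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationMethod:
    """One way of scoring a response (hypothetical fields)."""
    name: str                                   # illustrative label, e.g. "keyword_match"
    config: dict = field(default_factory=dict)  # method-specific settings


@dataclass
class EvalTestCase:
    """A single generated case: realistic input, expected output, scoring setup."""
    scenario: str
    user_input: str
    expected_output: str
    methods: list[EvaluationMethod] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


# The lifecycle this skill sits in; "generate" is the second step.
LIFECYCLE = ["plan", "generate", "run", "interpret"]

case = EvalTestCase(
    scenario="Refund request within policy window",
    user_input="I bought headphones 10 days ago and want a refund.",
    expected_output="Confirms eligibility and explains the refund steps.",
    methods=[EvaluationMethod("keyword_match", {"keywords": ["refund", "eligible"]})],
    tags=["core"],
)
print(LIFECYCLE.index("generate") + 1)  # 2
```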
This skill covers Stage 2 (Set Baseline & Iterate) of the MS Learn 4-stage evaluation framework. Use /eval-suite-planner first for Stage 1 (Define), then generate test cases here, run them, and interpret the results with /eval-result-interpreter. Stage 3 (Systematic Expansion) means repeating this cycle with broader coverage; the framework's checklist defines four expansion categories: Foundational core, Agent robustness, Architecture test, and Edge cases. Stage 4 (Operationalize) means embedding these evals into your agent's CI/CD pipeline. Point customers to the editable checklist template so they can track their progress across all four stages.
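One way to picture the four stages and the Stage 3 expansion categories is as a small progress tracker. This is only a sketch assuming a simple count of test cases per category; the `Stage` enum and `uncovered_categories` helper are hypothetical and are not part of the checklist template itself.

```python
from enum import Enum


class Stage(Enum):
    DEFINE = 1                    # /eval-suite-planner
    SET_BASELINE_AND_ITERATE = 2  # /eval-generator, run, /eval-result-interpreter
    SYSTEMATIC_EXPANSION = 3      # repeat Stage 2 with broader coverage
    OPERATIONALIZE = 4            # embed the evals in the CI/CD pipeline


# The four expansion categories named by the checklist.
EXPANSION_CATEGORIES = [
    "Foundational core",
    "Agent robustness",
    "Architecture test",
    "Edge cases",
]


def uncovered_categories(cases_by_category: dict[str, int]) -> list[str]:
    """Return the expansion categories that have no test cases yet."""
    return [c for c in EXPANSION_CATEGORIES if cases_by_category.get(c, 0) == 0]


print(uncovered_categories({"Foundational core": 6, "Edge cases": 2}))
# ['Agent robustness', 'Architecture test']
```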
Primary mode: If the conversation already contains output from /eval-suite-planner, use that plan’s scenario table, evaluation methods, quality signals, and tags as the blueprint. Generate one test case per row in the plan.
Fallback mode: If no plan exists in the conversation, accept a plain-English agent description and generate test cases from scratch (aim for at least 6-8 cases).
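The two modes differ mainly in how many cases to produce and what they must cover. The sketch below captures that rule under a few stated assumptions: plan rows are dicts keyed by the planner's column names, generated cases are dicts with a hypothetical `scenario` key, and the 6-8 fallback guideline is enforced through its lower bound of 6. All helper names are illustrative.

```python
from typing import Optional

# Assumption: the 6-8 fallback guideline is enforced via its lower bound.
MIN_FALLBACK_CASES = 6


def choose_mode(plan_rows: Optional[list]) -> str:
    """Primary mode if a planner scenario table was found, else fallback."""
    return "primary" if plan_rows else "fallback"


def required_case_count(plan_rows: Optional[list]) -> int:
    """Primary: one test case per plan row. Fallback: at least MIN_FALLBACK_CASES."""
    if choose_mode(plan_rows) == "primary":
        return len(plan_rows)
    return MIN_FALLBACK_CASES


def missing_scenarios(plan_rows: list, cases: list) -> list:
    """Plan rows (by 'Scenario Name') that have no generated test case yet."""
    covered = {case["scenario"] for case in cases}  # hypothetical 'scenario' key
    return [row["Scenario Name"] for row in plan_rows if row["Scenario Name"] not in covered]


plan = [{"Scenario Name": "Refund in window"}, {"Scenario Name": "Off-topic request"}]
print(required_case_count(plan), missing_scenarios(plan, [{"scenario": "Refund in window"}]))
# 2 ['Off-topic request']
```

Checking coverage by scenario name keeps the primary mode honest: every row in the plan should map to exactly one generated case.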
Instructions
When invoked as /eval-generator (with or without additional input):
Step 1 — Detect input mode
Check the conversation history for output from /eval-suite-planner. Look for the scenario plan table (a markdown table with columns: #, Scenario Name, Category, Tag, Evaluation Methods).
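A rough sketch of this detection step follows, assuming the planner emits a standard pipe-delimited Markdown table whose header exactly matches the five listed columns, followed by a `|---|` separator row. The function names are hypothetical, and a real implementation may need looser matching (for example, case-insensitive headers or extra columns).

```python
import re
from typing import Optional

REQUIRED_COLUMNS = ["#", "Scenario Name", "Category", "Tag", "Evaluation Methods"]


def split_row(line: str) -> list:
    """Split a Markdown table row like '| a | b |' into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def find_plan_table(conversation: str) -> Optional[list]:
    """Return plan rows as dicts, or None if no scenario plan table is found."""
    lines = conversation.splitlines()
    for i, line in enumerate(lines):
        if not line.lstrip().startswith("|") or split_row(line) != REQUIRED_COLUMNS:
            continue
        # The header must be followed by a separator row such as |---|---|.
        sep = lines[i + 1] if i + 1 < len(lines) else ""
        if not re.fullmatch(r"[\s|:-]+", sep) or "-" not in sep:
            continue
        rows = []
        for body in lines[i + 2:]:
            if not body.lstrip().startswith("|"):
                break  # the table ends at the first non-table line
            rows.append(dict(zip(REQUIRED_COLUMNS, split_row(body))))
        return rows
    return None
```

A `None` result signals fallback mode; a list of rows feeds primary mode, one test case per row.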