eval-generator

Purpose

This skill generates concrete eval test cases — with realistic inputs, expected outputs, and evaluation method configurations. It is the second step in the eval lifecycle: plan → generate → run → interpret.
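For concreteness, a single generated test case might look like the sketch below. The schema and field names are illustrative assumptions, not a format this skill guarantees:

```python
# One generated test case, sketched as a dataclass. Every field name here is
# an illustrative assumption, not a schema this skill commits to.
from dataclasses import dataclass, field

@dataclass
class EvalTestCase:
    name: str              # scenario name, e.g. from a plan row
    tag: str               # category tag from the plan
    input: str             # realistic user input sent to the agent
    expected_output: str   # reference answer or expected behavior
    evaluation: dict = field(default_factory=dict)  # evaluation method config

case = EvalTestCase(
    name="Refund request within policy window",
    tag="foundational-core",
    input="My keyboard broke 10 days after purchase. Can I get a refund?",
    expected_output="Agent confirms eligibility and starts the refund flow.",
    evaluation={"method": "llm-judge", "criteria": ["correct policy applied"]},
)
```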

This skill covers Stage 2 (Set Baseline & Iterate) of the MS Learn 4-stage evaluation framework:

  • Stage 1 (Define): run /eval-suite-planner first.
  • Stage 2 (Set Baseline & Iterate): generate test cases here, run them, and interpret results with /eval-result-interpreter.
  • Stage 3 (Systematic Expansion): repeat this cycle with broader coverage; the checklist defines four expansion categories: Foundational core, Agent robustness, Architecture test, and Edge cases.
  • Stage 4 (Operationalize): embed these evals into your agent's CI/CD pipeline (see the sketch below).

Point customers to the editable checklist template to track their progress across all four stages.
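For Stage 4, here is a minimal sketch of what embedding evals into CI/CD could look like, assuming the generated cases are stored in a cases.json file. run_agent is a hypothetical stand-in for your agent's entry point, and the exact-match check is only the simplest possible evaluation method:

```python
# Minimal sketch of Stage 4 (Operationalize): a runner that loads generated
# cases and exits nonzero on failure, so CI fails the build when an eval
# regresses. run_agent is a hypothetical stand-in; a real suite would
# dispatch on each case's evaluation config instead of exact match.
import json
import sys

def run_agent(user_input: str) -> str:
    # Placeholder so the sketch executes; replace with a real call to your
    # agent (API, CLI, or in-process).
    return user_input

def main(path: str = "cases.json") -> int:
    with open(path) as f:
        cases = json.load(f)
    failures = [c["name"] for c in cases
                if run_agent(c["input"]) != c["expected_output"]]
    for name in failures:
        print(f"FAIL: {name}")
    print(f"{len(cases) - len(failures)}/{len(cases)} cases passed")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

Running this script (the file name is arbitrary) as a pipeline step then fails the build on any regressed case via the nonzero exit code.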

Primary mode: If the conversation already contains output from /eval-suite-planner, use that plan’s scenario table, evaluation methods, quality signals, and tags as the blueprint. Generate one test case per row in the plan.

Fallback mode: If no plan exists in the conversation, accept a plain-English agent description and generate test cases from scratch (6-8 cases minimum).

Instructions

When invoked as /eval-generator (with or without additional input):

Step 1 — Detect input mode

Check the conversation history for output from /eval-suite-planner. Look for the scenario plan table (a markdown table with columns: #, Scenario Name, Category, Tag, Evaluation Methods).

  • Plan found: Use it as the blueprint. Say: "Generating test cases from your eval suite plan (X scenarios)." Generate one test case per row.
  • No plan found: Fall back to the plain-English agent description and generate 6-8 test cases from scratch. A detection sketch follows this list.
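The detection in Step 1 could be implemented along these lines, assuming the conversation history is available as one plain-text string; the function and field names are illustrative:

```python
# Sketch of the Step 1 plan-table detection. The header pattern matches the
# column names listed above; everything else is an illustrative assumption.
import re

PLAN_HEADER = re.compile(
    r"\|\s*#\s*\|\s*Scenario Name\s*\|\s*Category\s*\|\s*Tag\s*\|\s*Evaluation Methods\s*\|",
    re.IGNORECASE,
)

def extract_plan_rows(conversation: str) -> list[dict]:
    """Return one dict per scenario row, or [] to trigger fallback mode."""
    match = PLAN_HEADER.search(conversation)
    if not match:
        return []
    rows = []
    # Skip the remainder of the header line, then walk the table body.
    for line in conversation[match.end():].splitlines()[1:]:
        line = line.strip()
        if not line.startswith("|"):
            break  # table ended
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Keep 5-column data rows; skip the |---|---| separator row.
        if len(cells) == 5 and not set(cells[0]) <= set("-: "):
            rows.append(dict(zip(
                ["num", "name", "category", "tag", "methods"], cells)))
    return rows
```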