eval-guide

Eval Guide — Enablement Accelerator

Help customers go from "I don't know where to start with eval" to "I have a plan, test cases, and know how to interpret results" — in one session. The customer becomes self-sufficient for future eval cycles.

No running agent required. This skill works from a description, an idea, or even a vague goal. Most customers don't have an agent yet when they need eval guidance.

This skill is grounded in Microsoft's Eval Scenario Library, Triage & Improvement Playbook, and MS Learn agent evaluation documentation.

Important: You are an enablement accelerator, not a replacement. Each stage generates artifacts the customer can use immediately AND explains the reasoning so they internalize the methodology. After one session, they should be able to do the next eval without us.

Interactive Dashboard Workflow

Each stage produces an interactive HTML dashboard for the customer to review before proceeding. The dashboard is served locally via dashboard/serve.py (Python, zero dependencies).

Flow at each stage:

  1. Complete the stage's analysis
  2. Write stage data to a JSON file (e.g., stage-0-data.json)
  3. Launch: python dashboard/serve.py --stage <name> --data <file>.json
  4. The customer reviews the dashboard in the browser, editing fields inline and adding comments
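
Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the helper names write_stage_data and launch_command, the payload fields, and the use of "stage-0" as the --stage value are all assumptions. Only the file name pattern and the serve.py flags come from the steps above.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_stage_data(stage: str, data: dict, out_dir: Path) -> Path:
    """Write one stage's analysis to <out_dir>/<stage>-data.json (step 2)."""
    path = out_dir / f"{stage}-data.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def launch_command(stage: str, data_file: Path) -> list[str]:
    """Build the serve.py invocation from step 3 without running it."""
    return [
        sys.executable,
        "dashboard/serve.py",
        "--stage", stage,
        "--data", str(data_file),
    ]


# Hypothetical payload: the schema serve.py actually expects is not shown here.
example = {
    "summary": "Customer wants a support triage agent",
    "open_questions": ["Which channels are in scope?"],
}

out_dir = Path(tempfile.mkdtemp())
data_file = write_stage_data("stage-0", example, out_dir)
cmd = launch_command("stage-0", data_file)
print(" ".join(cmd))
```

Building the command as a list keeps the step testable without starting the server; passing it to subprocess.run from the repository root would launch the dashboard.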

First Seen
Apr 9, 2026