eval-result-interpreter
Purpose
This skill takes eval results (a Copilot Studio evaluation CSV file, a pasted summary, or a plain-English description of results) and produces a structured triage report. It is the final step in the eval lifecycle: plan → generate → run → interpret. The output tells you whether to ship, what broke, why it broke, and what to fix first.
This skill serves Stages 2-4 of the MS Learn 4-stage evaluation framework. In Stage 2 (Set Baseline & Iterate), it interprets your first eval results and guides fixes. In Stage 3 (Systematic Expansion), it identifies coverage gaps worth expanding into. In Stage 4 (Operationalize), it triages regression failures after agent updates. Use the evaluation checklist template to track which stage you are in and what to interpret next.
Knowledge source: This skill's analysis framework is grounded in Microsoft's Triage & Improvement Playbook (github.com/microsoft/triage-and-improvement-playbook), specifically its 4-layer triage system, SHIP/ITERATE/BLOCK decision tree, 3 root cause types, 26 diagnostic questions, and remediation mapping.
When to use this skill vs. eval-triage-and-improvement
These two skills share the same triage framework but serve different modes of work:
| Use eval-result-interpreter when… | Use eval-triage-and-improvement when… |
|---|---|
| You have a CSV file or concrete results and want a one-shot structured report | You want interactive guidance walking through diagnosis step by step |
| This is your first look at results — you need a verdict and top actions fast | You are in an ongoing improvement loop — fixing, re-running, and re-triaging |
| You want a customer-deliverable artifact (the .docx triage report) | You need detailed remediation help for specific quality signals (e.g., "wrong tool fires — now what?") |
| The eval run is relatively straightforward (<20 failures) | You have many failures (15+) and need help prioritizing which to investigate |
| You need the activity map / result comparison tool recommendations inline | You need the playbook worked examples and deeper diagnostic walkthroughs |
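
The table above can be read as a rough routing heuristic. The sketch below is illustrative only and is not part of either skill: the `Situation` fields, the `suggest_skill` function, and the simple vote count are all assumptions made for this example. The only values taken from the table are the two failure thresholds (<20 and 15+). Because those thresholds overlap, a run with 15 to 19 failures earns a vote for both skills, and the other rows decide.

```python
from dataclasses import dataclass

INTERPRETER = "eval-result-interpreter"
TRIAGE = "eval-triage-and-improvement"


@dataclass
class Situation:
    """Hypothetical description of your current state (field names are illustrative)."""
    has_concrete_results: bool   # you have a CSV file or concrete results
    first_look: bool             # first pass over these results
    needs_docx_report: bool      # you want the customer-deliverable .docx report
    wants_step_by_step: bool     # you want interactive, guided diagnosis
    failure_count: int           # number of failed test cases in the run


def suggest_skill(s: Situation) -> str:
    """Tally one vote per table row that matches; equal weighting is an assumption."""
    votes = {INTERPRETER: 0, TRIAGE: 0}

    if s.has_concrete_results:
        votes[INTERPRETER] += 1
    if s.wants_step_by_step:
        votes[TRIAGE] += 1

    # First look favors a one-shot report; an ongoing loop favors guided triage.
    votes[INTERPRETER if s.first_look else TRIAGE] += 1

    if s.needs_docx_report:
        votes[INTERPRETER] += 1

    # Thresholds from the table; 15-19 failures match both rows.
    if s.failure_count < 20:
        votes[INTERPRETER] += 1
    if s.failure_count >= 15:
        votes[TRIAGE] += 1

    # Tie-break toward the one-shot report (an assumption, not stated in the table).
    return TRIAGE if votes[TRIAGE] > votes[INTERPRETER] else INTERPRETER


first_run = Situation(
    has_concrete_results=True,
    first_look=True,
    needs_docx_report=True,
    wants_step_by_step=False,
    failure_count=8,
)
print(suggest_skill(first_run))  # eval-result-interpreter
```

In practice the choice is a judgment call and the two skills can be used in sequence, for example a one-shot report first and guided triage afterward.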