# eval-result-interpreter

## Purpose
This skill takes eval results — a Copilot Studio evaluation CSV file, a pasted summary, or a plain-English description of results — and produces a structured triage report. It is the final step in the eval lifecycle: plan → generate → run → interpret. The output tells you whether to ship, what broke, why it broke, and what to fix first.
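To make the input and output concrete, the sketch below shows the kind of per-signal summary that underlies a triage report. It is a minimal illustration, not this skill's implementation, and the column names (`quality_signal`, `result`, `test_case`) are assumptions rather than the actual Copilot Studio export schema.

```python
# Minimal sketch of a per-quality-signal pass-rate summary.
# Column names are assumptions for illustration, not the real export schema.
import csv
from collections import Counter, defaultdict

def summarize(path: str) -> None:
    totals: Counter = Counter()
    failures: defaultdict = defaultdict(list)

    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            signal = row["quality_signal"]                       # assumed column name
            passed = row["result"].strip().lower() == "pass"     # assumed column name
            totals[signal] += 1
            if not passed:
                failures[signal].append(row["test_case"])        # assumed column name

    for signal, total in totals.items():
        failed = len(failures[signal])
        rate = 100 * (total - failed) / total
        print(f"{signal}: {rate:.0f}% pass, {failed} failure(s)")
```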
This skill serves Stages 2-4 of the MS Learn 4-stage evaluation framework. In Stage 2 (Set Baseline & Iterate), it interprets your first eval results and guides fixes. In Stage 3 (Systematic Expansion), it identifies coverage gaps worth expanding into. In Stage 4 (Operationalize), it triages regression failures after agent updates. Use the evaluation checklist template to track which stage you are in and what to interpret next.
Knowledge source: This skill's analysis framework is grounded in Microsoft's Triage & Improvement Playbook (github.com/microsoft/triage-and-improvement-playbook) — the 4-layer triage system, SHIP/ITERATE/BLOCK decision tree, 3 root cause types, 26 diagnostic questions, and remediation mapping.
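As a hedged illustration of the SHIP/ITERATE/BLOCK decision tree, the sketch below turns an overall pass rate and a critical-failure flag into a verdict. The 90% bar and the blocking condition are placeholder assumptions; the authoritative rules are the ones defined in the playbook.

```python
# Hedged sketch of a SHIP / ITERATE / BLOCK verdict. The 0.90 bar and the
# critical-failure rule are illustrative placeholders, not the playbook's
# actual thresholds.
def verdict(pass_rate: float, has_critical_failure: bool) -> str:
    if has_critical_failure:      # e.g. a safety or grounding failure (assumed criterion)
        return "BLOCK"
    if pass_rate >= 0.90:         # illustrative shipping bar
        return "SHIP"
    return "ITERATE"              # fix top issues, re-run, then re-triage
```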
## When to use this skill vs. eval-triage-and-improvement
These two skills share the same triage framework but serve different modes of work:
| Use eval-result-interpreter when… | Use eval-triage-and-improvement when… |
|---|---|
| You have a CSV file or concrete results and want a one-shot structured report | You want interactive guidance walking through diagnosis step by step |
| This is your first look at results — you need a verdict and top actions fast | You are in an ongoing improvement loop — fixing, re-running, and re-triaging |
| You want a customer-deliverable artifact (the .docx triage report) | You need detailed remediation help for specific quality signals (e.g., "wrong tool fires — now what?") |
| The eval run is relatively straightforward (<20 failures) | You have many failures (15+) and need help prioritizing which to investigate |
| You need the activity map / result comparison tool recommendations inline | You need the playbook worked examples and deeper diagnostic walkthroughs |
## More from microsoft/eval-guide

### eval-generator
Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.
### eval-faq
Answers AI agent evaluation methodology questions with practical, opinionated guidance grounded primarily in Microsoft's agent evaluation ecosystem (MS Learn, Eval Scenario Library, Triage & Improvement Playbook, Eval Guidance Kit) supplemented by select industry sources.
### eval-suite-planner
Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.
### eval-triage-and-improvement
Use this skill when the user's Copilot Studio agent evaluations have come back and they need to interpret scores, diagnose root causes of underperforming test cases, find remediation steps, or analyze patterns to improve their agent. Always use this skill when the user mentions: "eval failed", "why did this fail", "triage", "diagnose failure", "low pass rate", "fix evaluation results", "not passing", "failing test cases", "evaluation results", "improve my eval scores", or any situation where eval scores need interpretation and action.
### eval-guide
Eval enablement accelerator — help customers think through "what does good look like" for their AI agent, then generate a structured eval plan and test cases they can use immediately. No running agent required. Works from a description, an idea, or even a vague goal. Use when anyone mentions agent evaluation, eval planning, "what should we test", "how do we know if the agent is good", test case generation, or interpreting eval results.