eval-triage-and-improvement
Eval Triage & Improvement
You help users interpret their agent evaluation results and find actionable next steps for improvement. Follow the hybrid workflow: gather eval results first, then generate a structured triage report with root causes, owners, and recommended fixes.
This skill serves Stages 2-4 of the MS Learn 4-stage evaluation framework — the iterative loop of running evals, diagnosing failures, applying fixes, and re-running. In Stage 4 (Operationalize), this skill helps triage regressions caught by CI/CD eval runs after agent updates. Use the evaluation checklist template to track your position in the lifecycle.
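To make the workflow concrete, here is a minimal sketch (not this skill's own implementation) of how failed rows from an eval results CSV could be grouped into triage entries, each with a root cause, an owner, and a first fix to try. The column names (result, quality_signal), the TRIAGE_HINTS mapping, and the file name eval_results.csv are illustrative assumptions, not the actual Copilot Studio export schema.

```python
# Minimal sketch: group eval failures by quality signal and attach
# a default root cause, owner, and recommended fix to each cluster.
# Column names and the hint table are assumptions for illustration.
import csv
from collections import defaultdict

# Hypothetical mapping: quality signal -> (root-cause bucket, owner, first fix to try)
TRIAGE_HINTS = {
    "wrong_tool_fired": ("tool selection / routing", "agent author",
                         "tighten tool descriptions and add counter-examples"),
    "missing_grounding": ("retrieval / knowledge gap", "knowledge owner",
                          "add or re-index the missing source"),
    "incomplete_answer": ("prompt / instruction gap", "agent author",
                          "clarify output requirements in the instructions"),
}

def build_triage_report(path: str) -> list[dict]:
    """Group failed rows by quality signal and attach triage hints."""
    failures = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("result", "").strip().lower() == "fail":  # assumed column
                failures[row.get("quality_signal", "unknown")].append(row)

    report = []
    # Largest failure clusters first, so triage starts where the volume is.
    for signal, rows in sorted(failures.items(), key=lambda kv: -len(kv[1])):
        cause, owner, fix = TRIAGE_HINTS.get(
            signal, ("needs manual diagnosis", "triage lead", "inspect transcripts"))
        report.append({
            "quality_signal": signal,
            "failure_count": len(rows),
            "root_cause": cause,
            "owner": owner,
            "recommended_fix": fix,
        })
    return report

if __name__ == "__main__":
    for entry in build_triage_report("eval_results.csv"):
        print(entry)
```

Real diagnosis goes well beyond a lookup table, but a grouped view like this is a reasonable starting point for deciding which failure cluster to investigate first.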
When to use this skill vs. eval-result-interpreter
These two skills share the same triage framework but serve different modes of work:
| Use eval-triage-and-improvement when… | Use eval-result-interpreter when… |
|---|---|
| You want interactive guidance walking through diagnosis step by step | You have a CSV file or concrete results and want a one-shot structured report |
| You are in an ongoing improvement loop — fixing, re-running, and re-triaging | This is your first look at results — you need a verdict and top actions fast |
| You need detailed remediation help for specific quality signals (e.g., "wrong tool fires — now what?") | You want a customer-deliverable artifact (the .docx triage report) |
| You have many failures (15+) and need help prioritizing which to investigate | The eval run is relatively straightforward (<20 failures) |
| You need the playbook's worked examples and deeper diagnostic walkthroughs | You need the activity map / result comparison tool recommendations inline |
If in doubt, start with eval-result-interpreter to get the structured report, then switch to eval-triage-and-improvement if you need interactive help implementing the fixes.
More from microsoft/eval-guide
eval-generator
Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.
eval-faq
Answers AI agent evaluation methodology questions with practical, opinionated guidance grounded primarily in Microsoft's agent evaluation ecosystem (MS Learn, Eval Scenario Library, Triage & Improvement Playbook, Eval Guidance Kit) supplemented by select industry sources.
eval-result-interpreter
Analyzes Copilot Studio evaluation CSV results using Microsoft's Triage & Improvement Playbook. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.
eval-suite-planner
Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.
eval-guide
Eval enablement accelerator — help customers think through "what does good look like" for their AI agent, then generate a structured eval plan and test cases they can use immediately. No running agent required. Works from a description, an idea, or even a vague goal. Use when anyone mentions agent evaluation, eval planning, "what should we test", "how do we know if the agent is good", test case generation, or interpreting eval results.