eval-triage-and-improvement
Eval Triage & Improvement
You help users interpret their agent evaluation results and find actionable next steps for improvement. Follow the hybrid workflow: gather eval results first, then generate a structured triage report with root causes, owners, and recommended fixes.
This skill serves Stages 2-4 of the MS Learn 4-stage evaluation framework — the iterative loop of running evals, diagnosing failures, applying fixes, and re-running. In Stage 4 (Operationalize), this skill helps triage regressions caught by CI/CD eval runs after agent updates. Use the evaluation checklist template to track your position in the lifecycle.
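To make the workflow concrete, here is a minimal sketch (not this skill's own implementation) of how failed rows from an eval results CSV could be grouped into triage entries, each with a root cause, an owner, and a first fix to try. The column names (result, quality_signal), the TRIAGE_HINTS mapping, and the file name eval_results.csv are illustrative assumptions, not the actual Copilot Studio export schema.

```python
# Minimal sketch: group eval failures by quality signal and attach
# a default root cause, owner, and recommended fix to each cluster.
# Column names and the hint table are assumptions for illustration.
import csv
from collections import defaultdict

# Hypothetical mapping: quality signal -> (root-cause bucket, owner, first fix to try)
TRIAGE_HINTS = {
    "wrong_tool_fired": ("tool selection / routing", "agent author",
                         "tighten tool descriptions and add counter-examples"),
    "missing_grounding": ("retrieval / knowledge gap", "knowledge owner",
                          "add or re-index the missing source"),
    "incomplete_answer": ("prompt / instruction gap", "agent author",
                          "clarify output requirements in the instructions"),
}

def build_triage_report(path: str) -> list[dict]:
    """Group failed rows by quality signal and attach triage hints."""
    failures = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("result", "").strip().lower() == "fail":  # assumed column
                failures[row.get("quality_signal", "unknown")].append(row)

    report = []
    # Largest failure clusters first, so triage starts where the volume is.
    for signal, rows in sorted(failures.items(), key=lambda kv: -len(kv[1])):
        cause, owner, fix = TRIAGE_HINTS.get(
            signal, ("needs manual diagnosis", "triage lead", "inspect transcripts"))
        report.append({
            "quality_signal": signal,
            "failure_count": len(rows),
            "root_cause": cause,
            "owner": owner,
            "recommended_fix": fix,
        })
    return report

if __name__ == "__main__":
    for entry in build_triage_report("eval_results.csv"):
        print(entry)
```

Real diagnosis goes well beyond a lookup table, but a grouped view like this is a reasonable starting point for deciding which failure cluster to investigate first.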
When to use this skill vs. eval-result-interpreter
These two skills share the same triage framework but serve different modes of work:
| Use eval-triage-and-improvement when… | Use eval-result-interpreter when… |
|---|---|
| You want interactive guidance walking through diagnosis step by step | You have a CSV file or concrete results and want a one-shot structured report |
| You are in an ongoing improvement loop — fixing, re-running, and re-triaging | This is your first look at results — you need a verdict and top actions fast |
| You need detailed remediation help for specific quality signals (e.g., "wrong tool fires — now what?") | You want a customer-deliverable artifact (the .docx triage report) |
| You have many failures (15+) and need help prioritizing which to investigate | The eval run is relatively straightforward (<20 failures) |
| You need the playbook's worked examples and deeper diagnostic walkthroughs | You need the activity map / result comparison tool recommendations inline |
If in doubt, start with eval-result-interpreter to get the structured report, then switch to eval-triage-and-improvement if you need interactive help implementing the fixes.
More from microsoft/eval-guide
eval-generator
Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.
eval-faq
Answers AI agent evaluation methodology questions with practical, opinionated guidance grounded primarily in Microsoft's agent evaluation ecosystem (MS Learn, Eval Scenario Library, Triage & Improvement Playbook, Eval Guidance Kit) supplemented by select industry sources.
eval-result-interpreter
Analyzes Copilot Studio evaluation CSV results using Microsoft's Triage & Improvement Playbook. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.
eval-suite-planner
Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.
eval-guide
Eval enablement accelerator — help customers think through "what does good look like" for their AI agent, then generate a structured eval plan and test cases they can use immediately. No running agent required. Works from a description, an idea, or even a vague goal. Use when anyone mentions agent evaluation, eval planning, "what should we test", "how do we know if the agent is good", test case generation, or interpreting eval results.