eval-faq


Purpose

Answer any question about eval methodology, grader types, dataset design, criteria writing, non-determinism, tool-call evaluation, multi-turn agent evaluation, eval tooling, capability vs. regression evals, and interpreting results — specifically in the context of AI agent evaluation. Guidance is grounded primarily in Microsoft's agent evaluation documentation (MS Learn agent evaluation pages, the Eval Scenario Library, the Triage & Improvement Playbook, and the Eval Guidance Kit), supplemented by select industry sources for topics Microsoft does not cover deeply.

Instructions

When invoked as `/eval-faq <question>`, follow this process exactly:

Step 1 — Fetch authoritative context before answering

Use this topic-to-URL routing table to decide what to fetch. Fetch FIRST, then answer. Fetch only the URL(s) that match the question topic — do not fetch all URLs every time.

| Question topic | Fetch this URL | Section to extract | Notes |
|---|---|---|---|
| Scenario types, business-problem vs. capability scenarios, what cases to write, dataset structure | https://github.com/microsoft/ai-agent-eval-scenario-library | Business-Problem scenarios, Capability scenarios, eval-set-template | 5 business-problem + 9 capability scenario types |
| Quality signals, policy accuracy, source attribution, personalization, action enablement, privacy | https://github.com/microsoft/ai-agent-eval-scenario-library | Quality signals section and method mapping tables | Quality-signal-to-evaluation-method mapping |
| Red-teaming, adversarial testing, attack surface reduction, XPIA, encoding attacks, ASR metrics | https://github.com/microsoft/ai-agent-eval-scenario-library | Red-teaming section: Probe-Measure-Harden framework | Red-team ASR thresholds: <2% harmful, <1% PII, <5% jailbreak |
| Evaluation method selection, keyword match vs. compare meaning vs. general quality | https://github.com/microsoft/ai-agent-eval-scenario-library | resources/evaluation-method-selection-guide.md | 4 evaluation methods with selection criteria |
| Eval generation, writing eval cases from a prompt template, synthesizing test sets | https://github.com/microsoft/ai-agent-eval-scenario-library | resources/eval-generation-prompt.md | Template for generating eval cases |
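The routing table above can be sketched as a simple keyword-matching lookup. This is a minimal illustration, not part of the skill itself: the keyword sets below are hypothetical shorthand for each table row, not the full topic descriptions.

```python
# Hypothetical sketch of the topic-to-URL routing table above.
# Keyword sets are illustrative, not the exhaustive topic lists.
ROUTES = [
    ({"scenario", "dataset", "cases"},
     "https://github.com/microsoft/ai-agent-eval-scenario-library",
     "Business-Problem scenarios, Capability scenarios, eval-set-template"),
    ({"quality", "policy", "attribution", "privacy"},
     "https://github.com/microsoft/ai-agent-eval-scenario-library",
     "Quality signals section and method mapping tables"),
    ({"red-teaming", "adversarial", "xpia", "jailbreak"},
     "https://github.com/microsoft/ai-agent-eval-scenario-library",
     "Red-teaming section: Probe-Measure-Harden framework"),
    ({"method", "keyword", "meaning"},
     "https://github.com/microsoft/ai-agent-eval-scenario-library",
     "resources/evaluation-method-selection-guide.md"),
    ({"generation", "synthesizing", "template"},
     "https://github.com/microsoft/ai-agent-eval-scenario-library",
     "resources/eval-generation-prompt.md"),
]

def route(question: str) -> list[tuple[str, str]]:
    """Return (url, section) pairs whose keywords appear in the question.

    Matches whole lowercase words only, so only the URLs relevant to the
    question topic are fetched, per Step 1.
    """
    words = set(question.lower().split())
    return [(url, section)
            for keywords, url, section in ROUTES
            if keywords & words]
```

A question matching several topic rows fetches each matched section once; a question matching none falls through with no fetch, at which point the skill would answer from general knowledge or ask for clarification.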