eval-audit
Eval Audit
Inspect an LLM eval pipeline and produce a prioritized list of problems with concrete next steps.
Overview
- Gather eval artifacts: traces, evaluator configs, judge prompts, labeled data, metrics dashboards
- Run diagnostic checks across six areas
- Produce a findings report ordered by impact, with each finding linking to a fix
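One way to picture the report step is a small sketch in Python. The `Finding` fields, the three-level impact scale, and the function names are illustrative assumptions, not something the skill prescribes.

```python
from dataclasses import dataclass

# Hypothetical impact scale; the skill itself does not define one.
IMPACT_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    area: str     # diagnostic area the check belongs to (label is illustrative)
    problem: str  # what was observed
    impact: str   # "high", "medium", or "low"
    fix: str      # concrete next step linked to this finding


def prioritize(findings):
    """Return findings sorted from highest to lowest impact (stable within a level)."""
    return sorted(findings, key=lambda f: IMPACT_ORDER[f.impact])


def render_report(findings):
    """Format the prioritized findings as a numbered plain-text list."""
    lines = []
    for i, f in enumerate(prioritize(findings), start=1):
        lines.append(f"{i}. [{f.impact}] {f.area}: {f.problem} -> Fix: {f.fix}")
    return "\n".join(lines)
```

Keeping the fix on the same record as the problem means every reported item carries its next step, which matches the "each finding linking to a fix" requirement above.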
Prerequisites
Access to eval artifacts (traces, evaluator configs, judge prompts, labeled data) via an observability MCP server or local files. If none exist, skip to "No Eval Infrastructure."
Connecting to Eval Infrastructure
Check whether the user has an observability MCP server connected (Phoenix, Braintrust, LangSmith, Truesight, or similar). If one is available, use it to pull traces, evaluator definitions, and experiment results. If not, ask for local files: CSVs, JSON trace exports, notebooks, or evaluation scripts.
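For the local-files path, a minimal sketch of taking stock of what the user provided might look like the following. It assumes trace exports are CSV, JSON, or JSONL files in one directory. The helper names `load_records` and `inventory` are hypothetical, and notebooks and evaluation scripts are left for manual review.

```python
import csv
import json
from pathlib import Path

SUPPORTED = {".csv", ".json", ".jsonl"}


def load_records(path):
    """Load rows from a CSV, JSON, or JSONL file as a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            # One JSON object per non-empty line.
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Accept either a top-level list of records or a single object.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported artifact type: {path.name}")


def inventory(directory):
    """Map each supported file under `directory` to its record count."""
    root = Path(directory)
    counts = {}
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.suffix.lower() in SUPPORTED:
            counts[str(p.relative_to(root))] = len(load_records(p))
    return counts
```

Running `inventory` on the folder first gives a quick picture of which artifacts exist and how large they are before any diagnostic checks begin.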