Eval Audit

Inspect an LLM eval pipeline and produce a prioritized list of problems with concrete next steps.

Overview

1. Gather eval artifacts: traces, evaluator configs, judge prompts, labeled data, metrics dashboards
2. Run diagnostic checks across six areas
3. Produce a findings report ordered by impact, with each finding linking to a fix
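
As a rough illustration of the last step, the findings report can be modeled as a list of records sorted by impact. This is a minimal sketch, not a prescribed format: the `Finding` class, its field names, and the integer impact scale are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One audit finding. All field names here are hypothetical."""
    area: str     # which diagnostic area the problem belongs to
    problem: str  # short description of what is wrong
    impact: int   # assumed scale: a higher number means a more damaging problem
    fix: str      # the concrete next step linked to this finding


def prioritize(findings):
    """Return the findings sorted from highest to lowest impact."""
    return sorted(findings, key=lambda f: f.impact, reverse=True)


def render_report(findings):
    """Render a plain-text report, one ranked line per finding."""
    lines = []
    for rank, f in enumerate(prioritize(findings), start=1):
        lines.append(f"{rank}. [{f.area}] {f.problem} -> Next step: {f.fix}")
    return "\n".join(lines)
```

Keeping the fix on the same record as the problem means no finding can be reported without a next step attached.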

Prerequisites

Access to eval artifacts (traces, evaluator configs, judge prompts, labeled data) via an observability MCP server or local files. If none exist, skip to "No Eval Infrastructure."

Connecting to Eval Infrastructure

Check whether the user has an observability MCP server connected (Phoenix, Braintrust, LangSmith, Truesight, or similar). If one is available, use it to pull traces, evaluator definitions, and experiment results. If not, ask the user for local files: CSVs, JSON trace exports, notebooks, or evaluation scripts.
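
For the local-files path, a quick inventory of what the user has handed over can help decide which checks are possible. The sketch below is one way to do that, under stated assumptions: the function name `inventory_artifacts` is hypothetical, the extension-to-kind mapping is a guess at common conventions (including `.jsonl`, which the text above does not mention), and a file's extension says nothing about whether its contents are actually eval data.

```python
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file extension to artifact kind; adjust to the project.
ARTIFACT_KINDS = {
    ".csv": "CSV",
    ".json": "JSON trace export",
    ".jsonl": "JSON trace export",  # assumption: JSON Lines exports are also common
    ".ipynb": "notebook",
    ".py": "evaluation script",
}


def inventory_artifacts(root):
    """Group files under `root` by artifact kind, based only on extension."""
    found = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        kind = ARTIFACT_KINDS.get(path.suffix.lower())
        if kind is not None:
            found[kind].append(path)
    return dict(found)
```

Calling `inventory_artifacts("path/to/evals")` on a directory of your choosing returns a dictionary such as `{"CSV": [...], "notebook": [...]}`. An empty result suggests there may be nothing to audit yet.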

Diagnostic Checks