eval-audit

Eval Audit

Inspect an LLM eval pipeline and produce a prioritized list of problems with concrete next steps.

Overview

  1. Gather eval artifacts: traces, evaluator configs, judge prompts, labeled data, metrics dashboards
  2. Run diagnostic checks across six areas
  3. Produce a findings report ordered by impact, with each finding linking to a fix
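The third step can be sketched as a small data structure plus a sort. This is a minimal illustration, not the skill's actual report format: the `Finding` fields, the 1-5 `impact` scale, and the sample findings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    area: str     # free-form label for the diagnostic area (hypothetical field)
    problem: str  # what is wrong
    impact: int   # assumed 1-5 scale, 5 = most severe
    fix: str      # concrete next step


def render_report(findings: list[Finding]) -> str:
    """Return a plain-text report with the highest-impact findings first."""
    ordered = sorted(findings, key=lambda f: f.impact, reverse=True)
    lines = []
    for rank, f in enumerate(ordered, start=1):
        lines.append(f"{rank}. [{f.area}] {f.problem} (impact {f.impact}/5)")
        lines.append(f"   Fix: {f.fix}")
    return "\n".join(lines)


# Made-up findings, for illustration only.
findings = [
    Finding("labeled data", "No human labels to validate the judge against",
            3, "Label a sample of traces"),
    Finding("judge prompts", "Judge prompt asks for a vague 1-10 score",
            5, "Rewrite as a binary pass/fail criterion"),
]
print(render_report(findings))
```

Keeping the fix on the same record as the problem means every line of the report carries its next step, however the findings are reordered.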

Prerequisites

Access to eval artifacts (traces, evaluator configs, judge prompts, labeled data) via an observability MCP server or local files. If none exist, skip to "No Eval Infrastructure."

Connecting to Eval Infrastructure

Check whether the user has an observability MCP server connected (Phoenix, Braintrust, LangSmith, Truesight, or similar). If one is connected, use it to pull traces, evaluator definitions, and experiment results. If not, ask the user for local files: CSVs, JSON trace exports, notebooks, or evaluation scripts.
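For the local-file fallback, a rough sketch of collecting artifacts from a directory might look like the following. The function name `load_local_artifacts`, the bucket names, and the assumption that each JSON file holds one trace export are all hypothetical; real exports vary by tool.

```python
import csv
import json
from pathlib import Path


def load_local_artifacts(root: str) -> dict[str, list]:
    """Walk `root` and bucket eval artifacts by file type."""
    artifacts = {"csv_rows": [], "json_traces": [], "other_files": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".csv":
            # Each CSV row becomes a dict keyed by the header row.
            with path.open(newline="", encoding="utf-8") as fh:
                artifacts["csv_rows"].extend(csv.DictReader(fh))
        elif suffix == ".json":
            # Assumes one JSON document per file (e.g. a trace export).
            with path.open(encoding="utf-8") as fh:
                artifacts["json_traces"].append(json.load(fh))
        elif suffix in {".ipynb", ".py"}:
            # Notebooks and scripts are only listed here, for later reading.
            artifacts["other_files"].append(path)
    return artifacts
```

Calling `load_local_artifacts("path/to/evals")` returns parsed rows and traces alongside the paths of notebooks and scripts that still need to be read by hand.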

Diagnostic Checks

First Seen
Mar 3, 2026
eval-audit — hamelsmu/evals-skills