rag-eval

Installation
SKILL.md

On-disk RAG evaluation (corpus/ + train.json)

Purpose

Guide agents through NVIDIA RAG Blueprint filesystem benchmarks: preparing corpus/ and train.json, running scripts/eval/evaluate_rag.py, tuning retrieval and generation flags for quality comparisons, interpreting RAGAS JSON outputs, and triaging failures (HTTP/stream errors, empty contexts, collection mismatch, judge API).

For latency, throughput, and load testing, use the rag-perf skill (scripts/rag-perf, docs/performance-benchmarking.md) — not this skill.

When not to use

Do not use this skill for: deploying or repairing services (use rag-blueprint); evaluating APIs without the corpus/ + train.json layout; general ML experimentation unrelated to this evaluator; production monitoring/alerting; or latency/throughput benchmarking (use rag-perf).

Prerequisites

  • Repo cloned; run commands from repo root (imports and paths assume this).
  • Python 3.11+ and uv; eval deps: uv sync --project scripts/eval.
  • Reachable RAG server and ingestor (defaults often localhost:8081 / 8082).
  • NVIDIA_API_KEY for RAGAS (see credential hygiene); optional RAG_EVAL_JUDGE_MODEL.
  • Dataset roots passed to --dataset-paths each contain corpus/ and train.json.
Installs
152
Repository
nvidia/skills
GitHub Stars
1.0K
First Seen
7 days ago
rag-eval — nvidia/skills