rag-eval
Installation
SKILL.md
On-disk RAG evaluation (corpus/ + train.json)
Purpose
Guide agents through NVIDIA RAG Blueprint filesystem benchmarks: preparing corpus/ and train.json, running scripts/eval/evaluate_rag.py, tuning retrieval and generation flags for quality comparisons, interpreting RAGAS JSON outputs, and triaging failures (HTTP/stream errors, empty contexts, collection mismatch, judge API).
For latency, throughput, and load testing, use the rag-perf skill (scripts/rag-perf, docs/performance-benchmarking.md) — not this skill.
When not to use
Do not use this skill for: deploying or repairing services (use rag-blueprint); evaluating APIs without the corpus/ + train.json layout; general ML experimentation unrelated to this evaluator; production monitoring/alerting; or latency/throughput benchmarking (use rag-perf).
Prerequisites
- Repo cloned; run commands from repo root (imports and paths assume this).
- Python 3.11+ and uv; eval deps:
uv sync --project scripts/eval. - Reachable RAG server and ingestor (defaults often
localhost:8081/8082). NVIDIA_API_KEYfor RAGAS (see credential hygiene); optionalRAG_EVAL_JUDGE_MODEL.- Dataset roots passed to
--dataset-pathseach containcorpus/andtrain.json.