ref-hallucination-arena


Reference Hallucination Arena Skill

Evaluate how accurately LLMs recommend real academic references using the OpenJudge RefArenaPipeline, which runs six steps:

  1. Load queries — from JSON/JSONL dataset
  2. Collect responses — BibTeX-formatted references from target models
  3. Extract references — parse BibTeX entries from model output
  4. Verify references — cross-check against Crossref / PubMed / arXiv / DBLP
  5. Score & rank — compute verification rate, per-field accuracy, discipline breakdown
  6. Generate report — Markdown report + visualization charts
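
Steps 3 to 5 can be illustrated with a small standalone sketch. It is not the RefArenaPipeline implementation and does not use the OpenJudge API. The function names (`extract_references`, `title_similarity`, `verification_rate`), the `lookup` callback, and the 0.9 similarity threshold are all assumptions made for illustration. The in-memory `known` dictionary stands in for a real bibliographic service such as Crossref, and the second sample entry is deliberately made up to represent a hallucinated reference.

```python
import re
from difflib import SequenceMatcher

# Start of a BibTeX entry: @type{key,
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Simple fields only: name = {value}, name = "value", or name = 2017.
# Nested braces inside a value are NOT handled by this sketch.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')


def extract_references(text):
    """Step 3 (sketch): parse BibTeX entries out of raw model output."""
    starts = list(ENTRY_START.finditer(text))
    refs = []
    for i, match in enumerate(starts):
        # An entry body runs until the next entry starts (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.end():end]
        fields = {
            name.lower(): (braced or quoted or bare).strip()
            for name, braced, quoted, bare in FIELD.findall(body)
        }
        refs.append({"type": match.group(1).lower(), "key": match.group(2), **fields})
    return refs


def title_similarity(a, b):
    """Case- and punctuation-insensitive similarity ratio in [0, 1]."""
    def normalize(s):
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def verification_rate(refs, lookup, threshold=0.9):
    """Steps 4-5 (sketch): fraction of references that `lookup` confirms.

    `lookup` is a hypothetical callback returning a record dict with a
    "title" key, or None when nothing is found.
    """
    if not refs:
        return 0.0
    verified = 0
    for ref in refs:
        record = lookup(ref)
        if record and title_similarity(ref.get("title", ""), record.get("title", "")) >= threshold:
            verified += 1
    return verified / len(refs)


# Example model output: one real paper and one invented entry.
sample = '''
Here are two references:
@inproceedings{vaswani2017,
  title = {Attention Is All You Need},
  author = {Vaswani, Ashish and others},
  year = 2017
}
@article{fake2021,
  title = "A Paper That Does Not Exist",
  year = {2021}
}
'''

# Stand-in for an external bibliographic database (assumption).
known = {"attention is all you need": {"title": "Attention Is All You Need"}}


def lookup(ref):
    return known.get(ref.get("title", "").lower())


refs = extract_references(sample)
rate = verification_rate(refs, lookup)
print(len(refs), rate)
```

A production verifier would replace `lookup` with queries to the external sources listed in step 4 and would compare more fields than the title, such as authors, year, and DOI.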

Prerequisites

# Install OpenJudge
pip install py-openjudge
First Seen
Mar 7, 2026