# ref-hallucination-arena

**Reference Hallucination Arena Skill**
Evaluate how accurately LLMs recommend real academic references using the
OpenJudge RefArenaPipeline:
1. **Load queries** — read evaluation queries from a JSON/JSONL dataset
2. **Collect responses** — gather BibTeX-formatted references from target models
3. **Extract references** — parse BibTeX entries from model output
4. **Verify references** — cross-check against Crossref / PubMed / arXiv / DBLP
5. **Score & rank** — compute verification rate, per-field accuracy, and discipline breakdown
6. **Generate report** — produce a Markdown report plus visualization charts
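The extraction step can be illustrated with a small standalone sketch. This is not OpenJudge's internal parser; it is a simplified regex-based approach (the `parse_bibtex_entries` helper is a hypothetical name) that handles flat `field = {value}` pairs but not nested braces or `@string` macros:

```python
import re

def parse_bibtex_entries(text):
    """Extract BibTeX entries and their fields from raw model output.

    Illustrative only: matches one entry per `@type{key, ...}` block
    whose closing brace starts a line, and flat brace-delimited fields.
    """
    entries = []
    for match in re.finditer(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", text, re.DOTALL):
        entry_type, key, body = match.groups()
        # Collect simple `field = {value}` pairs from the entry body.
        fields = dict(re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", body))
        entries.append({"type": entry_type.lower(), "key": key, **fields})
    return entries
```

A real pipeline would fall back to a proper BibTeX parser for entries this regex misses, since malformed output is itself a useful signal when judging model responses.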
## Prerequisites

```bash
# Install OpenJudge
pip install py-openjudge
```
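To show what the verification stage does conceptually, here is a standalone sketch that checks a reference title against the public Crossref REST API (`api.crossref.org/works` with the `query.bibliographic` parameter). The `verify_title` helper, the injectable `fetch` hook, and the naive exact-title comparison are illustrative assumptions, not OpenJudge's actual API:

```python
import json
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works"

def verify_title(title, fetch=None):
    """Return True if `title` exactly matches the top Crossref hit.

    `fetch` maps a URL to a response body string; it is injectable so
    the network lookup can be stubbed in tests. Real pipelines would use
    fuzzy matching and also check authors, year, and venue.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
    query = urllib.parse.urlencode({"query.bibliographic": title, "rows": 1})
    data = json.loads(fetch(f"{CROSSREF_API}?{query}"))
    items = data.get("message", {}).get("items", [])
    if not items:
        return False  # no candidate record at all
    found = (items[0].get("title") or [""])[0]
    return found.strip().lower() == title.strip().lower()
```

Exact matching is deliberately strict here; a hallucinated reference often paraphrases a real title, so the loosened matching used in practice trades precision for recall.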
## More from agentscope-ai/openjudge

- **paper-review**
- **find-skills-combo** — discover and recommend **combinations** of agent skills for complex, multi-faceted tasks, with two recommendation strategies: **Maximum Quality** (best skill per subtask) and **Minimum Dependencies** (fewest installs). Decomposes a task into subtasks and assembles an optimal skill portfolio instead of a simple single-skill search.
- **auto-arena**
- **bib-verify**
- **claude-authenticity**
- **openjudge**