# OpenJudge Skill
Build evaluation pipelines for LLM applications using the openjudge library.
## When to Use This Skill
- User wants to evaluate LLM output quality (correctness, relevance, hallucination, etc.)
- User wants to compare two or more models and rank them
- User wants to design a scoring rubric and automate evaluation (see the sketch after this list)
- User wants to analyze evaluation results statistically
- User wants to build a reward model or quality filter
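All of these use cases reduce to the same shape: a grader that maps one sample to a score, applied over a dataset and aggregated. The sketch below shows that shape in plain Python. It is deliberately library-agnostic; the `Sample` and `exact_match` names are illustrative inventions for this sketch, not openjudge APIs (graders.md and pipeline.md document the library's actual interfaces).

```python
# Minimal, library-agnostic sketch of an evaluation pipeline.
# NOTE: `Sample` and `exact_match` are names invented for this sketch,
# not openjudge APIs; graders.md and pipeline.md document the real ones.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str   # prompt sent to the model
    answer: str     # model output under evaluation
    reference: str  # gold answer used by the grader

def exact_match(sample: Sample) -> float:
    """Toy correctness rubric: 1.0 if the reference appears in the answer."""
    return 1.0 if sample.reference.strip().lower() in sample.answer.lower() else 0.0

dataset = [
    Sample("What is 2 + 2?", "2 + 2 equals 4.", "4"),
    Sample("What is the capital of France?", "It is Berlin.", "Paris"),
]

scores = [exact_match(s) for s in dataset]
print(f"mean score: {sum(scores) / len(scores):.2f}")  # -> mean score: 0.50
```

Swapping `exact_match` for an LLM-as-judge grader or a generated rubric changes only the grader function; the pipeline shape stays the same, which is why the sub-documents below are split along grader / pipeline / generator lines.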
## Sub-documents — Read When Relevant
| Topic | File | Read when… |
|---|---|---|
| Grader selection & configuration | graders.md | User needs to pick or configure an evaluator |
| Batch evaluation pipeline | pipeline.md | User needs to run evaluation over a dataset |
| Auto-generate graders from data | generator.md | No rubric yet; generate from labeled examples |