OpenJudge Skill

Build evaluation pipelines for LLM applications using the openjudge library.

When to Use This Skill

  • User wants to evaluate LLM output quality (correctness, relevance, hallucination, etc.)
  • User wants to compare two or more models and rank them
  • User wants to design a scoring rubric and automate evaluation
  • User wants to analyze evaluation results statistically
  • User wants to build a reward model or quality filter
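All of the use cases above reduce to one core shape: score an output against a rubric, then threshold or aggregate the scores. The sketch below illustrates that shape only — the class and function names here are hypothetical stand-ins, not the openjudge API (see graders.md for the library's actual grader interface):

```python
# Toy rubric-based grader illustrating the evaluate-then-threshold pattern.
# NOTE: GradeResult and rubric_grade are hypothetical names for illustration,
# NOT openjudge APIs. Token overlap stands in for a real scoring rubric.
from dataclasses import dataclass

@dataclass
class GradeResult:
    score: float   # normalized score in [0.0, 1.0]
    passed: bool   # whether score meets the threshold
    reason: str    # human-readable explanation

def rubric_grade(output: str, reference: str, threshold: float = 0.5) -> GradeResult:
    """Score an LLM output against a reference answer (toy overlap metric)."""
    out_tokens = set(output.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return GradeResult(0.0, False, "empty reference")
    overlap = len(out_tokens & ref_tokens) / len(ref_tokens)
    return GradeResult(
        score=overlap,
        passed=overlap >= threshold,
        reason=f"{overlap:.0%} of reference tokens present in output",
    )

result = rubric_grade("Paris is the capital of France",
                      "The capital of France is Paris")
print(result.passed, result.reason)
```

A real grader (LLM-as-judge, exact match, classifier) swaps in for the overlap metric, but the surrounding pipeline — iterate over a dataset, grade each row, filter or rank by score — keeps this structure.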

Sub-documents — Read When Relevant

| Topic                            | File         | Read when…                                           |
| -------------------------------- | ------------ | ---------------------------------------------------- |
| Grader selection & configuration | graders.md   | User needs to pick or configure an evaluator         |
| Batch evaluation pipeline        | pipeline.md  | User needs to run evaluation over a dataset          |
| Auto-generate graders from data  | generator.md | No rubric yet; generate graders from labeled examples |