# OpenJudge Skill
Build evaluation pipelines for LLM applications using the openjudge library.
## When to Use This Skill
- User wants to evaluate LLM output quality (correctness, relevance, hallucination, etc.)
- User wants to compare two or more models and rank them
- User wants to design a scoring rubric and automate evaluation (see the sketch after this list)
- User wants to analyze evaluation results statistically
- User wants to build a reward model or quality filter
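All of these use cases reduce to the same shape: a grader that maps one sample to a score, applied over a dataset and aggregated. The sketch below shows that shape in plain Python. It is deliberately library-agnostic; the `Sample` and `exact_match` names are illustrative inventions for this sketch, not openjudge APIs (graders.md and pipeline.md document the library's actual interfaces).

```python
# Minimal, library-agnostic sketch of an evaluation pipeline.
# NOTE: `Sample` and `exact_match` are names invented for this sketch,
# not openjudge APIs; graders.md and pipeline.md document the real ones.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str   # prompt sent to the model
    answer: str     # model output under evaluation
    reference: str  # gold answer used by the grader

def exact_match(sample: Sample) -> float:
    """Toy correctness rubric: 1.0 if the reference appears in the answer."""
    return 1.0 if sample.reference.strip().lower() in sample.answer.lower() else 0.0

dataset = [
    Sample("What is 2 + 2?", "2 + 2 equals 4.", "4"),
    Sample("What is the capital of France?", "It is Berlin.", "Paris"),
]

scores = [exact_match(s) for s in dataset]
print(f"mean score: {sum(scores) / len(scores):.2f}")  # -> mean score: 0.50
```

Swapping `exact_match` for an LLM-as-judge grader or a generated rubric changes only the grader function; the pipeline shape stays the same, which is why the sub-documents below are split along grader / pipeline / generator lines.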
## Sub-documents — Read When Relevant
| Topic | File | Read when… |
|---|---|---|
| Grader selection & configuration | graders.md | User needs to pick or configure an evaluator |
| Batch evaluation pipeline | pipeline.md | User needs to run evaluation over a dataset |
| Auto-generate graders from data | generator.md | No rubric yet; generate from labeled examples |