Advanced Evaluation

LLM-as-a-Judge techniques for evaluating AI outputs. This is not a single technique but a family of approaches; choosing the right approach and mitigating its biases is the core competency.

When to Activate

  • Building automated evaluation pipelines for LLM outputs
  • Comparing multiple model responses to select the best one
  • Establishing consistent quality standards
  • Debugging inconsistent evaluation results
  • Designing A/B tests for prompt or model changes
  • Creating rubrics for human or automated evaluation

Core Concepts

Evaluation Taxonomy

Direct Scoring: A single LLM judge rates one response on a defined scale.
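
As a minimal sketch of direct scoring, the snippet below assumes the OpenAI Python client and a placeholder judge model name ("gpt-4o"); the rubric wording, 1-5 scale, and score parsing are illustrative choices, not part of this skill's prescribed setup.

```python
# Direct-scoring sketch: one LLM judge, one response, one numeric score.
# Assumes the OpenAI Python client and OPENAI_API_KEY in the environment;
# the model name and rubric below are placeholders to adapt.
import re
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an evaluation judge.
Rate the RESPONSE to the QUESTION on a 1-5 scale:
1 = unhelpful or incorrect, 3 = partially correct, 5 = fully correct and clear.
Reply with the integer score only.

QUESTION: {question}
RESPONSE: {response}"""


def direct_score(question: str, response: str) -> int:
    """Ask a single LLM judge for a 1-5 rating of one response."""
    reply = client.chat.completions.create(
        model="gpt-4o",   # placeholder; use whichever judge model you trust
        temperature=0,    # reduce run-to-run variance in the judgment
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, response=response),
        }],
    )
    text = reply.choices[0].message.content
    match = re.search(r"[1-5]", text)  # tolerate judges that add stray words
    if match is None:
        raise ValueError(f"Judge did not return a score: {text!r}")
    return int(match.group())


# Example usage:
# direct_score("What is the capital of France?", "Paris is the capital of France.")
```

Pinning temperature to 0 and asking for the integer only keeps scores comparable across runs, which matters once direct scores feed an automated pipeline.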
