sc-evaluate


LLM Evaluation Skill

Runs an LLM pipeline against gold-standard datasets and scores its outputs with an oracle LLM acting as judge. Measures output quality across weighted dimensions, identifies weak steps, and suggests prompt improvements.
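To make "weighted dimensions" and "weak steps" concrete, here is a minimal Python sketch of one way per-dimension judge scores could be combined and low-scoring steps flagged. The dimension names, weights, the 0.7 threshold, and the result layout are all assumptions for illustration; the skill's actual scoring scheme is not documented here.

```python
from statistics import mean

# Hypothetical dimension weights (assumed for illustration, not taken from the skill).
WEIGHTS = {"accuracy": 0.5, "completeness": 0.3, "format": 0.2}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-dimension judge scores (each in 0..1) into one weighted score."""
    total = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total


def weak_steps(results, threshold=0.7):
    """Return step ids whose mean weighted score is below threshold, worst first.

    `results` is an assumed layout: a list of dicts with a "step" id and a
    "scores" dict keyed by dimension name.
    """
    by_step = {}
    for r in results:
        by_step.setdefault(r["step"], []).append(weighted_score(r["scores"]))
    means = {step: mean(vals) for step, vals in by_step.items()}
    return sorted((s for s, m in means.items() if m < threshold), key=means.get)
```

Normalizing by the sum of the weights keeps the combined score in the 0..1 range even if the weights are later edited and no longer sum to 1.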

Quick Start

# Full evaluation (all test cases, all steps)
/sc:evaluate

# Quick spot check
/sc:evaluate --cases=case_1,case_2 --steps=1,2,3

# Re-evaluate existing results without re-running pipeline
/sc:evaluate --skip-pipeline

# Generate outputs only (no evaluation)
/sc:evaluate --skip-eval
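
The flags above suggest a two-phase run (pipeline, then evaluation) filtered by case and step. The Python sketch below shows one plausible reading of those flags using the standard argparse module. The function name parse_flags and the returned dictionary are hypothetical, and it is an assumption that an omitted --cases or --steps means "all"; the skill's real argument handling may differ.

```python
import argparse


def parse_flags(argv):
    """Turn /sc:evaluate-style flags into a run plan (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="sc-evaluate-sketch")
    parser.add_argument("--cases", default=None)   # comma-separated case ids
    parser.add_argument("--steps", default=None)   # comma-separated step numbers
    parser.add_argument("--skip-pipeline", action="store_true")
    parser.add_argument("--skip-eval", action="store_true")
    ns = parser.parse_args(argv)
    return {
        # None is assumed to mean "all cases" / "all steps".
        "cases": ns.cases.split(",") if ns.cases else None,
        "steps": [int(s) for s in ns.steps.split(",")] if ns.steps else None,
        "run_pipeline": not ns.skip_pipeline,
        "run_eval": not ns.skip_eval,
    }
```

For example, parse_flags(["--skip-pipeline"]) yields a plan that evaluates all cases and steps against existing outputs without re-running the pipeline.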

Installs: 5
GitHub Stars: 17
First Seen: Mar 10, 2026