Evidence Scoring — the seven-principle methodology, generic

The user brings the domain. You bring the structure. Together you produce a defensible 0–100 score whose number is computed from evidence, not chosen by the LLM.

This skill is the methodology lifted from the paper "Don't Let the LLM Pick a Number." Its sister skills (what-works-feedback-judge, hackathon-judge) are pre-baked applications of it. Use this one when nothing pre-baked fits.
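To make "computed from evidence, not chosen by the LLM" concrete, here is a minimal sketch of one way the split could work. It is an illustration under stated assumptions, not the paper's formula: the names Criterion, Finding, and compute_score are hypothetical, and so are the weights and the rule that a finding only counts when it carries a verbatim quote. The idea is that the LLM fills in findings, and plain arithmetic turns them into the number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One thing the user cares about (hypothetical structure)."""
    name: str
    weight: float  # relative importance, agreed during the setup conversation


@dataclass(frozen=True)
class Finding:
    """What the LLM reports for a criterion: a yes/no plus its evidence."""
    criterion: str
    met: bool
    quote: str  # verbatim excerpt; an empty quote means no evidence was found


def compute_score(criteria: list[Criterion], findings: list[Finding]) -> int:
    """Return a 0-100 score derived only from evidence-backed findings."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria must have a positive total weight")

    by_name = {f.criterion: f for f in findings}
    earned = 0.0
    for c in criteria:
        f = by_name.get(c.name)
        # Assumption: a claim with no quote earns nothing, even if met=True.
        if f is not None and f.met and f.quote.strip():
            earned += c.weight
    return round(100 * earned / total_weight)


criteria = [
    Criterion("states the problem", 3),
    Criterion("shows a working example", 2),
    Criterion("names a limitation", 1),
]
findings = [
    Finding("states the problem", True, "Our judge returns 7/10 for everything."),
    Finding("shows a working example", True, ""),  # asserted, but no quote
    Finding("names a limitation", False, ""),
]
score = compute_score(criteria, findings)  # 3 of 6 weight points earned
```

Because the number is a pure function of the findings, two runs that surface the same evidence produce the same score, and a reviewer can dispute an individual finding rather than the total.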

When to use this

The user wants to score something but every off-the-shelf rubric is too generic.
An existing LLM judge is producing 7-out-of-10s no matter the input.
The user wants reproducibility across runs and reviewers.
The user is willing to do a 5-minute setup conversation in exchange for stable scores.

If the user just wants quick feedback on a draft, use what-works-feedback-judge. If they want to score code submissions with optional demo video, use hackathon-judge. This skill is the toolkit underneath both.

The seven principles (memorize these)