Judge Verification

Overview

Judge verification is an independent LLM evaluation layer that verifies whether a task was genuinely completed. The judge reviews the original task goal, the sequence of actions taken, the final state of relevant artifacts, and the claimed completion evidence, then produces a PASS/FAIL verdict with a confidence score and actionable reasoning.

This skill is distinct from verification-before-completion, which runs checklist gates within the same agent context. Judge-verification instead uses a fresh, independent perspective with no access to the executing agent's prior reasoning, which lets it catch hallucinated success claims.
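
The inputs and outputs described above can be sketched as a pair of types plus two small helpers. This is only an illustration under stated assumptions: the names `JudgeInput`, `JudgeVerdict`, `buildJudgePrompt`, and `parseVerdict`, the field names, the JSON reply format, and the 0 to 1 confidence range are all hypothetical and are not taken from the skill's actual implementation.

```typescript
// Hypothetical shapes: field names are illustrative assumptions,
// not the skill's real schema.
interface JudgeInput {
  taskGoal: string;                  // the original task goal
  actions: string[];                 // the sequence of actions taken
  artifacts: Record<string, string>; // final state of relevant artifacts
  evidence: string[];                // claimed completion evidence
}

interface JudgeVerdict {
  verdict: 'PASS' | 'FAIL';
  confidence: number; // assumed to lie in [0, 1]
  reasoning: string;  // actionable explanation of the verdict
}

// Builds the judge's prompt from the four inputs only. The executing
// agent's prior reasoning is deliberately not part of JudgeInput, so it
// cannot leak into the judge's context.
function buildJudgePrompt(input: JudgeInput): string {
  const actions = input.actions.map((a, i) => `${i + 1}. ${a}`).join('\n');
  const artifacts = Object.entries(input.artifacts)
    .map(([name, state]) => `- ${name}: ${state}`)
    .join('\n');
  const evidence = input.evidence.map((e) => `- ${e}`).join('\n');
  return [
    `Task goal:\n${input.taskGoal}`,
    `Actions taken:\n${actions}`,
    `Final artifact state:\n${artifacts}`,
    `Claimed completion evidence:\n${evidence}`,
    'Reply with JSON: {"verdict": "PASS" | "FAIL", "confidence": number, "reasoning": string}',
  ].join('\n\n');
}

// Validates the judge's raw reply instead of trusting it blindly.
function parseVerdict(raw: string): JudgeVerdict {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null) {
    throw new Error('verdict must be a JSON object');
  }
  const { verdict, confidence, reasoning } = data as Record<string, unknown>;
  if (verdict !== 'PASS' && verdict !== 'FAIL') {
    throw new Error('verdict must be "PASS" or "FAIL"');
  }
  if (typeof confidence !== 'number' || !(confidence >= 0 && confidence <= 1)) {
    throw new Error('confidence must be a number in [0, 1]');
  }
  if (typeof reasoning !== 'string' || reasoning.trim() === '') {
    throw new Error('reasoning must be a non-empty string');
  }
  return { verdict, confidence, reasoning };
}
```

In this sketch, a malformed or ambiguous reply from the judge raises an error rather than being read as a PASS, which keeps the verification layer from silently accepting an unverifiable result.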

When to Use

Skill({ skill: 'judge-verification' });