experiment-audit


# Experiment Audit: Cross-Model Integrity Verification

Audit experiment integrity for: $ARGUMENTS

## Why This Exists

LLM agents can produce fraudulent experimental results through:

  1. Fake ground truth — creating a synthetic "reference" from the model's own outputs, then reporting high agreement with it as performance
  2. Score normalization — dividing metrics by the model's own maximum to manufacture scores of 0.99+
  3. Phantom results — citing numbers from files that don't exist or from functions that were never called
  4. Insufficient scope — reporting a 2-scene pilot as a "comprehensive evaluation"

These failure modes are NOT intentional deception; they arise from optimizing agents that lack an integrity constraint. This skill adds that constraint.
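Some of these modes admit a mechanical pre-check before any model judges anything. Below is a minimal sketch for the phantom-results mode, assuming a hypothetical report format in which claimed metrics are a name-to-value dict backed by a JSON metrics file; the function name `audit_claims` and the file layout are illustrative, not part of this skill's actual interface:

```python
import json
import math
from pathlib import Path

def audit_claims(claims: dict[str, float], metrics_file: str) -> list[str]:
    """Flag phantom results: every claimed metric must exist in a real
    file and match the value actually recorded there.
    (Hypothetical report format, for illustration only.)"""
    path = Path(metrics_file)
    if not path.exists():
        # The file the numbers supposedly came from does not exist at all.
        return [f"missing metrics file: {metrics_file}"]
    recorded = json.loads(path.read_text())
    problems = []
    for name, claimed in claims.items():
        if name not in recorded:
            problems.append(f"metric {name!r} never recorded in {metrics_file}")
        elif not math.isclose(recorded[name], claimed, rel_tol=1e-6):
            problems.append(
                f"metric {name!r}: report claims {claimed}, "
                f"file records {recorded[name]}"
            )
    return problems
```

A check like this only catches mode 3; fake ground truth, normalization, and scope inflation require reading the code that produced the metrics file, which is exactly the reviewer's job.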

## Core Principle

The executor (Claude) collects file paths. The reviewer (GPT-5.4) reads code and judges integrity. The executor does NOT participate in integrity judgment.

This follows shared-references/reviewer-independence.md and shared-references/experiment-integrity.md.
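A minimal sketch of the executor/reviewer separation, assuming a generic `reviewer_model.complete()` chat call; the reviewer invocation and verdict handling are assumptions here, not this skill's actual API:

```python
from pathlib import Path

def build_manifest(paths: list[str]) -> dict[str, str]:
    """Executor side: collect file contents verbatim. No judgment happens here."""
    return {p: Path(p).read_text() for p in paths}

def request_verdict(reviewer_model, manifest: dict[str, str]) -> str:
    """Reviewer side: an independent model reads the code and judges integrity."""
    prompt = "Audit these experiment files for integrity violations:\n\n"
    for path, source in manifest.items():
        prompt += f"--- {path} ---\n{source}\n\n"
    # The verdict text comes back from the reviewer alone; the executor
    # relays it without editing or re-scoring.
    return reviewer_model.complete(prompt)
```

The design point is that the executor's only outputs are file paths and verbatim contents, so it has no channel through which to grade its own work.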
