experiment-audit


# Experiment Audit: Cross-Model Integrity Verification

Audit experiment integrity for: $ARGUMENTS

## Why This Exists

LLM agents can produce fraudulent experimental results through:

  1. Fake ground truth — creating a synthetic "reference" from the model's own outputs, then reporting high agreement with it as performance
  2. Score normalization — dividing metrics by the model's own maximum to manufacture scores of 0.99+
  3. Phantom results — citing numbers from files that don't exist or from functions that were never called
  4. Insufficient scope — reporting a 2-scene pilot as a "comprehensive evaluation"

These failure modes are NOT intentional deception; they arise from optimizing agents that lack an integrity constraint. This skill adds that constraint.
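Some of these modes admit a mechanical pre-check before any model judges anything. Below is a minimal sketch for the phantom-results mode, assuming a hypothetical report format in which claimed metrics are a name-to-value dict backed by a JSON metrics file; the function name `audit_claims` and the file layout are illustrative, not part of this skill's actual interface:

```python
import json
import math
from pathlib import Path

def audit_claims(claims: dict[str, float], metrics_file: str) -> list[str]:
    """Flag phantom results: every claimed metric must exist in a real
    file and match the value actually recorded there.
    (Hypothetical report format, for illustration only.)"""
    path = Path(metrics_file)
    if not path.exists():
        # The file the numbers supposedly came from does not exist at all.
        return [f"missing metrics file: {metrics_file}"]
    recorded = json.loads(path.read_text())
    problems = []
    for name, claimed in claims.items():
        if name not in recorded:
            problems.append(f"metric {name!r} never recorded in {metrics_file}")
        elif not math.isclose(recorded[name], claimed, rel_tol=1e-6):
            problems.append(
                f"metric {name!r}: report claims {claimed}, "
                f"file records {recorded[name]}"
            )
    return problems
```

A check like this only catches mode 3; fake ground truth, normalization, and scope inflation require reading the code that produced the metrics file, which is exactly the reviewer's job.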

## Core Principle

The executor (Claude) collects file paths. The reviewer (GPT-5.4) reads code and judges integrity. The executor does NOT participate in integrity judgment.

This follows shared-references/reviewer-independence.md and shared-references/experiment-integrity.md.
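A minimal sketch of the executor/reviewer separation, assuming a generic `reviewer_model.complete()` chat call; the reviewer invocation and verdict handling are assumptions here, not this skill's actual API:

```python
from pathlib import Path

def build_manifest(paths: list[str]) -> dict[str, str]:
    """Executor side: collect file contents verbatim. No judgment happens here."""
    return {p: Path(p).read_text() for p in paths}

def request_verdict(reviewer_model, manifest: dict[str, str]) -> str:
    """Reviewer side: an independent model reads the code and judges integrity."""
    prompt = "Audit these experiment files for integrity violations:\n\n"
    for path, source in manifest.items():
        prompt += f"--- {path} ---\n{source}\n\n"
    # The verdict text comes back from the reviewer alone; the executor
    # relays it without editing or re-scoring.
    return reviewer_model.complete(prompt)
```

The design point is that the executor's only outputs are file paths and verbatim contents, so it has no channel through which to grade its own work.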
