Parity/coherence failure protocol

Symptom: the model runs without errors, but its output is wrong. Scalar ops.print taps and recompile loops hide directional bugs and burn GPU time. Build a per-layer tensor-dump comparator first; once the dumps are on disk, every later check is a NumPy read rather than another compile-and-run cycle.
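A minimal sketch of such a comparator, assuming both sides have already written one .npy file per layer into two directories, with matching file names that carry a zero-padded execution index (for example 000_embed.npy). The directory layout, the naming scheme, the function names compare_dumps and first_divergence, and the default tolerances are all hypothetical; how the tensors get written on the PyTorch and MAX sides is not shown here.

```python
from pathlib import Path

import numpy as np


def compare_dumps(ref_dir, test_dir, rtol=1e-3, atol=1e-3):
    """Compare same-named .npy dumps and return one report row per reference tensor.

    Assumes file names sort in execution order (hypothetical convention).
    """
    rows = []
    for ref_path in sorted(Path(ref_dir).glob("*.npy")):
        name = ref_path.stem
        test_path = Path(test_dir) / ref_path.name
        if not test_path.exists():
            rows.append({"name": name, "status": "missing"})
            continue

        # Upcast so the comparison itself does not add rounding error.
        ref = np.load(ref_path).astype(np.float64)
        test = np.load(test_path).astype(np.float64)
        if ref.shape != test.shape:
            rows.append({"name": name, "status": "shape",
                         "ref_shape": ref.shape, "test_shape": test.shape})
            continue

        diff = np.abs(ref - test)
        denom = np.linalg.norm(ref) * np.linalg.norm(test)
        cosine = float(np.dot(ref.ravel(), test.ravel()) / denom) if denom > 0 else float("nan")
        ok = bool(np.allclose(test, ref, rtol=rtol, atol=atol))
        rows.append({"name": name,
                     "status": "ok" if ok else "mismatch",
                     "max_abs": float(diff.max()) if diff.size else 0.0,
                     "cosine": cosine})
    return rows


def first_divergence(rows):
    """Return the earliest row that is not 'ok', or None if every layer matches."""
    for row in rows:
        if row["status"] != "ok":
            return row
    return None
```

Calling first_divergence(compare_dumps("dumps/ref", "dumps/max")) (paths are placeholders) returns the earliest layer that is missing, mis-shaped, or out of tolerance, which is usually where to start reading. The cosine column is there because a large max_abs with cosine near 1 suggests a scale problem, while a low cosine suggests a layout or weight-mapping problem.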

Use this skill when MAX output disagrees with a PyTorch reference you can run and hook. The primary case is a custom-architecture port that serves but fails parity or coherence checks; the same protocol covers a quantized variant of a working port, a multi-GPU conversion of a working single-GPU port, and a regression after a MAX upgrade — anywhere a trusted reference exists.

Do not use this skill when:

The server crashes on load → fix the config, weights, or graph first (import-model)
The graph is not fully implemented yet → finish import-model Phase 2
An already-verified model needs logit-comparison tolerances tuned → that is threshold calibration, not corruption

debug-model
