debug-model
Parity/coherence failure protocol
The model runs without errors but output is wrong. Scalar ops.print taps and
recompile loops hide directional bugs and burn GPU time. Build a per-layer
tensor-dump comparator first; every later check becomes a numpy read from disk.
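A minimal sketch of such a comparator, under stated assumptions: both the reference run and the MAX run have already written one `.npy` file per layer into separate directories, and file names are zero-padded (`00_embed.npy`, `01_layer0.npy`, ...) so that sorted order matches execution order. The directory layout, the file names, the tolerances, and the helpers `compare_dumps`, `first_divergence`, and `demo` are all hypothetical and not part of this skill or of MAX.

```python
import tempfile
from pathlib import Path

import numpy as np


def compare_dumps(ref_dir, test_dir, atol=1e-3, rtol=1e-3):
    """Compare same-named .npy dumps in two directories, in sorted name order.

    Returns a list of (name, max_abs_diff, cosine_similarity, ok) tuples.
    A missing file or a shape mismatch is reported as a failure with NaN metrics.
    """
    rows = []
    for ref_path in sorted(Path(ref_dir).glob("*.npy")):
        test_path = Path(test_dir) / ref_path.name
        if not test_path.exists():
            rows.append((ref_path.stem, float("nan"), float("nan"), False))
            continue
        # Compare in float64 so the metric itself adds no rounding noise.
        ref = np.load(ref_path).astype(np.float64)
        test = np.load(test_path).astype(np.float64)
        if ref.shape != test.shape:
            rows.append((ref_path.stem, float("nan"), float("nan"), False))
            continue
        max_abs = float(np.max(np.abs(ref - test)))
        denom = np.linalg.norm(ref) * np.linalg.norm(test)
        cosine = float(np.dot(ref.ravel(), test.ravel()) / denom) if denom > 0 else float("nan")
        ok = bool(np.allclose(test, ref, atol=atol, rtol=rtol))
        rows.append((ref_path.stem, max_abs, cosine, ok))
    return rows


def first_divergence(rows):
    """Return the name of the earliest layer that fails, or None if all pass."""
    for name, _max_abs, _cosine, ok in rows:
        if not ok:
            return name
    return None


def demo():
    """Write synthetic dumps where every layer after the embedding is offset."""
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as tmp:
        ref_dir, test_dir = Path(tmp) / "ref", Path(tmp) / "test"
        ref_dir.mkdir()
        test_dir.mkdir()
        for i, name in enumerate(["00_embed", "01_layer0", "02_layer1"]):
            x = rng.standard_normal((4, 8)).astype(np.float32)
            np.save(ref_dir / f"{name}.npy", x)
            # Inject a constant error from the second tensor onward.
            np.save(test_dir / f"{name}.npy", x + 0.5 if i >= 1 else x)
        return compare_dumps(ref_dir, test_dir)


if __name__ == "__main__":
    rows = demo()
    for name, max_abs, cosine, ok in rows:
        print(f"{name:12s} max_abs={max_abs:.4g} cos={cosine:.6f} {'ok' if ok else 'FAIL'}")
    print("first divergence:", first_divergence(rows))
```

The earliest failing layer is usually the one worth inspecting, since every later layer inherits its error. Reporting cosine similarity next to the maximum absolute difference helps separate a scale or offset error from a tensor that is wrong in direction.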
Use this skill when MAX output disagrees with a PyTorch reference you can run and hook. The primary case is a custom-architecture port that serves but fails parity or coherence checks; the same protocol covers a quantized variant of a working port, a multi-GPU conversion of a working single-GPU port, and a regression after a MAX upgrade — anywhere a trusted reference exists.
Do not use this skill when:
- The server crashes on load → fix config, weights, graph (import-model)
- You have not finished implementing the graph → import-model Phase 2
- An already-verified model needs logit-comparison tolerances tuned → that is threshold calibration, not corruption