Note: Parity testing is separate from the unit-level tests that ship in tests/. If you are integrating a new model, the model-level test suite under tests/models/ is still required — follow the "#### Model-level tests" section in ../model-integration/SKILL.md (generate via utils/generate_model_tests.py, no --include flags initially, no LoraTesterMixin). Parity tests verify numerical correctness during development; the generated test suite is what CI runs.

Setup — gather before starting

Before writing any test code, gather:

Which two implementations are being compared (e.g. research repo → diffusers, standard → modular, or research → modular). Use AskUserQuestion with structured choices if not already clear.
Two equivalent runnable scripts — one for each implementation, both expected to produce identical output given the same inputs. These scripts define what "parity" means concretely.

When invoked from the model-integration skill, you already have context: the reference script comes from step 2 of setup, and the diffusers script is the one you just wrote. In that case, only confirm that both scripts run and that they use the same inputs, seed, and parameters.
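One way to keep inputs identical across the two scripts is to build them from a seeded CPU generator rather than relying on global RNG state. This is a minimal sketch; the helper name `make_inputs` and the tensor shape are illustrative assumptions, not part of either implementation.

```python
import torch


def make_inputs(seed: int = 0, shape=(1, 4, 8, 8)) -> torch.Tensor:
    """Build a deterministic float32 CPU tensor (hypothetical helper)."""
    # A dedicated generator avoids depending on whatever else touched the global RNG.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator, dtype=torch.float32)


# Both scripts call the helper with the same seed and shape.
inputs_for_reference = make_inputs(seed=0)
inputs_for_candidate = make_inputs(seed=0)
same_inputs = torch.equal(inputs_for_reference, inputs_for_candidate)
```

If the two scripts live in different repositories, saving the tensor once and loading it in both is an equally reasonable option.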

Test strategy

Component parity (CPU/float32): always run, as you build. Test each component in isolation before assembling the pipeline, with a strict threshold of max_diff < 1e-3. This is the foundation: if individual pieces are wrong, the pipeline can't be right.
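The component check described above can be sketched as follows. The helper names (`max_abs_diff`, `check_component_parity`) are hypothetical, and the two `torch.nn.Linear` modules are stand-ins for a real reference component and its ported counterpart.

```python
import torch


def max_abs_diff(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    """Largest element-wise absolute difference, computed on CPU in float32."""
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {tuple(reference.shape)} vs {tuple(candidate.shape)}"
        )
    ref = reference.detach().to("cpu", torch.float32)
    cand = candidate.detach().to("cpu", torch.float32)
    return (ref - cand).abs().max().item()


def check_component_parity(reference_module, candidate_module, inputs, threshold=1e-3):
    """Run both modules on the same inputs and compare against the threshold."""
    reference_module.eval()
    candidate_module.eval()
    with torch.no_grad():
        ref_out = reference_module(inputs)
        cand_out = candidate_module(inputs)
    diff = max_abs_diff(ref_out, cand_out)
    return diff, diff < threshold


# Stand-in components: the candidate copies the reference weights exactly.
torch.manual_seed(0)
reference = torch.nn.Linear(8, 4)
candidate = torch.nn.Linear(8, 4)
candidate.load_state_dict(reference.state_dict())

x = torch.randn(2, 8)
diff, ok = check_component_parity(reference, candidate, x)
print(f"max_diff={diff:.2e} pass={ok}")
```

Checking the shape before subtracting matters because broadcasting can otherwise hide a layout bug behind a small-looking difference.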

Test both freshly converted checkpoints and saved checkpoints:

Fresh: convert from checkpoint weights, compare against reference (catches conversion bugs)
Saved: load from saved model on disk, compare against reference (catches stale saves)
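A minimal sketch of running both checks against the same reference output. Everything here is a stand-in: `Reference`, `Target`, and `convert_state_dict` are hypothetical, and `torch.save`/`torch.load` take the place of whatever save and load path the real model uses (for a diffusers model that would typically be `save_pretrained` and `from_pretrained`).

```python
import tempfile
from pathlib import Path

import torch


class Reference(torch.nn.Module):
    """Stand-in for the original implementation."""

    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.proj(x)


class Target(torch.nn.Module):
    """Stand-in for the ported implementation, with renamed parameters."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)


def convert_state_dict(original):
    """Hypothetical conversion step: rename 'proj.*' keys to 'linear.*'."""
    return {k.replace("proj.", "linear.", 1): v.clone() for k, v in original.items()}


torch.manual_seed(0)
reference = Reference().eval()
x = torch.randn(2, 8)
with torch.no_grad():
    expected = reference(x)

# Fresh: convert the reference weights and compare immediately.
fresh = Target().eval()
fresh.load_state_dict(convert_state_dict(reference.state_dict()))
with torch.no_grad():
    fresh_diff = (fresh(x) - expected).abs().max().item()

# Saved: write to disk, reload into a new instance, and compare again.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "target.pt"
    torch.save(fresh.state_dict(), path)
    saved = Target().eval()
    saved.load_state_dict(torch.load(path))
with torch.no_grad():
    saved_diff = (saved(x) - expected).abs().max().item()

print(f"fresh max_diff={fresh_diff:.2e}, saved max_diff={saved_diff:.2e}")
```

Comparing both against the reference, rather than against each other, keeps the two failure modes distinguishable: a conversion bug shows up in the fresh check, while a stale or mismatched file on disk shows up only in the saved check.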
