testing-parity
Note: Parity testing is separate from the unit-level tests that ship in
tests/. If you are integrating a new model, the model-level test suite undertests/models/is still required — follow the "#### Model-level tests" section in../model-integration/SKILL.md(generate viautils/generate_model_tests.py, no--includeflags initially, noLoraTesterMixin). Parity tests verify numerical correctness during development; the generated test suite is what CI runs.
Setup — gather before starting
Before writing any test code, gather:
- Which two implementations are being compared (e.g. research repo → diffusers, standard → modular, or research → modular). Use
AskUserQuestionwith structured choices if not already clear. - Two equivalent runnable scripts — one for each implementation, both expected to produce identical output given the same inputs. These scripts define what "parity" means concretely.
When invoked from the model-integration skill, you already have context: the reference script comes from step 2 of setup, and the diffusers script is the one you just wrote. You just need to make sure both scripts are runnable and use the same inputs/seed/params.
Test strategy
Component parity (CPU/float32) -- always run, as you build. Test each component before assembling the pipeline. This is the foundation -- if individual pieces are wrong, the pipeline can't be right. Each component in isolation, strict max_diff < 1e-3.
Test freshly converted checkpoints and saved checkpoints.
- Fresh: convert from checkpoint weights, compare against reference (catches conversion bugs)
- Saved: load from saved model on disk, compare against reference (catches stale saves)