multi-version-behavior-comparator
Multi-Version Behavior Comparator
Differential testing, generalized: run N versions on the same inputs, and any disagreement is a bug somewhere. You don't need an oracle — the versions are each other's oracle.
Compared to behavior-preservation-checker, which gives a yes/no answer for two versions, this skill handles N versions and focuses on characterizing divergence, not just detecting it.
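As a rough illustration of "the versions are each other's oracle", the sketch below runs several versions on the same inputs and groups them by outcome, so a divergence shows which versions side with which. Everything in it is hypothetical: `compare_versions`, the three `parse_qty_*` functions, and the choice of `repr()` as the normalization step are assumptions made for the example, not part of this skill.

```python
from collections import defaultdict


def compare_versions(versions, inputs):
    """Run every version on every input and report the inputs where they disagree.

    `versions` maps a label to a callable (hypothetical interface).
    Returns {repr(input): {outcome: [labels that produced it]}}.
    """
    divergences = {}
    for x in inputs:
        groups = defaultdict(list)
        for label, fn in versions.items():
            try:
                # Assumption: repr() is a good enough normalization for comparison.
                outcome = ("ok", repr(fn(x)))
            except Exception as exc:
                # Compare on exception type only; messages tend to be unstable.
                outcome = ("raised", type(exc).__name__)
            groups[outcome].append(label)
        if len(groups) > 1:  # more than one distinct outcome means divergence
            divergences[repr(x)] = dict(groups)
    return divergences


# Three made-up versions of the same parsing function.
def parse_qty_v1(s):
    return int(s)


def parse_qty_v2(s):
    return int(float(s))


def parse_qty_v3(s):
    return round(float(s))


report = compare_versions(
    {"v1": parse_qty_v1, "v2": parse_qty_v2, "v3": parse_qty_v3},
    ["7", "7.0", "2.7", "abc"],
)
for case, groups in report.items():
    print(case, groups)
```

Grouping by outcome rather than returning a boolean is what makes the result a characterization: on `"2.7"` all three versions disagree, while on `"7.0"` only `v1` stands apart, and inputs where every version agrees (including agreeing to raise) are left out of the report.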
What counts as "behavior"
| Observable | How to capture | When it matters |
|---|---|---|
| Return value | Direct comparison (after normalization) | Always |
| Exceptions / errors | Type + message (message is often unstable — compare type) | Always |
| Side effects (I/O, DB) | Mock + record, or capture at the boundary | If the code does I/O |
| stdout/stderr | Capture streams | If output is the interface |
| Mutation of inputs | Deep-copy inputs before call, compare after | If inputs are mutable |
| Timing | Measure with a wide tolerance; flag only differences of 10× or more | Performance regressions only |
| Resource usage | Memory high-water, FD count | Leak hunting |
Pick which observables matter before running. Diffing everything produces noise.
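A minimal sketch of capturing a few of the observables from the table for a single call, then diffing only the ones chosen up front. The names `observe` and `diff_records` and the two `dedupe_*` functions are hypothetical; the sketch also assumes inputs support `==` and `copy.deepcopy`, and that output goes through Python-level `sys.stdout`.

```python
import contextlib
import copy
import io


def observe(fn, *args):
    """Call fn on a deep copy of args and record selected observables."""
    call_args = copy.deepcopy(args)  # fn sees (and may mutate) this copy
    out = io.StringIO()
    record = {"returned": None, "raised": None}
    with contextlib.redirect_stdout(out):
        try:
            record["returned"] = fn(*call_args)
        except Exception as exc:
            record["raised"] = type(exc).__name__  # type only, not the message
    record["stdout"] = out.getvalue()
    # Compare the copy fn received against the untouched originals.
    record["mutated_inputs"] = call_args != args
    return record


def diff_records(a, b, observables):
    """Return only the chosen observables on which two records differ."""
    return {k: (a[k], b[k]) for k in observables if a[k] != b[k]}


# Two made-up versions: same return value, different side behavior.
def dedupe_v1(items):
    return sorted(set(items))


def dedupe_v2(items):
    items.sort()  # mutates the caller's list
    print("sorted in place")
    return list(dict.fromkeys(items))


r1 = observe(dedupe_v1, [3, 1, 3])
r2 = observe(dedupe_v2, [3, 1, 3])

print(diff_records(r1, r2, ("returned", "raised")))
print(diff_records(r1, r2, ("returned", "raised", "stdout", "mutated_inputs")))
```

With only `returned` and `raised` selected, the two versions look identical; adding `stdout` and `mutated_inputs` surfaces two differences. Whether those are signal or noise depends on whether stdout or input mutation is part of the interface being compared, which is why the selection is an explicit argument here.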