nan-safe-correlation
NaN-Safe Correlation Computation
Overview
Computing correlations across many features (genes, proteins, variants) when missing values are present is error-prone. The most common mistake is using bulk matrix shortcuts that silently mishandle NaN, producing incorrect correlation values. This guide covers correct per-feature pairwise computation, degenerate input filtering, and performance optimization.
Key Concepts
Pairwise vs Listwise Deletion
- Pairwise deletion: For each feature pair, drop only the samples where either value is NaN. Each pair is then computed on all of the data available to it.
- Listwise deletion: Drop any sample that has a NaN in any feature. This discards valid data and can bias results when values are not missing completely at random.
- Rule: Always use pairwise deletion for per-feature correlations.
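The difference between the two deletion strategies can be illustrated with a minimal sketch. The helper name `pairwise_pearson`, the `min_samples` threshold, and the toy data are assumptions made for this example and are not part of the original guide. The sketch masks NaN per pair, returns NaN for degenerate inputs (too few samples or a constant vector), and otherwise computes Pearson's r with `np.corrcoef`.

```python
import numpy as np

def pairwise_pearson(x, y, min_samples=3):
    """Pearson r between x and y using pairwise deletion.

    Hypothetical helper for illustration. Returns (r, n), where n is the
    number of samples in which both values are present.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only samples where BOTH values are present.
    valid = ~(np.isnan(x) | np.isnan(y))
    n = int(valid.sum())

    # Degenerate input: too few paired samples (threshold is an assumption).
    if n < min_samples:
        return np.nan, n

    xv, yv = x[valid], y[valid]

    # Degenerate input: a constant vector has zero variance, so r is undefined.
    if xv.max() == xv.min() or yv.max() == yv.min():
        return np.nan, n

    return float(np.corrcoef(xv, yv)[0, 1]), n

# Toy data: each feature is missing a different sample.
target = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
feat_a = np.array([2.0, np.nan, 6.0, 8.0, 10.0, 12.0])
feat_b = np.array([6.0, 5.0, np.nan, 3.0, 2.0, 1.0])

# Pairwise deletion: each pair keeps 5 of the 6 samples.
r_a, n_a = pairwise_pearson(target, feat_a)
r_b, n_b = pairwise_pearson(target, feat_b)

# Listwise deletion: only samples complete in every feature survive.
stacked = np.vstack([target, feat_a, feat_b])
n_listwise = int((~np.isnan(stacked).any(axis=0)).sum())

print(f"pairwise n: {n_a}, {n_b}; listwise n: {n_listwise}")
```

With only two features the loss from listwise deletion is a single extra sample per pair, but it compounds as more features with distinct missingness patterns are added, which is why the per-pair mask is preferred here.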
Why Bulk Matrix Shortcuts Fail
Different features have different missing value patterns across samples. Bulk methods handle this inconsistently: