nan-safe-correlation

NaN-Safe Correlation Computation

Overview

Computing correlations across many features (genes, proteins, variants) when missing values are present is error-prone. The most common mistake is using bulk matrix shortcuts that silently mishandle NaN, producing incorrect correlation values. This guide covers correct per-feature pairwise computation, degenerate input filtering, and performance optimization.

Key Concepts

Pairwise vs Listwise Deletion

  • Pairwise deletion: For each feature pair, remove only the samples where either value is NaN, so each pair keeps every sample in which both values are observed.
  • Listwise deletion: Remove any sample with NaN in any feature. This discards valid data and can bias results when values are not missing completely at random.
  • Rule: Always use pairwise deletion for per-feature correlations.
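
The difference between the two deletion strategies can be sketched with a small standard-library example. This is a minimal illustration, not a reference implementation: the function name `pairwise_pearson`, the `min_samples` threshold of 3, the zero-variance guard, and the toy data are all assumptions introduced here.

```python
import math

def pairwise_pearson(x, y, min_samples=3):
    """Pearson r over samples where both x and y are observed.

    Returns (r, n_used). r is NaN when too few samples remain or when
    either input has zero variance (min_samples=3 is an assumed cutoff).
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < min_samples:
        return math.nan, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, n  # zero variance: correlation is undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

nan = math.nan
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical phenotype
features = {                               # hypothetical features
    "feat_a": [2.0, 4.0, nan, 8.0, 10.0, 12.0],
    "feat_b": [6.0, nan, nan, 3.0, 2.0, 1.0],
}

# Pairwise deletion: each feature is masked independently.
results = {name: pairwise_pearson(vals, target)
           for name, vals in features.items()}

# Listwise deletion would keep only samples complete in every feature.
listwise_n = sum(
    1 for i in range(len(target))
    if not any(math.isnan(vals[i]) for vals in features.values())
)

for name, (r, n) in results.items():
    print(f"{name}: r={r:.3f} using {n} samples (listwise would use {listwise_n})")
```

In this toy data, `feat_a` keeps 5 samples under pairwise deletion, whereas listwise deletion would leave only 4 for every feature because of the extra gap in `feat_b`.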

Why Bulk Matrix Shortcuts Fail

Different features have different missing value patterns across samples. Bulk methods handle this inconsistently:
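
As one hedged illustration of this inconsistency, the sketch below assumes NumPy and a toy matrix (neither is named in the text above). It compares two common shortcuts against pairwise deletion for a single feature pair: calling `np.corrcoef` on the raw matrix, and zero-filling NaN before the call.

```python
import numpy as np

# Rows are features, columns are samples (toy data, purely illustrative).
# On the jointly observed samples (columns 0, 3, 4) row 1 is exactly 2 * row 0.
X = np.array([
    [1.0, 2.0, np.nan, 4.0, 5.0],
    [2.0, np.nan, 6.0, 8.0, 10.0],
])

# Shortcut 1: np.corrcoef has no NaN handling, so the NaN propagates
# into every entry that involves a row with a missing value.
bulk = np.corrcoef(X)[0, 1]

# Shortcut 2: zero-filling treats missing values as real measurements of 0,
# which changes the means, the variances, and the resulting correlation.
zero_filled = np.corrcoef(np.nan_to_num(X, nan=0.0))[0, 1]

# Reference: pairwise deletion for this one pair.
mask = ~np.isnan(X[0]) & ~np.isnan(X[1])
pairwise = np.corrcoef(X[0, mask], X[1, mask])[0, 1]

print(f"bulk corrcoef: {bulk}")
print(f"zero-filled:   {zero_filled:.3f}")
print(f"pairwise:      {pairwise:.3f}")
```

The first shortcut fails visibly by returning NaN. The second is the more dangerous one: it returns a plausible-looking number (about 0.63 here) that differs from the pairwise value of 1.0.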

First Seen
Apr 28, 2026