Harmony Batch Correction

Overview

Harmony is a fast, scalable algorithm for batch integration of single-cell data. It takes a PCA embedding (cells × PCs) as input and returns a corrected embedding of the same shape. Batch effects are removed iteratively by alternating two steps: soft clustering of the cells, then a per-cluster linear regression that estimates batch-specific offsets and subtracts them. The corrected embedding is then used to compute neighbors, UMAP, and downstream clustering; the raw count matrix is never modified. Harmony works for single-cell RNA-seq, ATAC-seq, and other omics modalities where a PCA-like embedding is available.
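To make the "soft clustering plus per-cluster regression" idea concrete, the following is a toy NumPy sketch on synthetic data. It is not the Harmony implementation: it omits Harmony's diversity penalty on cluster assignments, uses a crude deterministic centroid initialisation, and the names `soft_assign`, `toy_moe_correction`, `sigma`, and `ridge` are hypothetical choices made for this illustration.

```python
import numpy as np


def soft_assign(Z, centroids, sigma=1.0):
    """Soft cluster memberships: softmax over negative squared distances."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / sigma
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)


def toy_moe_correction(Z, batch, n_clusters=2, n_iter=5, ridge=1.0):
    """Toy batch correction of an embedding Z (cells x PCs). Assumption: simplified."""
    n_cells = Z.shape[0]
    batches = np.unique(batch)
    onehot = (batch[:, None] == batches[None, :]).astype(float)
    phi = np.hstack([np.ones((n_cells, 1)), onehot])  # intercept + batch indicators
    lam = ridge * np.eye(phi.shape[1])
    lam[0, 0] = 0.0  # the intercept is not penalised
    # Crude deterministic initialisation: cells evenly spaced along PC1.
    order = np.argsort(Z[:, 0])
    picks = np.linspace(0, n_cells - 1, n_clusters).astype(int)
    centroids = Z[order[picks]].copy()
    Z_corr = Z.copy()
    for _ in range(n_iter):
        # Step 1: soft clustering on the current corrected embedding.
        R = soft_assign(Z_corr, centroids)
        centroids = (R.T @ Z_corr) / R.sum(axis=0)[:, None]
        # Step 2: per-cluster weighted ridge regression on the original embedding.
        Z_corr = Z.copy()
        for k in range(n_clusters):
            w = R[:, k:k + 1]
            A = phi.T @ (phi * w) + lam
            B = phi.T @ (Z * w)
            W = np.linalg.solve(A, B)
            W[0, :] = 0.0  # keep the cluster mean, remove only batch terms
            Z_corr -= w * (phi @ W)
    return Z_corr


# Synthetic "PCA embedding": two cell types, two batches, batch B shifted on PC2.
rng = np.random.default_rng(0)
n = 50
cell_type = np.repeat([0, 1], 2 * n)
batch = np.tile(np.repeat(["A", "B"], n), 2)
centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
Z = centers[cell_type] + rng.normal(scale=0.3, size=(4 * n, 2))
Z[batch == "B"] += np.array([0.0, 2.0])


def batch_gap(emb):
    """Distance between batch A and batch B means within cell type 0."""
    mask = cell_type == 0
    a = emb[mask & (batch == "A")].mean(axis=0)
    b = emb[mask & (batch == "B")].mean(axis=0)
    return float(np.linalg.norm(a - b))


Z_corr = toy_moe_correction(Z, batch)
print(f"batch gap before: {batch_gap(Z):.2f}")
print(f"batch gap after:  {batch_gap(Z_corr):.2f}")
```

In this sketch the intercept row of each cluster's regression is zeroed before subtraction, so only the batch-specific offsets are removed and the separation between cell types is left in place. The correction is always recomputed from the original embedding `Z`, while the clustering uses the latest corrected embedding.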

When to Use

Integrating scRNA-seq datasets from different samples, donors, sequencing runs, or experimental batches that should contain the same cell types
Removing technical variation (library preparation protocol, 10x chemistry version, sequencing depth, sequencing platform) while preserving biological differences between cell types and conditions
Performing fast, scalable batch correction on datasets with millions of cells where deep generative model training would be prohibitively slow
Correcting for multiple confounding variables simultaneously (batch, donor, sequencing platform, tissue processing protocol)
Preparing a corrected embedding for UMAP visualization, Leiden clustering, or label transfer without modifying the gene expression count matrix
Use scVI/scvi-tools instead when you need probabilistic batch correction with a variational autoencoder (deep learning), differential expression with uncertainty estimates, or multi-modal integration (RNA + protein)
Use BBKNN instead when you want graph-based integration that avoids constructing a corrected embedding altogether and directly builds a cross-batch nearest-neighbor graph
Use Seurat Integration / CCA (R) instead when you are already in a Seurat workflow and prefer anchor-based integration methods
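For the Harmony use cases listed above, a typical Scanpy workflow might look like the sketch below. It assumes `adata` is an AnnData object that has already been normalized, log-transformed, and scaled, that `harmonypy` is installed (Scanpy's `harmony_integrate` wrapper depends on it), and that a Leiden backend is available. The function name `integrate_with_harmony` and the column names `"batch"` and `"donor"` are hypothetical. Passing a list of column names to correct for several covariates at once is assumed to be supported by the installed Scanpy version.

```python
def integrate_with_harmony(adata, batch_keys, n_pcs=50):
    """Run PCA -> Harmony -> neighbors -> UMAP -> Leiden on `adata` in place.

    `batch_keys` is a column name in `adata.obs`, or a list of names
    (assumption: list support depends on the Scanpy version in use).
    """
    # Imported lazily so this helper can be defined without Scanpy installed.
    import scanpy as sc
    import scanpy.external as sce

    # Harmony operates on a PCA embedding, stored in adata.obsm["X_pca"].
    sc.pp.pca(adata, n_comps=n_pcs)

    # Writes the corrected embedding to adata.obsm["X_pca_harmony"];
    # the expression matrix in adata.X is left unchanged.
    sce.pp.harmony_integrate(
        adata,
        key=batch_keys,
        basis="X_pca",
        adjusted_basis="X_pca_harmony",
    )

    # Downstream steps must be pointed at the corrected embedding explicitly.
    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    sc.tl.umap(adata)
    sc.tl.leiden(adata)
    return adata
```

A call such as `integrate_with_harmony(adata, ["batch", "donor"])` would correct for both covariates in one pass. The key design point is the `use_rep="X_pca_harmony"` argument: without it, the neighbor graph is built from the uncorrected PCA and the integration has no effect on UMAP or clustering.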
