harmony-batch-correction

Installation
SKILL.md

Harmony Batch Correction

Overview

Harmony is a fast, scalable algorithm for batch integration in single-cell data. It takes a PCA embedding (cells × PCs) as input and returns a corrected embedding from which batch effects have been regressed out via iterative soft-clustering and per-cluster linear regression. The corrected embedding is then used to compute neighbors, UMAP, and downstream clustering — the raw count matrix is never modified. Harmony works for single-cell RNA-seq, ATAC-seq, and other omics modalities where a PCA-like embedding is available.

When to Use

  • Integrating scRNA-seq datasets from different samples, donors, sequencing runs, or experimental batches that should contain the same cell types
  • Removing technical variation (library preparation protocol, 10x chemistry version, sequencing depth, sequencing platform) while preserving biological differences between cell types and conditions
  • Performing fast, scalable batch correction on datasets with millions of cells where deep generative model training would be prohibitively slow
  • Correcting for multiple confounding variables simultaneously (batch, donor, sequencing platform, tissue processing protocol)
  • Preparing a corrected embedding for UMAP visualization, Leiden clustering, or label transfer without modifying the gene expression count matrix
  • Use scVI/scvi-tools instead when you need probabilistic batch correction with a variational autoencoder (deep learning), differential expression with uncertainty estimates, or multi-modal integration (RNA + protein)
  • Use BBKNN instead when you want graph-based integration that avoids constructing a corrected embedding altogether and directly builds a cross-batch nearest-neighbor graph
  • Use Seurat Integration / CCA (R) instead when you are already in a Seurat workflow and prefer anchor-based integration methods

Prerequisites

Related skills

More from jaechang-hits/sciagent-skills

Installs
9
GitHub Stars
152
First Seen
Mar 16, 2026