bulk-rna-seq-batch-correction-with-combat


Bulk RNA-seq batch correction with ComBat

Overview

Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.

Instructions

  1. Import core libraries
    • Load omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
    • Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
  2. Load each batch separately
    • Read the prepared pickled matrices (or user-provided expression tables) with pd.read_pickle(...)/pd.read_csv(...).
    • Transpose gene × sample tables to sample × gene before wrapping them in anndata.AnnData objects, so that adata.obs stores sample metadata and adata.var stores genes.
    • Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when available.
  3. Concatenate on shared genes
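    • Use anndata.concat([adata1, adata2, adata3], merge='same'). The default join='inner' retains the intersection of genes across batches, and merge='same' keeps only the gene annotations that are identical in every batch.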
    • Confirm the combined adata reports balanced sample counts per batch; if not, prompt users to re-check inputs.
First Seen
Jan 26, 2026