Single-cell clustering and batch correction with omicverse

Overview

This skill distills the single-cell tutorials t_cluster.ipynb and t_single_batch.ipynb. Use it when a user wants to preprocess an AnnData object, explore clustering alternatives (Leiden, Louvain, scICE, GMM, topic/cNMF models), and evaluate or harmonise batches with omicverse utilities.

Instructions

  1. Import libraries and set plotting defaults
    • Import omicverse as ov, scanpy as sc, and any plotting helpers (scvelo as scv when using the dentate gyrus demo data).
    • Call ov.plot_set() or ov.utils.ov_plot_set() so that figures adopt omicverse styling before any plots are generated.
  2. Load data and annotate batches
    • For demo clustering, fetch scv.datasets.dentategyrus(); for integration, read provided .h5ad files via ov.read() and set adata.obs['batch'] identifiers for each cohort.
    • Confirm that adata.X is a sparse numeric matrix; when the QC steps require integer counts, convert it with adata.X = adata.X.astype(np.int64) (with numpy imported as np).
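
The batch-annotation step can be sketched as below. This is a minimal illustration, not the tutorial's exact code: the helper name load_cohorts and the paths_by_batch mapping are hypothetical, and merging the cohorts with anndata.concat(..., merge="same") is an assumption about how the per-cohort objects are combined. Only ov.read(), adata.obs['batch'], and the integer cast come from the steps above.

```python
def load_cohorts(paths_by_batch):
    """Read one .h5ad file per cohort, label each with its batch, and merge them.

    paths_by_batch is a hypothetical mapping of batch label -> file path.
    """
    # Imported inside the function so the helper can be defined
    # without the heavy dependencies already loaded.
    import anndata as ad
    import omicverse as ov

    cohorts = []
    for label, path in paths_by_batch.items():
        cohort = ov.read(path)
        cohort.obs["batch"] = label  # same identifier for every cell in the cohort
        cohorts.append(cohort)

    # Assumption: cohorts are stacked cell-wise, keeping annotations
    # that are identical across all inputs.
    adata = ad.concat(cohorts, merge="same")

    # Integer counts for the QC steps; "int64" is equivalent to np.int64.
    adata.X = adata.X.astype("int64")
    return adata
```

A call might look like load_cohorts({"batch_1": "cohort_1.h5ad", "batch_2": "cohort_2.h5ad"}), where both labels and file names are placeholders.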
  3. Run quality control
    • Execute ov.pp.qc(adata, tresh={'mito_perc': 0.2, 'nUMIs': 500, 'detected_genes': 250}, batch_key='batch') to drop low-quality cells and inspect summary statistics per batch.
    • Save intermediate filtered objects (adata.write_h5ad(...)) so users can resume from clean checkpoints.
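
To make the thresholds concrete, here is a library-free sketch of the filter they describe. It assumes that mito_perc is an upper bound on the mitochondrial fraction and that nUMIs and detected_genes are lower bounds; whether ov.pp.qc uses strict or inclusive comparisons is not confirmed here, and the real function works on an AnnData object and may do more than this. The helper passes_qc and the toy cells are hypothetical.

```python
# Thresholds copied from the ov.pp.qc call above.
TRESH = {"mito_perc": 0.2, "nUMIs": 500, "detected_genes": 250}


def passes_qc(cell, tresh=TRESH):
    """Return True if a cell's metrics clear every threshold.

    Assumption: mito_perc is an upper bound; nUMIs and
    detected_genes are lower bounds.
    """
    return (
        cell["mito_perc"] < tresh["mito_perc"]
        and cell["nUMIs"] > tresh["nUMIs"]
        and cell["detected_genes"] > tresh["detected_genes"]
    )


# Hypothetical per-cell metrics, invented for illustration.
cells = {
    "cell_a": {"mito_perc": 0.05, "nUMIs": 4200, "detected_genes": 1800},
    "cell_b": {"mito_perc": 0.35, "nUMIs": 3900, "detected_genes": 1500},  # high mito fraction
    "cell_c": {"mito_perc": 0.08, "nUMIs": 310, "detected_genes": 240},  # too few UMIs and genes
}

kept = [name for name, metrics in cells.items() if passes_qc(metrics)]
print(kept)  # ['cell_a']
```

Only cell_a survives: cell_b fails the mitochondrial bound, and cell_c falls below both count thresholds.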
  4. Preprocess and select features
    • Call ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=3000, batch_key=None) to normalise, log-transform, and flag highly variable genes; assign adata.raw = adata and subset to adata.var.highly_variable_features for downstream modelling.
    • Scale expression with ov.pp.scale(adata) and compute PCA scores with ov.pp.pca(adata, layer='scaled', n_pcs=50). Encourage users to review the variance explained with ov.utils.plot_pca_variance_ratio(adata).
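
The preprocessing and PCA calls above can be chained as in the following sketch. The wrapper preprocess_for_clustering is a hypothetical name, and the code assumes that ov.pp.preprocess returns the processed AnnData object; the .copy() after subsetting is an added precaution so later steps write to a real object instead of a view.

```python
def preprocess_for_clustering(adata, n_hvgs=3000, n_pcs=50):
    """Normalise, select highly variable genes, scale, and run PCA."""
    # Imported inside the function so the helper can be defined
    # without omicverse already loaded.
    import omicverse as ov

    # Assumption: preprocess returns the processed AnnData.
    adata = ov.pp.preprocess(
        adata, mode="shiftlog|pearson", n_HVGs=n_hvgs, batch_key=None
    )

    # Keep the full normalised matrix reachable before subsetting.
    adata.raw = adata
    adata = adata[:, adata.var.highly_variable_features].copy()

    # Scaling writes a 'scaled' layer, which PCA then reads.
    ov.pp.scale(adata)
    ov.pp.pca(adata, layer="scaled", n_pcs=n_pcs)

    # Optionally follow with ov.utils.plot_pca_variance_ratio(adata).
    return adata
```

Returning the subset object matters because the highly-variable-gene selection creates a new AnnData; callers should rebind, for example adata = preprocess_for_clustering(adata).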
  5. Construct neighbourhood graph and baseline clustering
First Seen
Jan 26, 2026