Single-cell preprocessing with omicverse

Overview

Follow this skill when a user needs to reproduce the preprocessing workflow from the omicverse notebooks t_preprocess.ipynb, t_preprocess_cpu.ipynb, and t_preprocess_gpu.ipynb. The tutorials operate on the 10x PBMC3k dataset and cover QC filtering, normalisation, highly variable gene (HVG) detection, dimensionality reduction, and downstream embeddings.

Instructions

  1. Set up the environment
    • Import omicverse as ov and scanpy as sc, then call ov.plot_set(font_path='Arial') (or ov.ov_plot_set() in legacy notebooks) to standardise figure styling.
    • Encourage %load_ext autoreload and %autoreload 2 when iterating inside notebooks so code edits propagate without restarting the kernel.
  2. Prepare input data
    • Download the PBMC3k filtered matrix from 10x Genomics (pbmc3k_filtered_gene_bc_matrices.tar.gz) and extract it under data/filtered_gene_bc_matrices/hg19/.
    • Load the matrix via ov.io.read_10x_mtx(..., var_names='gene_symbols') and keep a writable folder like write/ for exports.
  3. Perform quality control (QC)
    • Run ov.pp.qc(adata, tresh={'mito_perc': 0.2, 'nUMIs': 500, 'detected_genes': 250}, doublets_method='scrublet') for the CPU/CPU–GPU pipelines; omit doublets_method on pure GPU where Scrublet is not yet supported.
    • Review the returned AnnData summary to confirm doublet rates and QC thresholds; advise adjusting cut-offs for different species or sequencing depths.
  4. Store raw counts before transformations
    • Call ov.utils.store_layers(adata, layers='counts') immediately after QC so the original counts remain accessible for later recovery and comparison.
  5. Normalise and select HVGs
    • Use ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=2000, target_sum=5e5) to apply shift-log normalisation followed by Pearson-residual HVG detection. On GPU, pass target_sum=None so the default is kept.
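
Steps 1 to 5 above can be strung together roughly as follows. This is a minimal sketch, not the notebooks' exact code: the helper names (qc_kwargs, preprocess_kwargs, run_preprocessing) and the pipeline labels 'cpu', 'cpu-gpu-mixed', and 'gpu' are hypothetical, introduced only to show how the GPU-specific exceptions fit in. The omicverse calls and their arguments are the ones quoted in the steps. The sketch assumes read_10x_mtx accepts the extraction directory as its first argument, and it does not show any GPU initialisation that the GPU notebook may perform.

```python
from pathlib import Path

# Paths and thresholds taken from steps 2 and 3.
DATA_DIR = Path("data/filtered_gene_bc_matrices/hg19")
QC_THRESHOLDS = {"mito_perc": 0.2, "nUMIs": 500, "detected_genes": 250}
PIPELINES = {"cpu", "cpu-gpu-mixed", "gpu"}  # hypothetical labels


def _check_pipeline(pipeline):
    if pipeline not in PIPELINES:
        raise ValueError(f"pipeline must be one of {sorted(PIPELINES)}")


def qc_kwargs(pipeline="cpu"):
    """Keyword arguments for ov.pp.qc (step 3)."""
    _check_pipeline(pipeline)
    kwargs = {"tresh": dict(QC_THRESHOLDS)}
    # Scrublet is requested everywhere except the pure GPU pipeline.
    if pipeline != "gpu":
        kwargs["doublets_method"] = "scrublet"
    return kwargs


def preprocess_kwargs(pipeline="cpu"):
    """Keyword arguments for ov.pp.preprocess (step 5)."""
    _check_pipeline(pipeline)
    return {
        "mode": "shiftlog|pearson",
        "n_HVGs": 2000,
        # Assumption: only the pure GPU pipeline passes target_sum=None.
        "target_sum": None if pipeline == "gpu" else 5e5,
    }


def run_preprocessing(pipeline="cpu", data_dir=DATA_DIR):
    """Run steps 1 to 5 and return the processed AnnData object."""
    import omicverse as ov  # imported lazily so the helpers work without it

    ov.plot_set(font_path="Arial")            # step 1: figure styling
    Path("write").mkdir(exist_ok=True)        # step 2: folder for exports
    adata = ov.io.read_10x_mtx(str(data_dir), var_names="gene_symbols")
    adata = ov.pp.qc(adata, **qc_kwargs(pipeline))           # step 3
    ov.utils.store_layers(adata, layers="counts")            # step 4
    adata = ov.pp.preprocess(adata, **preprocess_kwargs(pipeline))  # step 5
    return adata
```

Calling run_preprocessing('gpu') would then skip the Scrublet argument and pass target_sum=None, while the default call follows the CPU settings. Keeping the keyword arguments in small functions makes it easy to adjust the cut-offs for other species or sequencing depths in one place.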
First Seen
Jan 26, 2026