single-cell-preprocessing-with-omicverse
Single-cell preprocessing with omicverse
Overview
Follow this skill when a user needs to reproduce the preprocessing workflow from the omicverse notebooks t_preprocess.ipynb, t_preprocess_cpu.ipynb, and t_preprocess_gpu.ipynb. The tutorials operate on the 10x PBMC3k dataset and cover QC filtering, normalisation, highly variable gene (HVG) detection, dimensionality reduction, and downstream embeddings.
Instructions
- Set up the environment
  - Import `omicverse as ov` and `scanpy as sc`, then call `ov.plot_set(font_path='Arial')` (or `ov.ov_plot_set()` in legacy notebooks) to standardise figure styling.
  - Encourage `%load_ext autoreload` and `%autoreload 2` when iterating inside notebooks so code edits propagate without restarting the kernel.
- Prepare input data
  - Download the PBMC3k filtered matrix from 10x Genomics (`pbmc3k_filtered_gene_bc_matrices.tar.gz`) and extract it under `data/filtered_gene_bc_matrices/hg19/`.
  - Load the matrix via `ov.io.read_10x_mtx(..., var_names='gene_symbols')` and keep a writable folder like `write/` for exports.
- Perform quality control (QC)
  - Run `ov.pp.qc(adata, tresh={'mito_perc': 0.2, 'nUMIs': 500, 'detected_genes': 250}, doublets_method='scrublet')` for the CPU/CPU–GPU pipelines; omit `doublets_method` on pure GPU where Scrublet is not yet supported.
  - Review the returned AnnData summary to confirm doublet rates and QC thresholds; advise adjusting cut-offs for different species or sequencing depths.
- Store raw counts before transformations
  - Call `ov.utils.store_layers(adata, layers='counts')` immediately after QC so the original counts remain accessible for later recovery and comparison.
- Normalise and select HVGs
  - Use `ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=2000, target_sum=5e5)` to apply shift-log normalisation followed by Pearson residual HVG detection (set `target_sum=None` on GPU, which keeps defaults).
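
The instruction steps can be strung together roughly as follows. This is a minimal sketch, not the notebooks' exact code: `preprocess_pbmc3k`, `QC_THRESHOLDS`, and the `matrix_dir`, `use_scrublet`, and `n_hvgs` parameters are hypothetical names introduced here, and the sketch assumes that `ov.io.read_10x_mtx` takes the matrix directory as its first argument and that `ov.pp.qc` and `ov.pp.preprocess` return the processed AnnData object.

```python
# Thresholds quoted in the QC step; adjust for other species or sequencing depths.
QC_THRESHOLDS = {"mito_perc": 0.2, "nUMIs": 500, "detected_genes": 250}


def preprocess_pbmc3k(matrix_dir, use_scrublet=True, n_hvgs=2000, target_sum=5e5):
    """Hypothetical helper: QC, store raw counts, then normalise and pick HVGs."""
    # Imported lazily so the module loads even where omicverse is not installed.
    import omicverse as ov

    ov.plot_set(font_path="Arial")  # legacy notebooks use ov.ov_plot_set()

    # Assumption: the directory is passed positionally, as in scanpy's reader.
    adata = ov.io.read_10x_mtx(matrix_dir, var_names="gene_symbols")

    qc_kwargs = {"tresh": dict(QC_THRESHOLDS)}
    if use_scrublet:
        # Leave this out on pure GPU, where Scrublet is not yet supported.
        qc_kwargs["doublets_method"] = "scrublet"
    # Assumption: qc returns the filtered AnnData rather than editing in place.
    adata = ov.pp.qc(adata, **qc_kwargs)

    # Keep the raw counts before any transformation touches adata.X.
    ov.utils.store_layers(adata, layers="counts")

    # Shift-log normalisation followed by Pearson residual HVG detection;
    # pass target_sum=None on GPU to keep the defaults.
    adata = ov.pp.preprocess(
        adata, mode="shiftlog|pearson", n_HVGs=n_hvgs, target_sum=target_sum
    )
    return adata
```

Calling `preprocess_pbmc3k(<path to the extracted hg19 folder>, use_scrublet=False)` would correspond to the pure-GPU variant, while the default arguments mirror the CPU and CPU–GPU settings listed in the steps.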
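
To make the QC thresholds and the shift-log step concrete, here is a small library-free illustration. It is a sketch of the general idea, not omicverse's implementation: `passes_qc` and `shift_log` are hypothetical helpers, the example cells are made up, and the strict comparisons at the cut-off boundaries are an assumption. It treats `mito_perc` as an upper bound on the mitochondrial fraction, `nUMIs` and `detected_genes` as lower bounds, and shift-log as scaling each cell to `target_sum` total counts before applying `log1p`.

```python
import math


def passes_qc(cell, tresh):
    """Return True if a cell's QC metrics fall inside all three cut-offs."""
    return (
        cell["mito_perc"] < tresh["mito_perc"]  # upper bound (fraction)
        and cell["nUMIs"] > tresh["nUMIs"]  # lower bound on total UMIs
        and cell["detected_genes"] > tresh["detected_genes"]  # lower bound
    )


def shift_log(counts, target_sum=5e5):
    """Scale one cell's counts to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # an empty cell stays all-zero
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]


tresh = {"mito_perc": 0.2, "nUMIs": 500, "detected_genes": 250}

# Made-up cells for illustration only.
cells = {
    "good": {"mito_perc": 0.05, "nUMIs": 2400, "detected_genes": 900},
    "high_mito": {"mito_perc": 0.35, "nUMIs": 3000, "detected_genes": 1000},
    "low_depth": {"mito_perc": 0.03, "nUMIs": 320, "detected_genes": 180},
}

kept = [name for name, cell in cells.items() if passes_qc(cell, tresh)]
normalised = shift_log([1, 0, 3], target_sum=4)
```

Here only the `good` cell survives filtering. Raising `nUMIs` or `detected_genes` for a deeper-sequenced dataset tightens the filter in the same way.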