Single-cell annotation skills with omicverse

Overview

Use this skill to reproduce and adapt the single-cell annotation workflows from the omicverse tutorials: SCSA (t_cellanno.ipynb), MetaTiME (t_metatime.ipynb), CellVote (t_cellvote.md and t_cellvote_pbmc3k.ipynb), CellMatch (t_cellmatch.ipynb), GPTAnno (t_gptanno.ipynb), and label transfer (t_anno_trans.ipynb). Each section below lists the required inputs, the preprocessing and inference steps, and how to read the outputs.

Instructions

SCSA automated cluster annotation
- Data requirements: PBMC3k raw counts from 10x Genomics (pbmc3k_filtered_gene_bc_matrices.tar.gz) or the processed sample/rna.h5ad. Download instructions are embedded in the notebook; unpack to data/filtered_gene_bc_matrices/hg19/. Ensure an SCSA SQLite database is available (e.g. pySCSA_2024_v1_plus.db from the Figshare/Drive links listed in the tutorial) and point model_path to its location.
- Preprocessing & model fit: Load the matrix with ov.io.read_10x_mtx, then run QC (ov.pp.qc), normalization and HVG selection (ov.pp.preprocess), scaling (ov.pp.scale), PCA (ov.pp.pca), neighbor-graph construction, and Leiden clustering. Rank marker genes per cluster with sc.tl.rank_genes_groups. Instantiate scsa = ov.single.pySCSA(...), choosing the reference database (target='cellmarker' or 'panglaodb'), the tissue scope, and the marker thresholds (foldchange, pvalue).
- Inference & interpretation: Call scsa.cell_anno(clustertype='leiden', result_key='scsa_celltype_cellmarker') or scsa.cell_auto_anno to append predictions to adata.obs. Compare them with manual marker-based labels via ov.pl.embedding or sc.pl.dotplot, inspect marker dictionaries with ov.single.get_celltype_marker, and list supported tissues with scsa.get_model_tissue(). To validate abundance trends, use ov.utils.roe (Ro/e, the ratio of observed to expected cell numbers) together with ov.utils.plot_cellproportion.
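
The SCSA steps above could be wired together roughly as follows. This is a minimal sketch, not the tutorial's code: the helper names scsa_settings and annotate_with_scsa are hypothetical, the defaults (tissue="All", foldchange=1.5, pvalue=0.01) are illustrative assumptions, and the key argument passed to cell_auto_anno is assumed from the tutorial and may differ between omicverse versions. It expects an AnnData that already carries Leiden clusters and rank_genes_groups results.

```python
from pathlib import Path

# Reference databases named in the text above; anything else is rejected early.
SCSA_TARGETS = ("cellmarker", "panglaodb")


def scsa_settings(model_path, target="cellmarker", tissue="All",
                  foldchange=1.5, pvalue=0.01):
    """Collect the pySCSA keyword arguments named in the text (hypothetical helper)."""
    if target not in SCSA_TARGETS:
        raise ValueError(f"target must be one of {SCSA_TARGETS}, got {target!r}")
    if foldchange <= 0 or not 0 < pvalue <= 1:
        raise ValueError("foldchange must be > 0 and pvalue must lie in (0, 1]")
    return {
        "model_path": str(model_path),
        "target": target,
        "tissue": tissue,
        "foldchange": foldchange,
        "pvalue": pvalue,
    }


def annotate_with_scsa(adata, settings, cluster_key="leiden"):
    """Annotate clusters with SCSA; adata must already hold clusters and rank markers."""
    import omicverse as ov  # lazy import so scsa_settings works without omicverse

    if not Path(settings["model_path"]).is_file():
        raise FileNotFoundError(f"SCSA database not found: {settings['model_path']}")
    scsa = ov.single.pySCSA(adata=adata, **settings)
    scsa.cell_anno(clustertype=cluster_key)
    # Assumed signature: stores one predicted label per cluster in adata.obs[key].
    scsa.cell_auto_anno(adata, key=f"scsa_celltype_{settings['target']}")
    return adata
```

Validating the settings before loading omicverse keeps a mistyped target or a missing database from surfacing only after the preprocessing has run. A call might look like annotate_with_scsa(adata, scsa_settings("pySCSA_2024_v1_plus.db")), with the path adjusted to wherever the database was downloaded.
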

MetaTiME tumour microenvironment states
- Data requirements: A multi-batch tumour microenvironment (TME) AnnData with an scVI latent embedding. The tutorial uses TiME_adata_scvi.h5ad from Figshare (https://figshare.com/ndownloader/files/41440050). If starting from counts, run scVI (scvi.model.SCVI) first to populate adata.obsm['X_scVI'].
- Preprocessing & model fit: Optionally subset to non-malignant cells via adata.obs['isTME']. Rebuild the neighbor graph on the latent representation (sc.pp.neighbors(adata, use_rep="X_scVI")) and compute a UMAP embedding (adata.obsm['X_umap'] = ov.pp.umap(...)). Initialise TiME_object = ov.single.MetaTiME(adata, mode='table') and, for finer granularity, over-cluster with TiME_object.overcluster(resolution=8, clustercol='overcluster').
- Inference & interpretation: Run TiME_object.predictTiME(save_obs_name='MetaTiME') to assign the minor states (MetaTiME) and the major classes (Major_MetaTiME). Visualise with TiME_object.plot or sc.pl.embedding. Interpret the outputs by comparing the label distributions within each cluster and confirming that the MetaTiME and Major_MetaTiME columns align with the expected niches.
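
The MetaTiME steps above can be sketched as a small wrapper plus a cluster-level summary. Both function names (run_metatime, label_fractions_by_cluster) are hypothetical. The wrapper uses only the calls quoted in the text, assumes adata.obs['isTME'] is a boolean column, assumes predictTiME writes its results into adata.obs as the text describes, and leaves out the UMAP step because its arguments are not given here. The summary function is plain Python and is one possible way to compare label distributions per cluster.

```python
from collections import Counter, defaultdict


def run_metatime(adata, latent_key="X_scVI", tme_only=False, overcluster=True):
    """Apply the MetaTiME calls named in the text (hypothetical wrapper)."""
    import scanpy as sc      # lazy imports: only needed when the wrapper runs
    import omicverse as ov

    if latent_key not in adata.obsm:
        raise KeyError(f"{latent_key!r} missing from adata.obsm; run scVI first")
    if tme_only:
        # Assumption: 'isTME' is a boolean column marking non-malignant cells.
        mask = adata.obs["isTME"].astype(bool).to_numpy()
        adata = adata[mask].copy()
    sc.pp.neighbors(adata, use_rep=latent_key)
    time_object = ov.single.MetaTiME(adata, mode="table")
    if overcluster:
        time_object.overcluster(resolution=8, clustercol="overcluster")
    time_object.predictTiME(save_obs_name="MetaTiME")
    return adata, time_object


def label_fractions_by_cluster(clusters, labels):
    """Return {cluster: {label: fraction of that cluster's cells}}."""
    clusters, labels = list(clusters), list(labels)
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {
        cluster: {label: n / sum(c.values()) for label, n in c.items()}
        for cluster, c in counts.items()
    }
```

After annotation, label_fractions_by_cluster(adata.obs["overcluster"], adata.obs["Major_MetaTiME"]) gives, for each over-cluster, the share of cells per major class. A cluster split evenly across several classes is a candidate for closer inspection.
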
