Single-cell annotation skills with omicverse
Overview
Use this skill to reproduce and adapt the single-cell annotation playbook captured in the omicverse tutorials: SCSA (`t_cellanno.ipynb`), MetaTiME (`t_metatime.ipynb`), CellVote (`t_cellvote.md` and `t_cellvote_pbmc3k.ipynb`), CellMatch (`t_cellmatch.ipynb`), GPTAnno (`t_gptanno.ipynb`), and label transfer (`t_anno_trans.ipynb`). Each section below lists the required inputs, the training/inference steps, and how to read the outputs.
Instructions
- SCSA automated cluster annotation
  - Data requirements: PBMC3k raw counts from 10x Genomics (`pbmc3k_filtered_gene_bc_matrices.tar.gz`) or the processed `sample/rna.h5ad`. Download instructions are embedded in the notebook; unpack the archive to `data/filtered_gene_bc_matrices/hg19/`. Make sure an SCSA SQLite database is available (e.g. `pySCSA_2024_v1_plus.db` from the Figshare/Drive links listed in the tutorial) and point `model_path` to its location.
  - Preprocessing & model fit: Load with `ov.io.read_10x_mtx`, then run QC (`ov.pp.qc`), normalization and HVG selection (`ov.pp.preprocess`), scaling (`ov.pp.scale`), PCA (`ov.pp.pca`), neighbors and Leiden clustering. Finally, compute rank markers (`sc.tl.rank_genes_groups`). Instantiate `scsa = ov.single.pySCSA(...)`, choosing `target='cellmarker'` or `'panglaodb'`, the tissue scope, and the thresholds (`foldchange`, `pvalue`).
  - Inference & interpretation: Call `scsa.cell_anno(clustertype='leiden', result_key='scsa_celltype_cellmarker')` or `scsa.cell_auto_anno` to append predictions to `adata.obs`. Compare them with manual marker-based labels via `ov.pl.embedding` or `sc.pl.dotplot`, inspect marker dictionaries (`ov.single.get_celltype_marker`), and query supported tissues with `scsa.get_model_tissue()`. Use the Ro/e helper `ov.utils.roe` (ratio of observed to expected cell counts) and `ov.utils.plot_cellproportion` to validate abundance trends.
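The following minimal pure-Python sketch shows what the Ro/e abundance check measures. It assumes the common definition of the expected count for a cell type in a sample: (cell-type total × sample total) / total cells, which is the independence null of a chi-square test. It is an illustration only, not the code behind `ov.utils.roe`, which may differ in details such as also reporting significance. The function name `ro_e` and the toy labels are hypothetical.

```python
from collections import Counter


def ro_e(cell_types, samples):
    """Ratio of observed to expected counts for every (cell type, sample) pair.

    Hypothetical helper, not part of omicverse. Expected counts use the
    independence null: E = cell-type total * sample total / total cells.
    Ro/e > 1 suggests enrichment in that sample; Ro/e < 1 suggests depletion.
    """
    if len(cell_types) != len(samples):
        raise ValueError("cell_types and samples must have the same length")
    observed = Counter(zip(cell_types, samples))
    type_totals = Counter(cell_types)
    sample_totals = Counter(samples)
    n_cells = len(cell_types)
    ratios = {}
    for cell_type, type_total in type_totals.items():
        for sample, sample_total in sample_totals.items():
            # Both totals are >= 1 here, so the expected count is never zero.
            expected = type_total * sample_total / n_cells
            # Counter returns 0 for pairs that never co-occur.
            ratios[(cell_type, sample)] = observed[(cell_type, sample)] / expected
    return ratios


if __name__ == "__main__":
    # Toy per-cell labels standing in for two adata.obs columns.
    labels = ["T", "T", "B", "B", "T", "Mono"]
    batches = ["s1", "s1", "s1", "s2", "s2", "s2"]
    # e.g. the line for Mono in s2 reads 2.00: 1 observed vs 0.5 expected.
    for (cell_type, sample), value in sorted(ro_e(labels, batches).items()):
        print(f"{cell_type:<5} {sample}: {value:.2f}")
```

On real data the two arguments would be `adata.obs` columns, for example `ro_e(adata.obs['scsa_celltype_cellmarker'], adata.obs['batch'])`. The `batch` column name is an assumption about your metadata.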
- MetaTiME tumour microenvironment states
  - Data requirements: Batched TME AnnData with an scVI latent embedding. The tutorial uses `TiME_adata_scvi.h5ad` from Figshare (https://figshare.com/ndownloader/files/41440050). If starting from counts, train scVI (`scvi.model.SCVI`) first and store its latent representation in `adata.obsm['X_scVI']`.
  - Preprocessing & model fit: Optionally subset to non-malignant cells via `adata.obs['isTME']`. Rebuild neighbors on the latent representation (`sc.pp.neighbors(adata, use_rep="X_scVI")`) and embed with UMAP (`adata.obsm['X_umap'] = ov.pp.umap(...)`). Initialise `TiME_object = ov.single.MetaTiME(adata, mode='table')`. If finer granularity is desired, over-cluster with `TiME_object.overcluster(resolution=8, clustercol='overcluster')`.
  - Inference & interpretation: Run `TiME_object.predictTiME(save_obs_name='MetaTiME')` to assign minor states (`MetaTiME`) and major states (`Major_MetaTiME`). Visualise with `TiME_object.plot` or `sc.pl.embedding`. Interpret the outputs by comparing state distributions across clusters and confirming that the `MetaTiME` and `Major_MetaTiME` columns align with the expected niches.
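The cluster-level comparison can be sketched in plain Python. For each cluster, the sketch tabulates the fraction of cells in each MetaTiME state and reports the dominant one. Clusters with a low dominant fraction are mixed and deserve a closer look before the niche assignment is accepted. This is an illustrative helper, not omicverse API: the names `state_composition` and `dominant_state`, the toy labels, and the 0.8 purity cut-off are all hypothetical.

```python
from collections import Counter, defaultdict


def state_composition(groups, states):
    """Per-group fraction of cells carrying each state label.

    Hypothetical helper, not part of omicverse. Fractions within a group sum to 1.
    """
    if len(groups) != len(states):
        raise ValueError("groups and states must have the same length")
    counts = defaultdict(Counter)
    for group, state in zip(groups, states):
        counts[group][state] += 1
    composition = {}
    for group, state_counts in counts.items():
        total = sum(state_counts.values())
        composition[group] = {s: n / total for s, n in state_counts.items()}
    return composition


def dominant_state(composition):
    """Map each group to (most frequent state, its fraction)."""
    return {
        group: max(fractions.items(), key=lambda item: item[1])
        for group, fractions in composition.items()
    }


if __name__ == "__main__":
    # Toy labels standing in for a cluster column and adata.obs['Major_MetaTiME'].
    clusters = ["c0", "c0", "c0", "c1", "c1", "c1"]
    major = ["T", "T", "Myeloid", "Myeloid", "Myeloid", "Myeloid"]
    summary = dominant_state(state_composition(clusters, major))
    for cluster, (state, frac) in sorted(summary.items()):
        # 0.8 is an arbitrary illustrative cut-off for calling a cluster "mixed".
        flag = "" if frac >= 0.8 else "  <- mixed, inspect"
        print(f"{cluster}: {state} ({frac:.0%}){flag}")
```

With AnnData this could be called as `state_composition(adata.obs['leiden'], adata.obs['Major_MetaTiME'])`, assuming a `leiden` column exists. `pandas.crosstab(adata.obs['leiden'], adata.obs['Major_MetaTiME'], normalize='index')` gives the same row-normalised table.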