datasets-loading
OmicVerse Built-in Datasets
`ov.datasets` provides 30+ ready-to-use datasets with automatic download, caching, and fallback to mock data. Use these instead of manually downloading files or relying on `scanpy.datasets`.
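The download, cache, and mock-fallback behaviour described above can be pictured with a small standard-library sketch. This is not OmicVerse's implementation: `load_with_fallback`, `fetch`, `make_mock`, and the JSON cache layout are hypothetical names chosen for illustration, and the real module may order or handle these steps differently.

```python
import json
import tempfile
from pathlib import Path


def load_with_fallback(name, fetch, make_mock, cache_dir):
    """Return (data, source), where source is 'cache', 'download', or 'mock'."""
    cache_file = Path(cache_dir) / f"{name}.json"
    # 1. Reuse a previously downloaded copy if one exists.
    if cache_file.exists():
        return json.loads(cache_file.read_text()), "cache"
    # 2. Otherwise try to download.
    try:
        data = fetch(name)
    except OSError:
        # 3. Download failed (urllib's URLError is an OSError subclass):
        #    return synthetic data and do not write it to the cache.
        return make_mock(name), "mock"
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data, "download"


# Hypothetical stand-ins for a real downloader and a mock generator.
def fake_fetch(name):
    return {"name": name, "n_cells": 3}


def failing_fetch(name):
    raise OSError("no network")


def mock(name):
    return {"name": name, "n_cells": 0, "mock": True}


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_fallback("demo", fake_fetch, mock, tmp)      # downloads and caches
    second = load_with_fallback("demo", failing_fetch, mock, tmp)  # served from cache
    offline = load_with_fallback("other", failing_fetch, mock, tmp)  # falls back to mock
```

Returning the source alongside the data makes it easy for a pipeline to warn when it is running on mock data rather than a real benchmark.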
When to Use This Module
- Tutorials/demos: Load standard benchmarks (PBMC3k, Paul15, dentate gyrus) with one function call
- Testing pipelines: Use `create_mock_dataset()` to generate synthetic data without downloads
- Gene set analysis: Use `predefined_signatures` for curated GMT gene sets (cell cycle, gender, mitochondrial, tissue-specific)
- Velocity workflows: Load pre-formatted datasets with spliced/unspliced layers
Dataset Catalog
Single-Cell
| Function | Cells | Genes | Description |
|---|---|---|---|
| `ov.datasets.pbmc3k()` | 2,700 | 32,738 | 10x PBMC3k (raw or processed) |
| `ov.datasets.pbmc8k()` | ~8,000 | — | 10x PBMC 8k |
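A quick way to confirm that a loaded object matches the catalog is to compare its dimensions with the table. The sketch below assumes the loaders return an AnnData-like object exposing `n_obs` and `n_vars`; `CATALOG` and `check_against_catalog` are hypothetical helpers, not part of OmicVerse, and the 5% tolerance is an arbitrary choice. The listed sizes appear to describe the raw data, so a processed variant may legitimately fail this check.

```python
from types import SimpleNamespace

# Sizes copied from the catalog table; None means the table lists no value.
CATALOG = {
    "pbmc3k": {"cells": 2700, "genes": 32738},
    "pbmc8k": {"cells": 8000, "genes": None},  # cell count is approximate
}


def check_against_catalog(name, adata, rel_tol=0.05):
    """Return a list of mismatch messages (empty if the sizes look right)."""
    expected = CATALOG[name]
    problems = []
    for label, attr in (("cells", "n_obs"), ("genes", "n_vars")):
        want = expected[label]
        if want is None:
            continue  # nothing to compare against
        got = getattr(adata, attr)
        if abs(got - want) > rel_tol * want:
            problems.append(f"{label}: expected about {want}, got {got}")
    return problems


# Stand-in objects so the sketch runs without any download.
raw_like = SimpleNamespace(n_obs=2700, n_vars=32738)
filtered_like = SimpleNamespace(n_obs=2638, n_vars=1838)

raw_report = check_against_catalog("pbmc3k", raw_like)
filtered_report = check_against_catalog("pbmc3k", filtered_like)
```

With OmicVerse installed, the same helper could be applied to a real dataset, for example `check_against_catalog("pbmc3k", ov.datasets.pbmc3k())`, assuming the loader can be called without arguments.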