datasets-loading


OmicVerse Built-in Datasets

ov.datasets provides 30+ ready-to-use datasets with automatic download, caching, and fallback to mock data. Use these instead of manually downloading files or relying on scanpy.datasets.
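The download, cache, and fallback behavior described above can be pictured with a small sketch. This is not OmicVerse's implementation: `load_with_fallback`, `fetch`, `make_mock`, and the `.bin` cache file naming are hypothetical names chosen for illustration, and the sketch assumes a failed download surfaces as an `OSError`.

```python
from pathlib import Path


def load_with_fallback(name, fetch, make_mock, cache_dir):
    """Return (payload, source), where source is 'cache', 'download', or 'mock'.

    fetch(name) and make_mock(name) are caller-supplied callables
    (hypothetical; they stand in for a real downloader and a mock generator).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{name}.bin"

    # 1. Reuse a previously downloaded copy if one exists.
    if cached.exists():
        return cached.read_bytes(), "cache"

    # 2. Otherwise try to download; network failures are assumed to raise OSError.
    try:
        payload = fetch(name)
    except OSError:
        # 3. Fall back to synthetic data so the pipeline can still run.
        return make_mock(name), "mock"

    # Store the fresh download for the next call.
    cached.write_bytes(payload)
    return payload, "download"
```

In this sketch the mock result is deliberately not written to the cache, so a later call with a working connection still fetches the real data.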

When to Use This Module

  • Tutorials/demos: Load standard benchmarks (PBMC3k, Paul15, dentate gyrus) with one function call
  • Testing pipelines: Use create_mock_dataset() to generate synthetic data without downloads
  • Gene set analysis: Use predefined_signatures for curated GMT gene sets (cell cycle, gender, mitochondrial, tissue-specific)
  • Velocity workflows: Load pre-formatted datasets with spliced/unspliced layers
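A minimal sketch of the one-call loading pattern from the list above. It assumes `ov.datasets.pbmc3k()` returns an AnnData object (which exposes `n_obs` and `n_vars`); `describe` and `load_pbmc3k` are hypothetical helper names, and the OmicVerse import is guarded so the snippet still runs where the package is not installed.

```python
import importlib.util
from types import SimpleNamespace


def describe(adata):
    """Format the shape of any AnnData-like object (needs n_obs and n_vars)."""
    return f"{adata.n_obs} cells x {adata.n_vars} genes"


def load_pbmc3k():
    """Call ov.datasets.pbmc3k() if OmicVerse is installed, else return None."""
    if importlib.util.find_spec("omicverse") is None:
        return None
    import omicverse as ov  # imported lazily so the sketch runs without it

    # Assumption: the loader returns an AnnData object.
    return ov.datasets.pbmc3k()


# Stand-in object carrying the PBMC3k dimensions listed in the catalog,
# so describe() can be tried without a download.
placeholder = SimpleNamespace(n_obs=2700, n_vars=32738)
print(describe(placeholder))  # prints "2700 cells x 32738 genes"
```

With OmicVerse installed, `describe(load_pbmc3k())` reports the shape of the dataset that was actually loaded.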

Dataset Catalog

Single-Cell

Function                Cells     Genes     Description
ov.datasets.pbmc3k()    2,700     32,738    10x PBMC3k (raw or processed)
ov.datasets.pbmc8k()    ~8,000              10x PBMC 8k