AnnData — Annotated Data Matrices for Single-Cell Genomics

Overview

AnnData provides the standard data structure for single-cell genomics in the scverse ecosystem. It stores an observations-by-variables matrix (X) alongside cell metadata (obs), gene metadata (var), layers, embeddings (obsm/varm), graphs (obsp/varp), and unstructured metadata (uns). It supports sparse matrices, H5AD and Zarr storage, backed mode for large files, and integration with Scanpy, scvi-tools, and Muon.

When to Use

Constructing annotated matrices from raw count data with cell/gene metadata
Reading/writing .h5ad or .zarr files for single-cell experiments
Subsetting cells by quality metrics, gene sets, or metadata conditions
Concatenating multiple experimental batches with consistent metadata
Storing multiple data layers (raw counts, normalized, scaled) in one object
Working with large datasets exceeding RAM (backed mode, lazy concatenation)
Preparing data for Scanpy or scvi-tools pipelines
For the analysis itself (clustering, differential expression, visualization), use Scanpy instead
For probabilistic models, use scvi-tools instead
