lamindb-data-management

LaminDB — Biological Data Management

Overview

LaminDB is an open-source data framework for biology that makes data queryable, traceable, and FAIR (Findable, Accessible, Interoperable, Reusable). It combines a data lakehouse architecture, lineage tracking, and biological ontology validation behind a unified Python API for managing biological datasets, from raw files to annotated, curated artifacts.

When to Use

  • Managing and versioning biological datasets (scRNA-seq, spatial, flow cytometry, multi-modal)
  • Tracking computational lineage (which code produced which data)
  • Validating and curating data against biological ontologies (cell types, genes, tissues, diseases)
  • Building queryable data lakehouses across multiple experiments
  • Ensuring reproducibility with automatic environment and provenance capture
  • Integrating with workflow managers (Nextflow, Snakemake) or MLOps (W&B, MLflow)
  • Standardizing metadata with ontology-based annotation (Bionty)
  • For single-cell analysis pipelines (clustering, differential expression), use scanpy instead
  • For ontology lookups only without data management, use bionty directly
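
To make the ideas in this list concrete (versioned artifacts, lineage, ontology-validated labels, querying), here is a minimal conceptual sketch using only the Python standard library. It is not LaminDB's API: `Registry`, `ArtifactRecord`, `KNOWN_CELL_TYPES`, the file key, and the script name are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary standing in for a cell type ontology.
KNOWN_CELL_TYPES = {"T cell", "B cell", "monocyte"}


@dataclass
class ArtifactRecord:
    key: str            # logical name of the dataset
    content_hash: str   # SHA-256 of the bytes; distinguishes versions
    produced_by: str    # script or notebook that wrote the data (lineage)
    labels: set = field(default_factory=set)


class Registry:
    """Toy in-memory registry for illustration; not LaminDB's API."""

    def __init__(self):
        self.records = []

    def register(self, key, data, produced_by, cell_types=()):
        # Reject labels that are not in the controlled vocabulary.
        unknown = set(cell_types) - KNOWN_CELL_TYPES
        if unknown:
            raise ValueError(f"unvalidated cell types: {sorted(unknown)}")
        record = ArtifactRecord(
            key=key,
            content_hash=hashlib.sha256(data).hexdigest(),
            produced_by=produced_by,
            labels=set(cell_types),
        )
        self.records.append(record)
        return record

    def versions(self, key):
        # All recorded versions of one dataset, oldest first.
        return [r for r in self.records if r.key == key]

    def query(self, cell_type):
        # Every artifact annotated with the given validated label.
        return [r for r in self.records if cell_type in r.labels]


registry = Registry()
v1 = registry.register("pbmc/raw.csv", b"gene,count\nCD3E,10\n", "ingest.py", ["T cell"])
v2 = registry.register("pbmc/raw.csv", b"gene,count\nCD3E,12\n", "ingest.py", ["T cell", "B cell"])
```

In this sketch, `registry.versions("pbmc/raw.csv")` returns both records with distinct content hashes, and `registry.query("B cell")` finds only the second. Registering a label outside the vocabulary raises `ValueError`, which mirrors the idea of validating metadata before it enters the store.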

Prerequisites

First Seen
Mar 16, 2026
lamindb-data-management — jaechang-hits/sciagent-skills