GEO Gene Expression Omnibus Database

Overview

GEO (Gene Expression Omnibus) is NCBI's public repository for high-throughput functional genomics data. It holds more than 200,000 series (GSE records) from microarray, RNA-seq, ChIP-seq, methylation, and proteomics experiments. GEOparse is a Python library for downloading and parsing GEO records (GSE series, GPL platforms, GSM samples), while NCBI E-utilities provide programmatic search across GEO's metadata.
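The search side can be sketched with the standard library alone. The example below only builds an E-utilities esearch URL against the "gds" database (the Entrez database behind GEO DataSets). The function name `build_geo_search_url` is hypothetical, and the query shape (an `[ORGN]` organism tag and a `gse[ETYP]` entry-type filter) is an assumption based on Entrez search conventions that should be checked against the GEO help pages.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keywords, organism=None, entry_type="gse", retmax=20):
    """Build an esearch URL that queries GEO metadata (db="gds").

    Hypothetical helper: the [ORGN] and [ETYP] field tags are assumed
    Entrez conventions, not something this page documents.
    """
    terms = [f"({keywords})"]
    if organism:
        # Restrict hits to one organism, e.g. "Homo sapiens".
        terms.append(f'"{organism}"[ORGN]')
    if entry_type:
        # Restrict hits to one record type, e.g. series ("gse").
        terms.append(f"{entry_type}[ETYP]")
    params = {
        "db": "gds",
        "term": " AND ".join(terms),
        "retmax": retmax,
        "retmode": "json",
    }
    # urlencode percent-encodes brackets/quotes and turns spaces into "+".
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the URL (for example with `urllib.request.urlopen` and `json.loads`) should return an `esearchresult` object whose `idlist` holds numeric Entrez UIDs rather than GSE accessions, so a follow-up esummary call is typically needed to map them to accessions before handing them to GEOparse.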

When to Use

Searching for publicly available gene expression datasets by organism, tissue, disease, or experimental condition
Downloading and parsing a specific GEO series (GSE) with its expression matrix and sample metadata
Extracting sample annotation tables (e.g., treatment groups, clinical covariates) for meta-analysis
Loading microarray expression data (GPL platform-annotated probes) into a tidy DataFrame
Retrieving all GEO experiments associated with a gene or pathway of interest
Building automated pipelines that download and process GEO datasets for downstream analysis
For single-cell RNA-seq data at scale, use cellxgene-census instead; for raw sequencing reads, download FASTQ files from ENA/SRA
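GEO sample annotations (treatment groups, clinical covariates) are usually recorded as free-text "key: value" characteristics strings, so building a meta-analysis table mostly means splitting those strings into columns. The sketch below assumes that convention; the helper names and the `GSM1`/`GSM2` accessions are hypothetical, and entries without a colon are kept under a numbered fallback key rather than dropped.

```python
def parse_characteristics(entries):
    """Split GEO-style "key: value" strings into a dict (hypothetical helper)."""
    parsed = {}
    for i, entry in enumerate(entries):
        # Split on the first colon only, so values may contain colons.
        key, sep, value = entry.partition(":")
        if sep:
            parsed[key.strip().lower()] = value.strip()
        else:
            # Submitters do not always follow "key: value"; keep the raw text.
            parsed[f"characteristic_{i}"] = entry.strip()
    return parsed


def build_annotation_table(samples):
    """Turn {accession: [characteristics, ...]} into (columns, rows).

    Keys missing for a sample become None, so the output can be passed
    to pandas.DataFrame(rows, columns=columns) if pandas is available.
    """
    records = []
    keys = []
    for accession, entries in samples.items():
        record = parse_characteristics(entries)
        for key in record:
            if key not in keys:  # preserve first-seen column order
                keys.append(key)
        records.append((accession, record))
    columns = ["accession"] + keys
    rows = [[acc] + [rec.get(k) for k in keys] for acc, rec in records]
    return columns, rows


# Hypothetical example input with two samples.
samples = {
    "GSM1": ["tissue: liver", "treatment: DMSO"],
    "GSM2": ["tissue: liver", "treatment: drug A", "age: 54"],
}
columns, rows = build_annotation_table(samples)
```

With GEOparse, the input dict could be assembled from a downloaded series as `{name: gsm.metadata.get("characteristics_ch1", []) for name, gsm in gse.gsms.items()}`. GEOparse also exposes a flattened `phenotype_data` table on GSE objects, which may make a hand-rolled parser unnecessary when its column naming suits the analysis.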

