geo-database


GEO: Gene Expression Omnibus Database

Overview

GEO (Gene Expression Omnibus) is NCBI's public repository for high-throughput functional genomics data. It holds more than 200,000 series (GSE records) from microarray, RNA-seq, ChIP-seq, methylation, and proteomics experiments. GEOparse provides a Python interface for downloading and parsing GEO records (GSE series, GPL platforms, GSM samples), while NCBI E-utilities enable programmatic search across GEO's metadata.
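The search side can be sketched with the standard library alone. This is a minimal sketch, not a definitive client: it assumes the `gds` Entrez database (which indexes GEO series, platforms, and samples) and the JSON output of `esearch.fcgi`. The helper names `build_esearch_url`, `parse_esearch_ids`, and `search_geo` are hypothetical, and the example query uses the `[ORGN]` (organism) and `[ETYP]` (entry type) search fields. NCBI asks clients to identify themselves with an `email` parameter and limits requests without an API key to roughly three per second.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 20, email: Optional[str] = None) -> str:
    """Build an ESearch URL against the GEO DataSets ('gds') database."""
    params = {"db": "gds", "term": term, "retmax": str(retmax), "retmode": "json"}
    if email:
        # NCBI asks callers to identify themselves.
        params["email"] = email
    return f"{EUTILS_BASE}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def parse_esearch_ids(payload: dict) -> Tuple[int, List[str]]:
    """Return (total hit count, UIDs on this page) from an ESearch JSON reply."""
    result = payload["esearchresult"]
    # ESearch reports the count as a string, so convert it explicitly.
    return int(result["count"]), list(result["idlist"])


def search_geo(term: str, retmax: int = 20, email: Optional[str] = None) -> Tuple[int, List[str]]:
    """Run one ESearch request (network access required)."""
    url = build_esearch_url(term, retmax=retmax, email=email)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return parse_esearch_ids(payload)


# Usage (performs a live request):
#   count, uids = search_geo(
#       "breast cancer[Title] AND Homo sapiens[ORGN] AND gse[ETYP]",
#       email="you@example.org",
#   )
```

The returned values are Entrez UIDs, not GSE accessions. An `esummary.fcgi` call on the same database is one way to map them back to accessions before handing them to GEOparse.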

When to Use

  • Searching for publicly available gene expression datasets by organism, tissue, disease, or experimental condition
  • Downloading and parsing a specific GEO series (GSE) with its expression matrix and sample metadata
  • Extracting sample annotation tables (e.g., treatment groups, clinical covariates) for meta-analysis
  • Loading microarray expression data (GPL platform-annotated probes) into a tidy DataFrame
  • Finding GEO series that mention a gene or pathway of interest
  • Building automated pipelines that download and process GEO datasets for downstream analysis
  • Not a fit for raw reads or large single-cell collections: download FASTQ files from ENA/SRA for raw sequencing reads, and use cellxgene-census for single-cell RNA-seq data at scale
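For the sample-annotation use case, here is a sketch assuming GEOparse's `get_GEO` loader and the per-sample `metadata` dict it exposes, where each field is a list of strings and characteristics are stored as `"key: value"` entries under `characteristics_ch1`. The functions `parse_characteristics`, `annotations_from_metadata`, and `load_series` are hypothetical helpers, and GEOparse is imported lazily so the parsing logic runs without it installed.

```python
import os
from typing import Dict, List


def parse_characteristics(entries: List[str]) -> Dict[str, str]:
    """Split GEO 'key: value' characteristic strings into a flat dict."""
    parsed: Dict[str, str] = {}
    for i, entry in enumerate(entries):
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = entry.partition(":")
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
        else:
            # Unlabeled characteristic: keep it under a positional key.
            parsed[f"characteristic_{i}"] = entry.strip()
    return parsed


def annotations_from_metadata(
    metadata_by_sample: Dict[str, Dict[str, List[str]]]
) -> Dict[str, Dict[str, str]]:
    """Build one annotation row (title + channel-1 characteristics) per GSM."""
    rows: Dict[str, Dict[str, str]] = {}
    for gsm_name, meta in metadata_by_sample.items():
        # Guard against a missing or empty title list.
        row = {"title": (meta.get("title") or [""])[0]}
        row.update(parse_characteristics(meta.get("characteristics_ch1", [])))
        rows[gsm_name] = row
    return rows


def load_series(accession: str, destdir: str = "./geo_cache"):
    """Download (or reuse a cached copy of) a GSE via GEOparse."""
    import GEOparse  # lazy import: only needed when actually downloading

    os.makedirs(destdir, exist_ok=True)
    return GEOparse.get_GEO(geo=accession, destdir=destdir)


# Usage (requires network access and the GEOparse package):
#   gse = load_series("GSExxxxx")
#   rows = annotations_from_metadata(
#       {name: gsm.metadata for name, gsm in gse.gsms.items()}
#   )
```

Characteristic keys vary between submitters (for example `tissue` vs `tissue type`), so harmonizing them is usually a manual step before meta-analysis. For the expression side, GEOparse's `GSE.pivot_samples("VALUE")` returns a probe-by-sample pandas DataFrame that can then be joined to the GPL annotation table.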

Prerequisites


More from jaechang-hits/sciagent-skills

First Seen
Mar 16, 2026