remap-database
ReMap Database
Overview
ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at https://remap2022.univ-amu.fr/api/ and bulk BED file downloads; no authentication required.
When to Use
- Finding all TFs with ChIP-seq peaks overlapping a genomic region of interest (e.g., a GWAS SNP locus or candidate enhancer)
- Retrieving TF peaks near a gene's transcription start site to map its proximal regulatory landscape
- Listing all TFs available in ReMap for human or mouse with their peak and dataset counts
- Filtering ChIP-seq peaks by regulatory biotype annotation (promoter, enhancer, exon, intron, intergenic) for a TF in a specific cell line
- Downloading a BED file of all binding peaks for a TF across all cell types for offline analysis
- Identifying co-binding TFs at a locus by querying all overlapping peaks and grouping by TF name
- Use
jaspar-databaseinstead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locations - For ENCODE-specific regulatory tracks and accessibility data use
encode-database; ReMap aggregates TF binding peaks from many sources including ENCODE
Prerequisites
More from jaechang-hits/sciagent-skills
scientific-brainstorming
Structured ideation methods: SCAMPER, Six Thinking Hats, Morphological Analysis, TRIZ, Biomimicry, plus more. Decision framework for picking methods by challenge type (stuck, improving, systematic exploration, contradiction). Use when generating research ideas or exploring interdisciplinary connections.
12snakemake-workflow-engine
Python-based workflow management system for reproducible, scalable pipelines. Define rules with file-based dependencies; Snakemake automatically determines the execution order and parallelism. Supports local, SLURM, LSF, AWS, and Google Cloud execution via profiles; per-rule conda/Singularity environments. Use for bioinformatics NGS pipelines, ML training workflows, and any multi-step file-processing analysis. Use Nextflow instead for Groovy-based dataflow pipelines or when nf-core ecosystem integration is required.
11esm-protein-language-model
Protein language models (ESM3, ESM C) for sequence generation, structure prediction, inverse folding, and protein embeddings. Use when designing novel proteins, extracting sequence representations for downstream ML, or predicting structure from sequence. Local GPU or EvolutionaryScale Forge cloud API. For traditional structure prediction use AlphaFold; for small-molecule cheminformatics use RDKit.
11biopython-sequence-analysis
Biopython sequence analysis: parse FASTA/FASTQ/GenBank/GFF (SeqIO), NCBI Entrez (esearch/efetch/elink), remote/local BLAST, pairwise/MSA alignment (PairwiseAligner, MUSCLE/ClustalW), phylogenetic trees (Phylo). Use for gene family studies, phylogenomics, comparative genomics, NCBI pipelines. For PCR/restriction/cloning use biopython-molecular-biology; for SAM/BAM use pysam.
11shap-model-explainability
>-
11archs4-database
Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).
11