mouse-phenome-database
Overview
The Mouse Phenome Database (MPD) at the Jackson Laboratory catalogs standardized phenotype measurements across inbred, recombinant inbred, and Collaborative Cross mouse strains. It aggregates data from 700+ projects covering 40+ phenotype categories, including body composition, metabolic parameters, and cardiovascular, behavioral, and hematological measures. The REST API at https://phenome.jax.org/api provides programmatic access to strain summaries, individual animal data, and measurement protocols. No authentication is required; data is freely available under CC-BY-4.0.
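A minimal sketch of how a request against the base URL above might be assembled with the Python standard library. Only the base URL comes from this document; the helper names `build_url` and `fetch_json` and the `projects` path used in the demo are assumptions for illustration, so check the MPD API documentation for the actual routes and response shapes.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://phenome.jax.org/api"  # base URL stated in the overview


def build_url(path, params=None):
    """Join a relative endpoint path (plus optional query params) to BASE_URL."""
    # Percent-encode each path segment and drop leading/trailing slashes.
    clean = "/".join(quote(part) for part in path.strip("/").split("/"))
    url = f"{BASE_URL}/{clean}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(path, params=None, timeout=30):
    """GET an endpoint and decode the JSON body (hypothetical helper).

    No credentials are sent, since the API requires no authentication.
    """
    req = Request(build_url(path, params), headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "projects" is a hypothetical endpoint path, not confirmed by this document.
    print(build_url("projects"))
```

Keeping URL construction separate from the network call makes it possible to check the request that would be sent without contacting the server.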
When to Use
- Identifying which inbred strains show extreme phenotypes (highest/lowest body weight, glucose, blood pressure) for selection as experimental models
- Retrieving phenotype data for QTL analysis using BXD or AXB recombinant inbred panels or the Diversity Outbred (DO) population
- Comparing strain means and distributions across metabolic traits to test a hypothesis about genetic background effects
- Finding published MPD projects measuring a specific trait category (e.g., anxiety behavior, bone density, immune cell counts)
- Downloading individual-level measurement data for statistical modeling or power calculations
- Use monarch-database instead when you need disease-gene-phenotype knowledge graph associations (gene ontology, HPO phenotypes, human disease links)
- Use ensembl-database instead for genomic coordinate, transcript, and gene annotation lookups for specific mouse genes
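The first use case above (finding strains with extreme phenotypes) could be sketched as follows, once individual-level records have been retrieved. The record layout, with `strain` and `value` keys, is an assumption for illustration and may not match the fields MPD returns; the values in the usage note are placeholders, not real measurements.

```python
from statistics import mean


def strain_means(records, strain_key="strain", value_key="value"):
    """Group per-animal records by strain and return {strain: mean value}."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[strain_key], []).append(float(rec[value_key]))
    return {strain: mean(values) for strain, values in groups.items()}


def extremes(means, n=1):
    """Return (lowest, highest) lists of (strain, mean) pairs, n entries each."""
    ranked = sorted(means.items(), key=lambda item: item[1])
    lowest = ranked[:n]
    highest = ranked[-n:][::-1]  # largest mean first
    return lowest, highest
```

For example, with placeholder records such as `[{"strain": "strain_A", "value": 20}, {"strain": "strain_B", "value": 30}]`, `extremes(strain_means(records))` reports `strain_A` as the lowest and `strain_B` as the highest. Ranking on means alone ignores sample size and sex, so a real analysis would likely filter or stratify first.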
Prerequisites