pride-database
PRIDE Database
Overview
The PRIDE Archive (PRoteomics IDEntifications database) at EMBL-EBI is the world's largest public repository of mass spectrometry-based proteomics data, containing 30,000+ datasets from peer-reviewed publications. The REST API v2 at https://www.ebi.ac.uk/pride/ws/archive/v2/ provides project discovery, file listing, peptide/PSM identification retrieval, and protein-level evidence, all without authentication. Data types include RAW files, peak lists (mzML, MGF), PRIDE XML result files, and processed identification tables.
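As a rough illustration of unauthenticated access, the sketch below builds request URLs against the base URL given above and fetches one project record using only the Python standard library. The endpoint paths (`projects/{accession}`, `search/projects`) and the `keyword`/`pageSize` query parameters are assumptions about the v2 API layout and should be checked against the live API documentation; `build_url`, `get_json`, and `fetch_project` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Base URL as stated in the overview.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2/"


def build_url(path, **params):
    """Join an endpoint path onto BASE_URL and append optional query parameters."""
    url = urllib.parse.urljoin(BASE_URL, path.lstrip("/"))
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url, timeout=30):
    """GET a URL and decode the JSON body. No API key or login is sent."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_project(accession):
    """Fetch metadata for one project.

    Assumption: project records live at projects/{accession}.
    """
    return get_json(build_url(f"projects/{accession}"))
```

Calling `fetch_project("PXD000001")` should return the project record as a dictionary if the assumed path is right; `PXD000001` is used here only as an example accession. A keyword search URL could be built the same way, for instance `build_url("search/projects", keyword="liver", pageSize=5)`, again assuming those parameter names.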
When to Use
- Finding published proteomics datasets by organism, tissue, disease keyword, or instrument type for meta-analysis or benchmarking
- Downloading raw mass spectrometry data (RAW, mzML) or peak files (MGF) from a specific PRIDE project accession
- Retrieving peptide identification tables with sequence, modification, and confidence score for a project
- Querying protein-level evidence (PSMs, unique peptides) for a protein of interest across PRIDE projects
- Checking whether a protein has experimental proteomics evidence in a specific tissue or disease context
- Building training datasets of confident peptide-spectrum matches (PSMs) for proteomics ML applications
- For protein domain and family classification, use interpro-database; PRIDE provides experimental identification evidence only
- For protein sequences, Swiss-Prot annotations, and ID mapping, use uniprot-protein-database
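For the file-download use case in the list above, a minimal sketch of listing a project's files and keeping only raw or peak-list formats might look like the following. It assumes a `files/byProject` endpoint that takes an `accession` parameter and returns a JSON list of records with a `fileName` field; neither is confirmed by this document. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2/"


def files_url(accession):
    """Build the file-listing URL (assumed endpoint: files/byProject)."""
    query = urllib.parse.urlencode({"accession": accession})
    return BASE_URL + "files/byProject?" + query


def filter_by_extension(records, extensions):
    """Keep records whose file name ends with one of the extensions (case-insensitive).

    Assumption: each record is a dict with a "fileName" key; records
    without that key are skipped.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    return [r for r in records if r.get("fileName", "").lower().endswith(wanted)]


def list_project_files(accession, extensions=(".raw", ".mzml", ".mgf"), timeout=30):
    """Fetch the file listing for a project and filter it by extension."""
    request = urllib.request.Request(
        files_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        records = json.load(response)  # assumed to be a JSON list of file records
    return filter_by_extension(records, extensions)
```

Compressed files (for example names ending in `.mzML.gz`) would not match the default extensions, so the tuple may need extending for a given project.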