NCI Imaging Data Commons
Overview
NCI Imaging Data Commons (IDC) is the National Cancer Institute's cloud-based repository for cancer imaging data. It hosts 50+ TB of publicly accessible DICOM images spanning radiology (CT, MRI, PET) and pathology (whole-slide images) across 100+ collections. The image files are stored in Google Cloud Storage and their DICOM metadata is indexed in BigQuery, so the metadata can be queried with SQL without downloading any images. IDC also works with Google Colab, which makes large-scale imaging research possible without local storage.
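As a rough illustration of querying DICOM metadata with SQL, the sketch below lists distinct series of one modality. It assumes the public table `bigquery-public-data.idc_current.dicom_all` and the column names `collection_id`, `PatientID`, `SeriesInstanceUID`, and `Modality`; check these against the current IDC documentation before relying on them. The helper names `build_series_query` and `find_series` are made up for this example.

```python
"""Sketch: find imaging series in IDC by querying DICOM metadata in BigQuery."""

# Assumption: name of IDC's public metadata table (one row per DICOM instance).
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_series_query(limit: int = 10) -> str:
    """Return SQL that lists distinct series for the @modality query parameter."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT DISTINCT collection_id, PatientID, SeriesInstanceUID\n"
        f"FROM `{IDC_TABLE}`\n"
        "WHERE Modality = @modality\n"
        f"LIMIT {int(limit)}"
    )


def find_series(modality: str, limit: int = 10) -> list:
    """Run the query and return one dict per series (needs GCP credentials)."""
    # Imported lazily so build_series_query works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("modality", "STRING", modality)
        ]
    )
    rows = client.query(build_series_query(limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling `find_series("CT")` requires the `google-cloud-bigquery` package and an authenticated Google Cloud project; `build_series_query` alone can be used to inspect or adapt the SQL first.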
When to Use
- Searching for publicly available cancer imaging datasets by modality, cancer type, or anatomical site
- Downloading DICOM image series for model training (segmentation, classification, detection)
- Querying DICOM metadata at scale using SQL (BigQuery) without downloading the full dataset
- Exploring available imaging collections before committing to a full download
- Accessing pathology whole-slide images (WSI) and radiology scans from TCIA collections
- Building reproducible imaging ML pipelines with versioned public datasets
- For local DICOM file processing, use pydicom-medical-imaging; for WSI preprocessing, use histolab
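When exploring collections before committing to a full download, it can help to total the size of the series a metadata query selected. The sketch below is one possible way to do that in plain Python. It assumes each result row is a dict with the fields `collection_id` and `series_size_MB`; both field names, the helper `summarize_by_collection`, and the sample rows are illustrative and not taken from IDC.

```python
from collections import defaultdict


def summarize_by_collection(series_rows):
    """Return {collection_id: (series_count, total_size_gb)} for the given rows."""
    totals = defaultdict(lambda: [0, 0.0])  # collection -> [count, total MB]
    for row in series_rows:
        entry = totals[row["collection_id"]]
        entry[0] += 1
        entry[1] += row["series_size_MB"]
    # Convert MB to GB (1 GB = 1000 MB) and round for display.
    return {cid: (count, round(mb / 1000, 3)) for cid, (count, mb) in totals.items()}


# Invented sample rows standing in for the result of a metadata query.
sample_rows = [
    {"collection_id": "example_collection_a", "series_size_MB": 120.0},
    {"collection_id": "example_collection_a", "series_size_MB": 380.0},
    {"collection_id": "example_collection_b", "series_size_MB": 1500.0},
]

summary = summarize_by_collection(sample_rows)
for collection_id, (count, size_gb) in sorted(summary.items()):
    print(f"{collection_id}: {count} series, {size_gb} GB")
```

A summary like this makes it easier to decide whether to download a whole collection or only a subset of its series.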