NCI Imaging Data Commons

Overview

NCI Imaging Data Commons (IDC) is the National Cancer Institute's cloud-based repository for cancer imaging data. It hosts more than 50 TB of publicly accessible DICOM images spanning radiology (CT, MRI, PET) and pathology (whole-slide images) across more than 100 collections. The image files are stored in Google Cloud Storage and their DICOM metadata is indexed in BigQuery, so the metadata can be queried with SQL without downloading any images. IDC also works from Google Colab, which makes large-scale imaging research feasible without local storage.
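As a rough illustration of querying DICOM metadata with SQL, the sketch below builds a per-series query for one modality and, optionally, runs it with the `google-cloud-bigquery` client. The table name `bigquery-public-data.idc_current.dicom_all` and the columns `collection_id`, `SeriesInstanceUID`, and `Modality` are assumptions that should be checked against the current IDC schema. The helpers `build_series_query` and `run_query` are hypothetical names introduced here, not part of any IDC tooling.

```python
# Sketch only: table and column names are assumptions to verify against the IDC schema.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"  # assumed public table name


def build_series_query(modality: str, limit: int = 10) -> str:
    """Return SQL that counts DICOM instances per series for one modality."""
    # Accept only simple codes such as "CT" or "MR" so the value is safe to inline.
    if not modality.isalnum():
        raise ValueError("modality must be alphanumeric, e.g. 'CT' or 'MR'")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT collection_id, SeriesInstanceUID, COUNT(*) AS n_instances\n"
        f"FROM `{IDC_TABLE}`\n"
        f"WHERE Modality = '{modality}'\n"
        "GROUP BY collection_id, SeriesInstanceUID\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query with google-cloud-bigquery (needs a GCP project and credentials)."""
    # Imported lazily so that building the SQL works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Print the SQL only; pass it to run_query() once credentials are configured.
    print(build_series_query("CT", limit=5))
```

Only metadata rows come back from such a query; no image files are transferred, which is what makes it practical to survey a collection before deciding what to download.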

When to Use

Searching for publicly available cancer imaging datasets by modality, cancer type, or anatomical site
Downloading DICOM image series for model training (segmentation, classification, detection)
Querying DICOM metadata at scale using SQL (BigQuery) without downloading the full dataset
Exploring available imaging collections before committing to a full download
Accessing pathology whole-slide images (WSI) and radiology scans from TCIA collections
Building reproducible imaging ML pipelines with versioned public datasets
For local DICOM file processing, use pydicom-medical-imaging instead; for WSI preprocessing, use histolab
