PyHealth

Overview

PyHealth provides an end-to-end pipeline for machine learning on EHR data: data loading → medical code processing → patient-level dataset construction → model training → evaluation. It natively supports the MIMIC-III, MIMIC-IV, eICU-CRD, and OMOP-CDM structured databases and handles the idiosyncratic data format of each. Medical codes (ICD-9, ICD-10, ATC, NDC, SNOMED) are organized in hierarchical code systems that support code-level embeddings and cross-ontology mapping. Pre-built tasks (mortality prediction, drug recommendation, readmission, length of stay, diagnosis code prediction) can be instantiated in a few lines, and custom tasks follow the same standardized interface.
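To illustrate what a hierarchical code system makes possible, here is a minimal sketch in plain Python. It is not PyHealth's ontology API. It uses a toy child-to-parent table covering a few ICD-9-CM diagnosis codes, and the names `PARENT`, `ancestors`, and `generalize` are hypothetical. It shows ancestor lookup and "rolling up" leaf codes to a coarser level.

```python
# Hypothetical toy hierarchy (child -> parent) for a few ICD-9-CM diagnosis codes.
# Illustrative only; this is NOT PyHealth's ontology API.
PARENT = {
    "428.0": "428",        # congestive heart failure, unspecified
    "428.1": "428",        # left heart failure
    "428": "420-429",      # heart failure
    "420-429": "390-459",  # other forms of heart disease
}                          # "390-459" (circulatory system) acts as the root here


def ancestors(code: str) -> list[str]:
    """Return every ancestor of `code`, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


def generalize(code: str, steps: int) -> str:
    """Climb `steps` levels up the hierarchy, stopping early at the root."""
    for _ in range(steps):
        if code not in PARENT:
            break
        code = PARENT[code]
    return code


print(ancestors("428.0"))                                      # ['428', '420-429', '390-459']
print(sorted({generalize(c, 1) for c in ["428.0", "428.1"]}))  # ['428']
```

Rolling sibling codes up to a shared parent reduces vocabulary sparsity and lets rare leaf codes share signal with their parent. That is the general intuition behind hierarchy-aware code embeddings, though real implementations work over full ontologies rather than a hand-written table.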
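The idea of a standardized task interface can be sketched as a function that turns one patient into a list of labeled samples, which is then applied across the whole cohort. The sketch below is a hedged illustration, not PyHealth's actual task API (whose data classes and signatures differ between versions). `Visit`, `Patient`, `mortality_next_visit_task`, and `build_samples` are hypothetical names. The label definition (death recorded at the patient's next visit) is one possible mortality formulation chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Visit:
    # Hypothetical, simplified visit record (not PyHealth's Visit class).
    visit_id: str
    admit_time: datetime
    died: bool  # whether death was recorded at this visit
    conditions: list[str] = field(default_factory=list)
    drugs: list[str] = field(default_factory=list)


@dataclass
class Patient:
    # Hypothetical patient record: an ID plus its visits in any order.
    patient_id: str
    visits: list[Visit]


def mortality_next_visit_task(patient: Patient) -> list[dict]:
    """Features from visit i; label = 1 if death is recorded at visit i + 1."""
    visits = sorted(patient.visits, key=lambda v: v.admit_time)  # enforce time order
    samples = []
    for cur, nxt in zip(visits, visits[1:]):
        if not cur.conditions or not cur.drugs:
            continue  # skip visits lacking either feature type
        samples.append({
            "patient_id": patient.patient_id,
            "visit_id": cur.visit_id,
            "conditions": cur.conditions,
            "drugs": cur.drugs,
            "label": int(nxt.died),
        })
    return samples


def build_samples(patients: list[Patient], task) -> list[dict]:
    """Apply one task function to every patient and concatenate the results."""
    return [s for p in patients for s in task(p)]


patient = Patient("p1", [
    Visit("v2", datetime(2020, 3, 1), True, ["428.0"], ["B01AC06"]),
    Visit("v1", datetime(2020, 1, 1), False, ["401.9"], ["C07AB02"]),
])
samples = build_samples([patient], mortality_next_visit_task)
print(samples[0]["visit_id"], samples[0]["label"])  # v1 1
```

In this sketch, keeping the task as a plain patient-to-samples function is what makes tasks interchangeable. The same cohort could be relabeled for readmission or length of stay by passing a different function to `build_samples`.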

When to Use

Training clinical outcome prediction models (mortality, readmission, LOS) from MIMIC-III or MIMIC-IV
Building drug recommendation or drug interaction prediction models using ATC code hierarchy
Processing OMOP-CDM formatted data from institutional EHR systems for ML
Using established clinical model architectures (RETAIN, GRASP, MedBERT) as baselines on healthcare benchmarks
Constructing patient visit sequences with temporal structure for RNN/Transformer models
Evaluating clinical prediction models with appropriate metrics (AUROC, AUPRC, F1, Jaccard)
Use FIDDLE instead for EHR preprocessing without model training; use clinical-longformer instead for clinical-note NLP
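For visit sequences with temporal structure, a common input layout for RNN/Transformer models is a padded 3-D array (patients × visits × codes) plus a per-visit mask. The sketch below builds that layout in plain Python. The function names and the padding convention (index 0 reserved for padding) are illustrative assumptions, not PyHealth's internal representation. It also assumes each patient's visits are already in chronological order.

```python
PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(cohort: list[list[list[str]]]) -> dict[str, int]:
    """Assign each distinct code an index starting at 1, in first-seen order."""
    vocab: dict[str, int] = {}
    for visits in cohort:
        for visit in visits:
            for code in visit:
                vocab.setdefault(code, len(vocab) + 1)
    return vocab


def encode_and_pad(cohort: list[list[list[str]]], vocab: dict[str, int]):
    """Return (batch, mask).

    batch[p][v][c]: code index, padded to the cohort's max visits and max codes per visit.
    mask[p][v]: 1 for a real visit, 0 for a padded one.
    """
    max_visits = max(len(visits) for visits in cohort)
    max_codes = max(len(visit) for visits in cohort for visit in visits)
    batch, mask = [], []
    for visits in cohort:
        rows = [[vocab[c] for c in visit] + [PAD] * (max_codes - len(visit)) for visit in visits]
        n_real = len(rows)
        rows += [[PAD] * max_codes for _ in range(max_visits - n_real)]
        batch.append(rows)
        mask.append([1] * n_real + [0] * (max_visits - n_real))
    return batch, mask


# Each patient's visits are assumed to be in chronological order already.
cohort = [
    [["428.0", "401.9"], ["428.0"]],  # patient A: two visits
    [["250.00"]],                     # patient B: one visit
]
vocab = build_vocab(cohort)
batch, mask = encode_and_pad(cohort, vocab)
print(batch)  # [[[1, 2], [1, 0]], [[3, 0], [0, 0]]]
print(mask)   # [[1, 1], [1, 0]]
```

The mask lets a sequence model ignore padded visits (e.g., in attention or when selecting the last real hidden state), so patients with different visit counts can share one batch.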
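The metrics listed above differ by task type. AUROC and AUPRC score binary outcomes such as mortality, while Jaccard and set-level F1 score multi-label outputs such as a visit's recommended drug set. Below is a plain-Python sketch of per-visit Jaccard and F1 averaged over visits, plus a rank-based AUROC. These are illustrative definitions, not PyHealth's metric functions. The empty-set convention is an assumption, and in practice the averaging scheme (per-sample vs. micro/macro) should match the benchmark being reproduced.

```python
def jaccard(pred: set[str], true: set[str]) -> float:
    """Size of the intersection over size of the union for one visit's code sets."""
    if not pred and not true:
        return 1.0  # convention assumed here: two empty sets agree perfectly
    return len(pred & true) / len(pred | true)


def set_f1(pred: set[str], true: set[str]) -> float:
    """Harmonic mean of precision and recall for one visit's code set."""
    if not pred and not true:
        return 1.0  # same empty-set convention as jaccard
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def mean_over_visits(metric, preds, trues) -> float:
    """Average a per-visit set metric over all visits (sample-averaged)."""
    return sum(metric(p, t) for p, t in zip(preds, trues)) / len(preds)


def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


preds = [{"B01AC06", "C07AB02"}, {"C10AA05"}]
trues = [{"B01AC06"}, {"C10AA05", "C03CA01"}]
print(mean_over_visits(jaccard, preds, trues))           # 0.5
print(round(mean_over_visits(set_f1, preds, trues), 4))  # 0.6667
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))         # 0.75
```

The pairwise AUROC is quadratic in the number of samples and is meant only to make the definition concrete; a library implementation is preferable on real cohorts.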
