pytdc-therapeutics-data-commons

PyTDC (Therapeutics Data Commons)

Overview

PyTDC is the Python library for Therapeutics Data Commons (TDC), an open-science platform providing AI-ready datasets and benchmarks for drug discovery. TDC organizes therapeutics tasks into three problem categories: single-instance prediction (properties of a single molecule or protein), multi-instance prediction (for example, drug-target interactions), and generation (molecule design, retrosynthesis). Datasets come with standardized splits, and the library also provides evaluation metrics and molecular oracles for scoring generated molecules.
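As a rough illustration of the single-instance prediction workflow, the sketch below loads one ADME dataset and requests a scaffold split. It assumes PyTDC is installed (`pip install PyTDC`) and that the dataset name `Caco2_Wang` is available. The helper names `load_scaffold_split` and `split_sizes` are hypothetical and are not part of PyTDC.

```python
def load_scaffold_split(name="Caco2_Wang", seed=1):
    """Return a dict of train/valid/test tables for one ADME dataset.

    Assumption: PyTDC is installed and the dataset can be downloaded.
    The import is done lazily so this module loads without PyTDC.
    """
    from tdc.single_pred import ADME

    data = ADME(name=name)  # downloads the dataset on first use
    # Scaffold split: molecules sharing a scaffold stay in the same partition.
    return data.get_split(method="scaffold", seed=seed, frac=[0.7, 0.1, 0.2])


def split_sizes(split):
    """Hypothetical helper: number of rows in each partition of a split dict."""
    return {part: len(rows) for part, rows in split.items()}
```

Calling `split_sizes(load_scaffold_split())` would report how many molecules land in each partition; the first call needs network access to fetch the data.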

When to Use

  • Loading curated ADME, toxicity, or bioactivity datasets for ML model training
  • Benchmarking drug discovery models with standardized 5-seed evaluation protocols
  • Predicting drug-target or drug-drug interactions with proper cold-split evaluation
  • Generating novel molecules and scoring them with molecular oracles (QED, SA, DRD2, GSK3B)
  • Accessing scaffold-based or temporal train/test splits for pharmaceutical ML
  • Converting molecular representations (SMILES to PyG graphs, ECFP fingerprints, SELFIES)
  • For chemical database queries (compound search, bioactivity), use chembl-database-bioactivity instead
  • For molecular featurization beyond format conversion, use molfeat instead
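Two ideas from the list above, cold-split evaluation and 5-seed reporting, can be sketched in plain Python. This is a minimal illustration on toy records, not PyTDC's own implementation; the function names `cold_split` and `summarize_seeds` are hypothetical. PyTDC ships its own split and benchmark utilities, which should be preferred in practice.

```python
import random
import statistics


def cold_split(pairs, entity_index=0, test_frac=0.2, seed=1):
    """Split records so that held-out entities never appear in training.

    pairs: list of tuples such as (drug, target, label).
    entity_index: tuple position of the entity to hold out (0 = drug here).
    """
    entities = sorted({p[entity_index] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, round(len(entities) * test_frac))
    held_out = set(entities[:n_test])
    train = [p for p in pairs if p[entity_index] not in held_out]
    test = [p for p in pairs if p[entity_index] in held_out]
    return train, test


def summarize_seeds(scores):
    """Mean and sample standard deviation of one metric across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy drug-target records; the labels are made up for illustration only.
pairs = [
    ("drugA", "t1", 1), ("drugA", "t2", 0),
    ("drugB", "t1", 0), ("drugB", "t3", 1),
    ("drugC", "t2", 1), ("drugD", "t3", 0),
    ("drugE", "t1", 1), ("drugE", "t2", 1),
]
train, test = cold_split(pairs, entity_index=0, test_frac=0.2, seed=1)
```

Because the split is made over drugs instead of over rows, every test drug is unseen during training, which is the point of a cold split. Reporting the mean and standard deviation over five seeded runs follows the same pattern: collect one score per seed and pass the list to `summarize_seeds`.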

Prerequisites

First Seen
Mar 16, 2026