datamol-cheminformatics

Datamol Cheminformatics Toolkit

Overview

Datamol provides a lightweight, Pythonic abstraction layer over RDKit for cheminformatics. It simplifies common drug discovery operations (SMILES parsing, standardization, descriptors, fingerprints, clustering, scaffolds, conformers, and visualization) with sensible defaults, built-in parallelization, and cloud storage support via fsspec. All molecular objects are native rdkit.Chem.Mol instances, so they can be passed to any RDKit function unchanged.
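As a minimal sketch of the parse-and-standardize workflow described above, the snippet below turns a raw SMILES string into a standardized canonical SMILES. The helper name canonical_smiles is hypothetical and the guarded import is only there so the snippet degrades cleanly when datamol is not installed; the datamol calls used (dm.to_mol, dm.sanitize_mol, dm.standardize_mol, dm.to_smiles) are run with their default settings, which may not suit every dataset.

```python
# Guarded import: datamol (and RDKit underneath) is assumed to be installed
# separately, e.g. with `pip install datamol`.
try:
    import datamol as dm
except ImportError:
    dm = None


def canonical_smiles(smiles):
    """Hypothetical helper: return a standardized canonical SMILES, or None.

    None is returned when the input cannot be parsed or sanitized.
    """
    if dm is None:
        raise RuntimeError("datamol is not installed")

    mol = dm.to_mol(smiles)        # returns None for unparsable input
    if mol is None:
        return None

    mol = dm.sanitize_mol(mol)     # may also return None on failure
    if mol is None:
        return None

    mol = dm.standardize_mol(mol)  # default standardization settings
    return dm.to_smiles(mol)       # canonical SMILES string


if dm is not None:
    # "OCC" and "CCO" are two spellings of ethanol; "C1CC" has an unclosed ring.
    for raw in ["OCC", "CCO", "C1CC"]:
        print(raw, "->", canonical_smiles(raw))
```

Because the intermediate object is a plain rdkit.Chem.Mol, any step in this helper could be swapped for a direct RDKit call without changing the rest.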

When to Use

  • Parsing, validating, and standardizing molecular structures from SMILES, SDF, or other formats
  • Computing molecular descriptors and fingerprints for ML featurization
  • Similarity searching and diversity selection from compound libraries
  • Clustering compounds by structural similarity (Butina clustering)
  • Scaffold analysis and scaffold-based train/test splitting for ML
  • BRICS/RECAP molecular fragmentation for fragment-based design
  • 3D conformer generation and analysis
  • Visualizing molecules as grids with alignment and highlighting
  • Batch processing molecular datasets with parallelization
  • Not the right tool for every task: for quick gene lookups, use gget instead; for advanced substructure queries or custom fingerprints, use RDKit directly
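
The similarity and clustering items in the list above rest on two ideas: Tanimoto similarity between fingerprints and Butina clustering over the resulting distances. The following is a library-independent sketch of both in plain Python, not datamol's or RDKit's implementation. Fingerprints are modeled as sets of on-bit indices, the data are toy values invented for illustration, and the function names and the cutoff of 0.35 are assumptions. In datamol itself the corresponding entry point is dm.cluster_mols, which works on molecules rather than raw bit sets.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina(fps, cutoff):
    """Butina-style clustering: returns clusters as lists of indices.

    Two items are neighbors when their Tanimoto distance (1 - similarity)
    is at most `cutoff`. The first index in each cluster is its centroid.
    """
    n = len(fps)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Visit candidates from most to fewest neighbors (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # A cluster is the centroid plus its not-yet-assigned neighbors.
        members = [centroid] + sorted(neighbors[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints (invented bit indices, not derived from real molecules).
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]

clusters = butina(fps, cutoff=0.35)
print(clusters)
```

A smaller cutoff yields tighter, more numerous clusters; a larger one merges more compounds into each cluster. The pairwise loop is quadratic in the number of fingerprints, which is why library implementations matter for large collections.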
First Seen
Mar 16, 2026