ESM — Protein Language Models

Overview

ESM (Evolutionary Scale Modeling) provides pretrained protein language models for generative protein design and representation learning. ESM3 is a multimodal generative model that reasons jointly over sequence, structure, and function, so any of the three can serve as the prompt or as the output. ESM C is an efficient embedding model optimized for extracting protein representations for downstream ML tasks.

When to Use

Generating novel protein sequences conditioned on desired structure or function
Extracting embeddings from protein sequences (per-residue, or pooled to a fixed length) for classification, clustering, or regression
Predicting 3D structure from amino acid sequence
Inverse folding: designing sequences that fold into a target structure
Annotating proteins with functional keywords (GO terms, EC numbers)
Comparing protein similarity via embedding distance instead of sequence alignment
Chain-of-thought protein design: iterative refinement of sequence/structure/function
For dedicated, high-accuracy structure prediction from multiple sequence alignments, use AlphaFold instead
For sequence alignment and homology search, use BLAST/HMMER via BioPython instead
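
The embedding use cases above (fixed-length representations and similarity by embedding distance) can be sketched without any model call. The example below is a minimal illustration, assuming the model returns one vector per residue; the toy numbers stand in for real model output, and the helper names `mean_pool` and `cosine_similarity` are hypothetical, not part of the ESM API.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors (L x d) into a single d-dimensional vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for per-residue embeddings (assumption: real output is L x d floats).
protein_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]                    # 2 residues, d = 3
protein_b = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]   # 3 residues, d = 3

# Pooling makes proteins of different lengths directly comparable.
vec_a = mean_pool(protein_a)
vec_b = mean_pool(protein_b)
similarity = cosine_similarity(vec_a, vec_b)
print(f"pooled dimension: {len(vec_a)}, cosine similarity: {similarity:.3f}")
```

Mean pooling is only one option; it discards positional information, so tasks that depend on specific residues may do better with the per-residue vectors themselves.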
