Biopython: Computational Molecular Biology Toolkit

Overview

Biopython is the standard open-source Python library for computational molecular biology, providing modular APIs for sequence handling, biological file parsing, NCBI database access, BLAST searches, protein structure analysis, and phylogenetics. It supports Python 3 and requires NumPy.
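
As a quick illustration of the sequence-handling API mentioned above, the following minimal sketch uses `Bio.Seq.Seq` to reverse-complement, transcribe, and translate a short DNA string. The sequence itself is only an illustrative input, and the example assumes Biopython is installed (for instance via `pip install biopython`).

```python
from Bio.Seq import Seq

# Illustrative coding sequence (39 nt, i.e. 13 codons); not taken from any real dataset.
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Reverse complement of the DNA strand.
rev = dna.reverse_complement()

# Transcribe DNA to RNA (T -> U on the coding strand).
rna = dna.transcribe()

# Translate with the standard genetic code, stopping at the first stop codon.
protein = dna.translate(to_stop=True)

print(rev)
print(rna)
print(protein)
```

`Seq` objects behave much like Python strings (slicing, `len`, `count`), which is why the rest of the toolkit can pass them around freely.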

When to Use

Parse and convert biological file formats (FASTA, GenBank, FASTQ, PDB, mmCIF, PHYLIP)
Fetch sequences or publications from NCBI databases (GenBank, PubMed, Protein) programmatically
Run and parse BLAST searches (remote NCBI or local BLAST+)
Perform pairwise or multiple sequence alignments with custom scoring
Analyze 3D protein structures: distances, angles, DSSP, superimposition
Build and visualize phylogenetic trees from sequence alignments
Calculate sequence statistics (GC content, molecular weight, melting temperature)
Batch-process thousands of sequences with custom filtering logic
When not to use: for reading SAM/BAM/CRAM alignment files and working with mapped reads, use pysam instead; for advanced ecological diversity metrics, use scikit-bio instead.
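
Several of the use cases listed above (parsing a file format, computing GC content, and batch filtering) can be combined in a few lines. The sketch below is a hedged example rather than a canonical recipe: the FASTA text, the record names, and the length and GC thresholds are all hypothetical, and it assumes Biopython 1.80 or later, where `Bio.SeqUtils.gc_fraction` is available.

```python
from io import StringIO

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction  # assumes Biopython >= 1.80

# Hypothetical in-memory FASTA data standing in for a real file on disk.
fasta_text = """>seq1 example
ATGCGCGCTA
>seq2 example
ATATATATAT
>seq3 example
GGGCCC
"""

# SeqIO.parse yields SeqRecord objects lazily; list() is fine for small inputs.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Arbitrary illustrative filter: at least 8 nt long and GC fraction >= 0.5.
kept = [
    rec for rec in records
    if len(rec.seq) >= 8 and gc_fraction(rec.seq) >= 0.5
]

# Write the surviving records back out in FASTA format.
out = StringIO()
n_written = SeqIO.write(kept, out, "fasta")

print(n_written, [rec.id for rec in kept])
```

For real files, pass a path instead of a `StringIO` handle, and iterate over `SeqIO.parse` directly rather than building a list when the input contains thousands of sequences.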
