Pysam — Genomic File Toolkit

Overview

Pysam provides a Pythonic interface to htslib for reading, manipulating, and writing genomic data files. It handles SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences, with efficient region-based random access to indexed files. It also exposes samtools and bcftools commands as callable Python functions.
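As a rough illustration of region-based access, the sketch below counts confidently mapped reads in a region. It assumes a coordinate-sorted, indexed BAM file. The function names `count_confident_reads` and `count_in_file` and the default `min_mapq=20` are hypothetical choices for this example, not part of pysam. Coordinates passed to `fetch` are 0-based and half-open.

```python
def count_confident_reads(bam, contig, start, end, min_mapq=20):
    """Count primary, mapped reads overlapping [start, end) with MAPQ >= min_mapq.

    `bam` is any object with a pysam-style fetch(contig, start, end) method,
    for example an open, indexed pysam.AlignmentFile.
    """
    n = 0
    for read in bam.fetch(contig, start, end):
        # Skip reads that are unmapped or are not the primary alignment.
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        if read.mapping_quality >= min_mapq:
            n += 1
    return n


def count_in_file(bam_path, contig, start, end, min_mapq=20):
    """Open an indexed BAM with pysam and count confident reads in a region."""
    # Imported here so the helper above can be exercised without pysam installed.
    import pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return count_confident_reads(bam, contig, start, end, min_mapq)
```

A call such as `count_in_file("sample.bam", "chr1", 100000, 101000)` would count reads over the first kilobase after position 100,000 on chr1. The file name and contig here are placeholders.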

When to Use

Reading and querying BAM/CRAM alignment files (region extraction, read filtering)
Analyzing VCF/BCF variant files (genotype access, variant filtering, annotation)
Extracting reference sequences from indexed FASTA files
Calculating per-base coverage and pileup statistics
Building custom bioinformatics pipelines that combine alignment + variant + sequence data
Quality control of NGS data (mapping quality, flag filtering, coverage)
Not for read mapping from FASTQ: use an aligner such as STAR, BWA, or minimap2 instead
Not for variant calling from BAM: use a caller such as GATK or DeepVariant instead
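For the per-base coverage use case, a minimal sketch follows. It relies on pysam's `AlignmentFile.count_coverage`, which returns four per-position count arrays (A, C, G, T) for the requested region. The helper name `mean_depth` is hypothetical. By default `count_coverage` ignores bases below a base-quality threshold, and it does not count deletions, so the result is a filtered depth rather than a raw read count.

```python
def mean_depth(bam, contig, start, end):
    """Mean per-base depth over [start, end) from pysam-style count_coverage().

    `bam` is any object with a count_coverage(contig, start, stop) method that
    returns four equal-length count sequences (A, C, G, T), for example an
    open, indexed pysam.AlignmentFile.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # Sum the four nucleotide counts at each position to get per-base depth.
    per_base = [sum(counts) for counts in zip(*bam.count_coverage(contig, start, end))]
    return sum(per_base) / len(per_base)
```

With a real file, `bam` would be opened as `pysam.AlignmentFile(path, "rb")`, where `path` points to an indexed BAM.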
