GATK — Germline Variant Calling Pipeline

Overview

GATK (Genome Analysis Toolkit) implements the GATK Best Practices workflow for calling germline SNPs and indels from Illumina WGS and WES data. The pipeline runs HaplotypeCaller per sample in GVCF mode, consolidates the per-sample GVCFs with GenomicsDBImport, performs joint genotyping with GenotypeGVCFs, and filters variants with VQSR (Variant Quality Score Recalibration) or hard filters. The Best Practices call for analysis-ready BAM files as input: reads aligned to the reference with BWA-MEM (or its faster reimplementation, BWA-MEM2), with duplicates marked and base quality scores recalibrated (BQSR). GATK4 bundles Picard tools such as MarkDuplicates and works alongside samtools and bcftools for pre- and post-processing. Developed by the Broad Institute, the GATK4 Best Practices workflow is widely used as a reference standard for germline variant calling in research and clinical genomics.
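A minimal dry-run sketch of the steps above, written in Python, that only builds and prints the gatk command lines instead of running them. The file names, sample IDs, reference path, known-sites VCF, and the interval chr20 are hypothetical placeholders. The tool names and flags are standard GATK4 arguments, and the SNP hard-filter thresholds follow GATK's generic hard-filtering recommendations, which usually need tuning per dataset. VQSR and indel filtering are not shown.

```python
import shlex
from typing import List

REF = "ref.fasta"             # hypothetical reference (with .fai and .dict alongside)
KNOWN_SITES = "dbsnp.vcf.gz"  # hypothetical known-sites VCF used by BQSR

# SNP hard-filter thresholds from GATK's generic hard-filtering recommendations
SNP_FILTERS = {
    "QD2": "QD < 2.0",
    "QUAL30": "QUAL < 30.0",
    "SOR3": "SOR > 3.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
    "MQRankSum-12.5": "MQRankSum < -12.5",
    "ReadPosRankSum-8": "ReadPosRankSum < -8.0",
}


def per_sample_commands(sample: str) -> List[List[str]]:
    """Duplicate marking, BQSR, and GVCF-mode calling for one sample."""
    aligned = f"{sample}.sorted.bam"  # assumed: coordinate-sorted BWA output with read groups
    dedup, table, recal = f"{sample}.dedup.bam", f"{sample}.recal.table", f"{sample}.recal.bam"
    return [
        ["gatk", "MarkDuplicates", "-I", aligned, "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt", "--CREATE_INDEX", "true"],
        ["gatk", "BaseRecalibrator", "-R", REF, "-I", dedup,
         "--known-sites", KNOWN_SITES, "-O", table],
        ["gatk", "ApplyBQSR", "-R", REF, "-I", dedup,
         "--bqsr-recal-file", table, "-O", recal],
        # -ERC GVCF adds reference-confidence blocks so the sample can be joint-genotyped later
        ["gatk", "HaplotypeCaller", "-R", REF, "-I", recal,
         "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"],
    ]


def cohort_commands(samples: List[str], workspace: str, interval: str) -> List[List[str]]:
    """Consolidate GVCFs, joint-genotype the cohort, then hard-filter SNPs."""
    consolidate = ["gatk", "GenomicsDBImport",
                   "--genomicsdb-workspace-path", workspace, "-L", interval]
    for s in samples:
        consolidate += ["-V", f"{s}.g.vcf.gz"]
    genotype = ["gatk", "GenotypeGVCFs", "-R", REF,
                "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"]
    select_snps = ["gatk", "SelectVariants", "-V", "cohort.vcf.gz",
                   "-select-type", "SNP", "-O", "cohort.snps.vcf.gz"]
    hard_filter = ["gatk", "VariantFiltration", "-V", "cohort.snps.vcf.gz",
                   "-O", "cohort.snps.filtered.vcf.gz"]
    for name, expr in SNP_FILTERS.items():
        hard_filter += ["-filter", expr, "--filter-name", name]
    return [consolidate, genotype, select_snps, hard_filter]


if __name__ == "__main__":
    samples = ["NA12878", "NA12891"]  # hypothetical sample IDs
    plan = [cmd for s in samples for cmd in per_sample_commands(s)]
    plan += cohort_commands(samples, "cohort_db", "chr20")
    for cmd in plan:
        print(shlex.join(cmd))  # dry run: print instead of subprocess.run(cmd, check=True)
```

Printing rather than executing keeps the sketch runnable without GATK installed; replacing the print with subprocess.run(cmd, check=True) would run each step in order.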

When to Use

Calling germline SNPs and indels from WGS or WES samples for population genetics or clinical variant analysis
Running joint genotyping across multiple samples for cohort-scale studies (families, case-control)
Applying base quality score recalibration (BQSR) to improve variant calling accuracy before HaplotypeCaller
Generating GVCF files for scalable cohort expansion: add new samples without re-running HaplotypeCaller on existing ones (joint genotyping is then repeated across the whole cohort)
Producing variant call sets for downstream annotation with Ensembl VEP, ANNOVAR, or SnpEff
Use DeepVariant (Google) instead for a faster deep-learning approach with comparable accuracy
Use bcftools call instead for rapid pileup-based variant calling without HaplotypeCaller's local de novo haplotype assembly
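The cohort-expansion point in the list above can be sketched as follows, assuming a GATK4 release whose GenomicsDBImport supports --genomicsdb-update-workspace-path. The workspace name, GVCF names, and output path are hypothetical. New samples are appended to the existing workspace and GenotypeGVCFs is re-run over the full cohort, while the existing samples' GVCFs are reused as-is.

```python
import shlex
from typing import List

REF = "ref.fasta"  # hypothetical reference FASTA


def add_samples_commands(workspace: str, new_gvcfs: List[str], out_vcf: str) -> List[List[str]]:
    """Append new GVCFs to an existing GenomicsDB workspace, then re-genotype the cohort."""
    # Assumed behavior: update mode reuses the workspace's stored intervals, so no -L is passed.
    # Back up the workspace first; a failed update can leave it unusable.
    update = ["gatk", "GenomicsDBImport",
              "--genomicsdb-update-workspace-path", workspace]
    for gvcf in new_gvcfs:
        update += ["-V", gvcf]
    # Joint genotyping is repeated over all samples (old + new) in the workspace.
    regenotype = ["gatk", "GenotypeGVCFs", "-R", REF,
                  "-V", f"gendb://{workspace}", "-O", out_vcf]
    return [update, regenotype]


if __name__ == "__main__":
    # hypothetical names: a workspace from an earlier import plus two newly called samples
    for cmd in add_samples_commands("cohort_db", ["S3.g.vcf.gz", "S4.g.vcf.gz"], "cohort_v2.vcf.gz"):
        print(shlex.join(cmd))
```

Only the new samples go through HaplotypeCaller; the trade-off is that the cohort VCF must be regenerated after each update.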
