gatk-variant-calling

GATK — Germline Variant Calling Pipeline

Overview

This skill uses GATK (Genome Analysis Toolkit) to run the GATK Best Practices workflow for calling germline SNPs and indels from Illumina WGS and WES data. The pipeline runs HaplotypeCaller per sample to produce GVCF files, consolidates the GVCFs with GenomicsDBImport, performs joint genotyping with GenotypeGVCFs, and filters variants with VQSR (Variant Quality Score Recalibration) or hard filters. Input BAM files are expected to be aligned (for example with BWA-MEM2), duplicate-marked, and base quality score recalibrated (BQSR). GATK integrates with Picard tools, samtools, and bcftools for pre- and post-processing. The GATK4 workflow is a widely used standard for germline variant calling in research and clinical genomics.
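The calling steps described above can be sketched as a small Python helper that assembles the GATK4 command lines without executing them. This is a minimal illustration, not the skill's own implementation: the function name `germline_pipeline_commands`, the file names, the `cohort_db` workspace, and the `chr20` interval are all hypothetical placeholders, and the filtering step (VQSR or hard filters) is left out for brevity.

```python
import shlex


def germline_pipeline_commands(samples, reference, intervals,
                               workspace="cohort_db", out_vcf="cohort.vcf.gz"):
    """Build GATK4 command lines (argv lists) for a joint-calling run.

    samples   -- dict mapping sample name to an analysis-ready BAM path
    reference -- path to the reference FASTA
    intervals -- interval string or file passed to GenomicsDBImport via -L
    All names and paths are illustrative assumptions.
    """
    commands = []
    gvcfs = []

    # Step 1: call each sample separately in GVCF mode.
    for name, bam in samples.items():
        gvcf = f"{name}.g.vcf.gz"
        gvcfs.append(gvcf)
        commands.append(["gatk", "HaplotypeCaller",
                         "-R", reference, "-I", bam,
                         "-O", gvcf, "-ERC", "GVCF"])

    # Step 2: consolidate all per-sample GVCFs into one GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace,
                  "-L", intervals]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    commands.append(import_cmd)

    # Step 3: joint genotyping across the consolidated cohort.
    commands.append(["gatk", "GenotypeGVCFs",
                     "-R", reference,
                     "-V", f"gendb://{workspace}",
                     "-O", out_vcf])
    return commands


# Print the commands for a hypothetical two-sample cohort.
example = germline_pipeline_commands(
    {"sampleA": "sampleA.bam", "sampleB": "sampleB.bam"},
    reference="ref.fasta",
    intervals="chr20",
)
for argv in example:
    print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps quoting safe if the commands are later passed to `subprocess.run`; whether to run them serially or per interval in parallel is a separate design choice this sketch does not address.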

When to Use

  • Calling germline SNPs and indels from WGS or WES samples for population genetics or clinical variant analysis
  • Running joint genotyping across multiple samples for cohort-scale studies (families, case-control)
  • Applying base quality score recalibration (BQSR) to improve variant calling accuracy before HaplotypeCaller
  • Generating GVCF files for scalable cohort expansion: add new samples without reprocessing existing ones
  • Producing variant call sets for downstream annotation with Ensembl VEP, ANNOVAR, or SnpEff
  • Alternative: use DeepVariant (Google) when a faster deep-learning approach with comparable accuracy is preferred
  • Alternative: use bcftools call for rapid variant calling when assembly-based local realignment is not needed
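The BQSR step named in the list above is typically a two-command sequence: BaseRecalibrator builds a recalibration table from known variant sites, and ApplyBQSR writes a recalibrated BAM. The sketch below only assembles those command lines; the helper name `bqsr_commands` and every file name in it are hypothetical, and the choice of known-sites resources depends on the reference build in use.

```python
import shlex


def bqsr_commands(bam, reference, known_sites, out_bam, table="recal.table"):
    """Build the two GATK4 command lines for base quality score recalibration.

    known_sites -- list of VCF paths with known variants (illustrative names)
    Returns [BaseRecalibrator argv, ApplyBQSR argv].
    """
    # Pass 1: model systematic base-quality errors, masking known variant sites.
    recalibrate = ["gatk", "BaseRecalibrator",
                   "-R", reference, "-I", bam, "-O", table]
    for vcf in known_sites:
        recalibrate += ["--known-sites", vcf]

    # Pass 2: apply the model and write a recalibrated BAM.
    apply_cmd = ["gatk", "ApplyBQSR",
                 "-R", reference, "-I", bam,
                 "--bqsr-recal-file", table,
                 "-O", out_bam]
    return [recalibrate, apply_cmd]


# Print the commands for one hypothetical duplicate-marked BAM.
for argv in bqsr_commands("sampleA.markdup.bam", "ref.fasta",
                          ["dbsnp.vcf.gz", "known_indels.vcf.gz"],
                          "sampleA.recal.bam"):
    print(shlex.join(argv))
```

The recalibrated BAM is what would then be passed to HaplotypeCaller.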

Prerequisites

More from jaechang-hits/sciagent-skills

First Seen
Mar 16, 2026