SnpEff + SnpSift — Variant Annotation and Filtering

Overview

SnpEff annotates variants in VCF files by predicting their functional consequences: impact level (HIGH, MODERATE, LOW, MODIFIER), affected gene and transcript, amino acid change, and HGVS notation. SnpSift is the companion tool for filtering, sorting, and enriching annotated VCFs with external databases such as ClinVar and dbSNP. Together they form a fast, self-contained pipeline that takes raw variant calls to biologically interpretable, filtered variant sets. Both tools are Java-based and are invoked from the command line or through Python's subprocess module; pre-built genome databases (hg38, GRCh37, mm10, and 100+ others) are downloaded with a single command.
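
Because both tools are plain Java command-line programs, driving them from Python mostly means assembling an argument list and passing it to `subprocess.run`. The sketch below shows one way this might look. The jar path, the file names, and the helper functions `download_cmd`, `annotate_cmd`, and `annotate` are all assumptions made for illustration, not part of SnpEff; the `hg38` database name is taken from the list above.

```python
import subprocess
from pathlib import Path

# Assumption: location of the SnpEff jar on this machine (hypothetical path).
SNPEFF_JAR = Path("snpEff/snpEff.jar")


def download_cmd(genome: str, jar: Path = SNPEFF_JAR) -> list[str]:
    """Build the command that fetches a pre-built genome database."""
    return ["java", "-jar", str(jar), "download", genome]


def annotate_cmd(genome: str, vcf_in: str, jar: Path = SNPEFF_JAR,
                 heap: str = "8g") -> list[str]:
    """Build the annotation command; SnpEff writes the annotated VCF to stdout."""
    return ["java", f"-Xmx{heap}", "-jar", str(jar), genome, vcf_in]


def annotate(genome: str, vcf_in: str, vcf_out: str) -> None:
    """Run SnpEff and redirect its stdout into the output VCF (needs Java and the jar)."""
    with open(vcf_out, "w") as out:
        subprocess.run(annotate_cmd(genome, vcf_in), stdout=out, check=True)


if __name__ == "__main__":
    # Only prints the commands; call annotate(...) once Java and the jar are in place.
    print(" ".join(download_cmd("hg38")))
    print(" ".join(annotate_cmd("hg38", "sample.vcf")))
```

Keeping command construction separate from execution makes the commands easy to log or inspect before anything is run.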

When to Use

Annotating VCF files from GATK, DeepVariant, bcftools, or other callers with predicted gene-level functional consequences before manual review or downstream filtering
Prioritizing clinically relevant variants by filtering to HIGH-impact stop-gain, frameshift, and splice-site variants for rare disease or cancer gene panel analysis
Adding ClinVar pathogenicity classifications and dbSNP rsIDs to a variant set for cross-study comparison or clinical reporting
Extracting structured, tab-delimited fields (gene, protein change, AF, ClinSig) from annotated VCFs into pandas DataFrames for statistical analysis
Identifying candidate de novo variants in trio analysis by combining allele frequency thresholds, impact filters, and parent VCF exclusion
Use ANNOVAR instead when comprehensive annotation from multiple databases (gnomAD, CADD, SpliceAI) in a single run is required
Use Ensembl VEP instead when REST API access or VEP-specific plugins (CADD, LOFTEE, SpliceRegion) are needed
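
The HIGH-impact filtering and tab-delimited extraction use cases above can be sketched as follows. This is a minimal illustration under stated assumptions: the jar path, the helper names (`filter_high_cmd`, `extract_cmd`, `read_table`), and the two-row sample table with placeholder gene names are hypothetical, and the exact header layout of real SnpSift output may differ between versions.

```python
import csv
import io

# Assumption: location of the SnpSift jar (hypothetical path).
SNPSIFT_JAR = "snpEff/SnpSift.jar"

# Columns to pull out; ANN[*] sub-fields follow SnpSift's expression syntax.
FIELDS = ["CHROM", "POS", "REF", "ALT",
          "ANN[*].GENE", "ANN[*].IMPACT", "ANN[*].HGVS_P"]


def filter_high_cmd(vcf_in: str) -> list[str]:
    """Command keeping variants with at least one HIGH-impact annotation."""
    return ["java", "-jar", SNPSIFT_JAR, "filter", "ANN[*].IMPACT = 'HIGH'", vcf_in]


def extract_cmd(vcf_in: str) -> list[str]:
    """Command writing the chosen fields as a tab-delimited table on stdout."""
    # -s joins multiple values within one column; -e fills empty fields.
    return ["java", "-jar", SNPSIFT_JAR, "extractFields",
            "-s", ",", "-e", ".", vcf_in, *FIELDS]


def read_table(text: str) -> list[dict[str, str]]:
    """Parse extractFields-style text; a leading '#' on the header is tolerated."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


# Hypothetical table standing in for real extractFields output.
SAMPLE = (
    "CHROM\tPOS\tREF\tALT\tANN[*].GENE\tANN[*].IMPACT\tANN[*].HGVS_P\n"
    "chr17\t43000000\tG\tA\tGENE1\tHIGH\tp.Arg100*\n"
    "chr13\t32000000\tC\tT\tGENE2\tMODERATE,MODIFIER\tp.Ala50Val,.\n"
)

rows = read_table(SAMPLE)
# A variant counts as HIGH if any of its comma-joined annotations is HIGH.
high = [r for r in rows if "HIGH" in r["ANN[*].IMPACT"].split(",")]
print([r["ANN[*].GENE"] for r in high])  # ['GENE1']
```

When a DataFrame is wanted instead of a list of dicts, the same tab-delimited text can be read with `pandas.read_csv(..., sep="\t")`.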
