PLINK2 — GWAS and Population Genetics

Overview

PLINK2 is the high-performance successor to PLINK 1.9, designed for genome-wide association studies (GWAS) and population genetics analysis on large cohorts. Its native format is PLINK 2 binary (.pgen/.pvar/.psam). It also reads PLINK 1 binary (.bed/.bim/.fam), VCF, and BGEN files. Its core operations are sample and variant quality control (QC), KING-robust kinship estimation, principal component analysis (PCA), and linear/logistic regression association testing (--glm). Multithreading and optimized I/O make PLINK2 substantially faster than PLINK 1.9 on many tasks, although the speedup depends on the operation and the dataset. Its association output can be passed to downstream tools for visualization (Manhattan/QQ plots) and meta-analysis.
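The workflow described above runs QC, LD pruning, kinship filtering, PCA, and then association testing. It can be sketched as a sequence of plink2 invocations. The Python helper below is hypothetical, and so are its name (plink2_pipeline), the file prefixes, and the thresholds (--geno/--mind 0.02, --maf 0.01, --hwe 1e-6, --indep-pairwise 50 5 0.2, --king-cutoff 0.0884, 10 PCs). These are common illustrative choices. They are not values prescribed by this document and not recommendations for every cohort. The helper only builds argument lists and executes nothing. To run them, it assumes a plink2 binary on PATH.

```python
from typing import List


def plink2_pipeline(bfile: str, pheno: str, out: str, threads: int = 8) -> List[List[str]]:
    """Return plink2 argv lists for a basic QC -> PCA -> GWAS run (hypothetical helper).

    bfile: PLINK 1 binary prefix (.bed/.bim/.fam); pheno: phenotype file;
    out: prefix for all outputs. Thresholds are illustrative assumptions only.
    """
    base = ["plink2", "--threads", str(threads)]
    # Separate output prefixes per step so each step's .log is kept.
    qc, prune, king, pca = (f"{out}_{s}" for s in ("qc", "prune", "king", "pca"))
    return [
        # 1. Variant and sample QC, saved as PLINK 2 binary (.pgen/.pvar/.psam).
        base + ["--bfile", bfile, "--geno", "0.02", "--mind", "0.02",
                "--maf", "0.01", "--hwe", "1e-6", "--make-pgen", "--out", qc],
        # 2. LD pruning (50-variant window, step 5, r^2 0.2): writes <prune>.prune.in.
        base + ["--pfile", qc, "--indep-pairwise", "50", "5", "0.2", "--out", prune],
        # 3. KING-robust relatedness filter on pruned variants: writes
        #    <king>.king.cutoff.in.id listing the retained samples.
        base + ["--pfile", qc, "--extract", f"{prune}.prune.in",
                "--king-cutoff", "0.0884", "--out", king],
        # 4. PCA on pruned variants in unrelated samples: writes <pca>.eigenvec.
        base + ["--pfile", qc, "--extract", f"{prune}.prune.in",
                "--keep", f"{king}.king.cutoff.in.id", "--pca", "10", "--out", pca],
        # 5. Association test (linear or logistic, depending on phenotype type),
        #    using every column of the .eigenvec file as a covariate.
        base + ["--pfile", qc, "--keep", f"{king}.king.cutoff.in.id",
                "--pheno", pheno, "--covar", f"{pca}.eigenvec",
                "--glm", "hide-covar", "--out", f"{out}_gwas"],
    ]
```

Each list can be executed in order with subprocess.run(cmd, check=True). Keeping the commands as plain lists makes them easy to log or review before anything runs. A --king-cutoff of 0.0884 removes samples until no remaining pair has a kinship above that value. In practice this leaves no pairs related at roughly second degree or closer.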

When to Use

Running GWAS on a case-control or quantitative trait cohort after genotyping array QC
Performing sample QC: missingness, heterozygosity outliers, sex check, cryptic relatedness
Performing genome-wide LD pruning to obtain an approximately independent variant set for PCA or relatedness estimation
Running PCA on genotype data to identify population stratification
Converting between PLINK binary, VCF, and BGEN formats
Filtering variants by MAF, HWE p-value, or missingness, and imputed VCF/BGEN data by INFO score
Use regenie or SAIGE instead for biobank-scale GWAS (roughly >100k samples) that require mixed-model or whole-genome-regression association to account for relatedness and population structure
Use VCFtools as an alternative for VCF-specific population genetics statistics
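The missingness, MAF, and HWE variant filters listed above can be illustrated with a small pure-Python sketch. This is not PLINK2's implementation. The names variant_qc and hwe_exact_p are hypothetical. Genotypes are assumed to be biallelic ALT-allele counts (0, 1, 2), with None marking a missing call. The default thresholds simply mirror illustrative --geno 0.02, --maf 0.01, --hwe 1e-6 settings. The HWE p-value uses the exact-test recurrence of Wigginton, Cutler & Abecasis (2005), which PLINK's --hwe is based on. PLINK2's own code adds refinements (for example, tie tolerance) that are not reproduced here.

```python
from typing import Optional, Sequence


def hwe_exact_p(het: int, hom1: int, hom2: int) -> float:
    """Exact HWE test p-value for one biallelic variant (illustrative sketch).

    Recurrence over the number of heterozygotes, conditional on allele counts.
    """
    homr, homc = min(hom1, hom2), max(hom1, hom2)
    rare = 2 * homr + het  # copies of the rarer allele
    n = het + homr + homc  # genotyped individuals
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the most likely het count; het count must share parity with `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0
    start_r = (rare - mid) // 2
    start_c = n - mid - start_r
    # Walk down: two fewer heterozygotes, one more homozygote of each kind.
    h, r, c = mid, start_r, start_c
    while h > 1:
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (r + 1) * (c + 1))
        total += probs[h - 2]
        h, r, c = h - 2, r + 1, c + 1
    # Walk up: two more heterozygotes, one fewer homozygote of each kind.
    h, r, c = mid, start_r, start_c
    while h <= rare - 2:
        probs[h + 2] = probs[h] * 4.0 * r * c / ((h + 2.0) * (h + 1.0))
        total += probs[h + 2]
        h, r, c = h + 2, r - 1, c - 1
    # p-value: total probability of configurations no more likely than observed.
    probs = [p / total for p in probs]
    return min(1.0, sum(p for p in probs if p <= probs[het]))


def variant_qc(dosages: Sequence[Optional[int]], geno: float = 0.02,
               maf: float = 0.01, hwe: float = 1e-6) -> dict:
    """Missing rate, MAF and HWE p-value plus a pass flag for one variant.

    A variant passes if missing rate <= geno, MAF >= maf and HWE p >= hwe.
    """
    if not dosages:
        return {"missing_rate": 1.0, "maf": 0.0, "hwe_p": 1.0, "pass": False}
    called = [d for d in dosages if d is not None]
    missing_rate = (len(dosages) - len(called)) / len(dosages)
    if not called:
        return {"missing_rate": missing_rate, "maf": 0.0, "hwe_p": 1.0, "pass": False}
    alt_freq = sum(called) / (2 * len(called))
    minor = min(alt_freq, 1.0 - alt_freq)
    hom_ref, het, hom_alt = (called.count(g) for g in (0, 1, 2))
    hwe_p = hwe_exact_p(het, hom_ref, hom_alt)
    passes = missing_rate <= geno and minor >= maf and hwe_p >= hwe
    return {"missing_rate": missing_rate, "maf": minor, "hwe_p": hwe_p, "pass": passes}
```

For example, variant_qc([0]*98 + [None]*2) passes the missingness and HWE checks. It fails only the MAF filter, because the variant is monomorphic among the called samples.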
