CNVkit Copy Number Analysis

Overview

CNVkit detects somatic copy number variants (CNVs) from whole-exome sequencing (WES), whole-genome sequencing (WGS), or targeted panel BAM files. It calculates read depth in both on-target (capture) bins and off-target (antitarget) bins, corrects for GC bias and library depth, segments the log2 copy ratio profile with circular binary segmentation (CBS) or a hidden Markov model (HMM), and calls amplifications and deletions. CNVkit provides both a CLI (cnvkit.py) and a Python API (cnvlib) for integration into analysis pipelines, and produces scatter plots, chromosome diagrams, heatmaps, and export files in VCF, BED, and SEG formats.
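As a rough illustration of how the CLI might be driven from a pipeline, the sketch below assembles a tumor-normal `cnvkit.py batch` command as an argument list. The helper name `build_batch_command`, the file names, and the `results` directory are hypothetical placeholders; only the `batch` flags themselves (`--normal`, `--targets`, `--fasta`, `--method`, `--output-reference`, `--output-dir`, `--scatter`, `--diagram`) come from CNVkit. The command is only printed here, since running it requires CNVkit and real input files.

```python
import shlex


def build_batch_command(tumor_bam, normal_bam, fasta, targets_bed=None,
                        output_dir="results", method="hybrid"):
    """Return the argument list for a tumor-normal `cnvkit.py batch` run.

    All paths are placeholders supplied by the caller; nothing is executed.
    """
    if method not in {"hybrid", "amplicon", "wgs"}:
        raise ValueError(f"unsupported --method value: {method!r}")
    cmd = [
        "cnvkit.py", "batch", tumor_bam,
        "--normal", normal_bam,
        "--fasta", fasta,
        "--method", method,
        "--output-reference", f"{output_dir}/reference.cnn",
        "--output-dir", output_dir,
        "--scatter", "--diagram",
    ]
    # A capture BED is expected for exome/panel data; in this sketch it is
    # treated as optional so that a WGS run can omit it.
    if targets_bed is not None:
        cmd += ["--targets", targets_bed]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_batch_command("tumor.bam", "normal.bam", "hg38.fa",
                          targets_bed="targets.bed")
print(shlex.join(cmd))

# To actually run it (needs CNVkit installed and the files present):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string keeps paths with spaces safe when the list is later passed to `subprocess.run`.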

When to Use

Calling somatic copy number variants from tumor-normal paired exome (WES) or targeted panel sequencing
Detecting copy number alterations in tumor-only samples using a pooled normal reference
Running CNV analysis on whole-genome sequencing (WGS) data with the --method wgs mode
Adjusting copy ratio calls for tumor purity and ploidy (cnvkit.py call --purity / --ploidy), using estimates from an external tool when purity is unknown
Generating SEG format copy number files for GISTIC2, cBioPortal, or IGV visualization
Identifying focal amplifications (e.g., ERBB2, MYC) or homozygous deletions (e.g., CDKN2A, RB1)
Use GATK CNV (gatk DenoiseReadCounts / gatk ModelSegments) instead for deep WGS cohorts with large matched panel-of-normals (PoN); CNVkit is better suited for targeted/exome data
Use Control-FREEC instead when you need B-allele frequency (BAF) modeling alongside CNV calling
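To make the link between purity and copy ratio interpretation concrete, here is a minimal sketch of how a segment's log2 ratio might be converted to an estimated tumor copy number. It assumes a diploid reference, a diploid normal contaminant, and a single tumor clone; the function name `log2_to_copies` is hypothetical and this is a simplified model of the rescaling idea, not CNVkit's own code.

```python
import math


def log2_to_copies(log2_ratio, purity=1.0, ref_copies=2, normal_copies=2):
    """Estimate tumor copy number from a log2 copy ratio.

    Assumed model: the observed signal is a mix of tumor cells (fraction
    `purity`) and normal cells carrying `normal_copies` copies.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average copies per cell implied by the observed ratio.
    observed = ref_copies * 2.0 ** log2_ratio
    # Remove the normal-cell contribution, then scale up to the tumor fraction.
    return (observed - normal_copies * (1.0 - purity)) / purity


# A pure sample with log2 ratio 0 is copy-neutral (2 copies).
neutral = log2_to_copies(0.0)

# At 50% purity, a single-copy gain in the tumor only raises the observed
# ratio to 1.25, yet the rescaled estimate recovers about 3 copies.
gain = log2_to_copies(math.log2(1.25), purity=0.5)

# At 50% purity, log2 = -1 is consistent with a homozygous deletion
# (about 0 tumor copies), not a single-copy loss.
deletion = log2_to_copies(-1.0, purity=0.5)

print(neutral, round(gain, 3), round(deletion, 3))
```

The same log2 ratio can therefore imply quite different events depending on purity, which is why focal amplifications and homozygous deletions are easier to call confidently once a purity estimate is available.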
