Bulk RNA-seq DESeq2 analysis with omicverse

Overview

Use this skill when a user wants to reproduce the DESeq2 workflow showcased in t_deseq2.ipynb. It covers loading raw featureCounts matrices, mapping Ensembl IDs to symbols, running PyDESeq2 via ov.bulk.pyDEG, and exploring downstream enrichment plots.

Instructions

Import and format the expression matrix
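- Run import omicverse as ov, then call ov.style() to standardise plot styling.
- Read the tab-separated featureCounts output with ov.io.read(..., index_col=0, header=1). header=1 skips the comment line that featureCounts writes at the top of the file, and index_col=0 uses the gene ID column as the index.
- Strip directory paths and the .bam suffix from the column names by assigning data.columns = [c.split('/')[-1].replace('.bam', '') for c in data.columns].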
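
The column clean-up in the last bullet can be tried in isolation on plain strings. The sketch below wraps the same expression in a helper; the function name clean_sample_names and the example paths are made up for illustration and are not part of omicverse.

```python
def clean_sample_names(columns):
    """Drop any directory prefix and the '.bam' suffix from each column label."""
    return [c.split("/")[-1].replace(".bam", "") for c in columns]


# Hypothetical featureCounts-style column labels, for illustration only.
raw_columns = ["Length", "/data/align/ctrl_1.bam", "/data/align/treat_1.bam"]
cleaned_columns = clean_sample_names(raw_columns)
print(cleaned_columns)
```

With a loaded count table, the equivalent step would be data.columns = clean_sample_names(data.columns). Labels without a path or a .bam suffix pass through unchanged.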
Map gene identifiers
- Ensure the appropriate mapping pair exists by running ov.utils.download_geneid_annotation_pair().
- Replace gene_id with gene symbols using ov.bulk.Matrix_ID_mapping(data, 'genesets/pair_<GENOME>.tsv').
Initialise the DEG object
- Create dds = ov.bulk.pyDEG(data) from the mapped counts.
- Resolve duplicate gene names with dds.drop_duplicates_index() and check the printed log output to confirm the step completed.
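
Duplicates arise because several Ensembl IDs can map to the same symbol. To see which symbols collide before they are dropped, something like the following may help. It is a minimal standard-library sketch; the helper name duplicated_symbols and the sample symbols are illustrative, and it does not reproduce what drop_duplicates_index() does internally.

```python
from collections import Counter


def duplicated_symbols(symbols):
    """Return {symbol: count} for every symbol that occurs more than once."""
    counts = Counter(symbols)
    return {symbol: n for symbol, n in counts.items() if n > 1}


# Made-up index values standing in for a mapped count matrix index.
gene_symbols = ["Actb", "Gapdh", "Actb", "Tp53", "Gapdh", "Actb"]
dupes = duplicated_symbols(gene_symbols)
print(dupes)
```

On a mapped table, the input would be list(data.index).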
Define contrasts and run DESeq2
- Collect sample labels into treatment_groups and control_groups lists that match column names exactly.
- Execute dds.deg_analysis(treatment_groups, control_groups, method='DEseq2') to invoke PyDESeq2.
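
The steps above can be sketched end to end as follows. The helper missing_labels and the wrapper run_deseq2 are hypothetical names introduced here. The omicverse calls inside run_deseq2 are copied from the steps in this skill and have not been checked against a specific omicverse release, so treat the wrapper as an outline rather than a verified implementation. omicverse is imported inside the function so that the label check can be used without it installed.

```python
def missing_labels(columns, treatment_groups, control_groups):
    """Return the group labels that do not match any column name exactly."""
    available = set(columns)
    requested = list(treatment_groups) + list(control_groups)
    return [label for label in requested if label not in available]


def run_deseq2(counts_path, pair_path, treatment_groups, control_groups):
    """Outline of the workflow; the omicverse calls are taken from the steps above."""
    import omicverse as ov  # lazy import: only needed when the pipeline actually runs

    ov.style()
    data = ov.io.read(counts_path, index_col=0, header=1)
    data.columns = [c.split("/")[-1].replace(".bam", "") for c in data.columns]

    ov.utils.download_geneid_annotation_pair()
    data = ov.bulk.Matrix_ID_mapping(data, pair_path)

    # Fail early if a label has a typo or stray whitespace.
    missing = missing_labels(data.columns, treatment_groups, control_groups)
    if missing:
        raise ValueError(f"labels not found in count matrix columns: {missing}")

    dds = ov.bulk.pyDEG(data)
    dds.drop_duplicates_index()
    # Assumption: deg_analysis returns the results table.
    return dds.deg_analysis(treatment_groups, control_groups, method="DEseq2")


# Illustrative labels only; "ctrl-2" is a deliberate typo for "ctrl_2".
columns = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
print(missing_labels(columns, ["treat_1", "treat_2"], ["ctrl_1", "ctrl-2"]))
```

Running the label check before deg_analysis reports a mismatched sample name directly, which is usually easier to diagnose than a key error raised deeper in the analysis.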
