Arboreto GRN Inference

Overview

Arboreto infers gene regulatory networks (GRNs) from gene expression data using parallelized tree-based regression. For each target gene, it trains a regression model that predicts the target's expression from the candidate regulators (all other genes, or a user-supplied list of transcription factors) and reports each regulator's contribution as a TF-target-importance triplet. It provides two interchangeable algorithms, GRNBoost2 (gradient boosting, faster) and GENIE3 (Random Forest, the classic method), which share identical input and output formats. Computation is parallelized with Dask and scales from the cores of a single laptop to HPC clusters.
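The per-target scheme above can be sketched in a few lines. This is a minimal GENIE3-style illustration, not arboreto's implementation: scikit-learn's RandomForestRegressor stands in for arboreto's internal models, there is no Dask parallelism, and the function name infer_grn_sketch and its parameters are hypothetical. It assumes a pandas DataFrame with samples or cells as rows and genes as columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def infer_grn_sketch(expr, tf_names=None, n_estimators=100, seed=0):
    """Hypothetical GENIE3-style sketch: fit one tree ensemble per target gene.

    expr: pandas DataFrame, samples/cells as rows, genes as columns.
    tf_names: optional list of candidate regulators (defaults to all genes).
    Returns a DataFrame of (TF, target, importance) triplets.
    """
    genes = list(expr.columns)
    candidates = genes if tf_names is None else [g for g in tf_names if g in expr.columns]
    links = []
    for target in genes:
        # A gene is never used as a regulator of itself
        regulators = [tf for tf in candidates if tf != target]
        if not regulators:
            continue
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(expr[regulators].to_numpy(), expr[target].to_numpy())
        # Impurity-based importances, normalized to sum to 1 for this target
        for tf, score in zip(regulators, model.feature_importances_):
            links.append((tf, target, float(score)))
    network = pd.DataFrame(links, columns=["TF", "target", "importance"])
    return network.sort_values("importance", ascending=False, ignore_index=True)


# Toy data: g2 is driven by g0, the other genes are pure noise
rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(200, 4)), columns=["g0", "g1", "g2", "g3"])
expr["g2"] = 2.0 * expr["g0"] + 0.1 * rng.normal(size=200)

network = infer_grn_sketch(expr, tf_names=["g0", "g1"])
# For target g2, expect g0 to carry nearly all of the importance
print(network[network["target"] == "g2"])
```

In arboreto itself, the corresponding entry points are grnboost2 and genie3 from arboreto.algo, which take the expression matrix and an optional tf_names list and return a DataFrame with TF, target, and importance columns; the sketch only mirrors that shape.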

When to Use

Inferring transcription factor-to-target gene regulatory relationships from bulk RNA-seq expression data
Building gene regulatory networks from single-cell RNA-seq count matrices (cells as rows, genes as columns)
Generating the TF-target adjacencies (Step 1) of the pySCENIC regulatory analysis pipeline
Comparing regulatory network structure across experimental conditions (e.g., control vs treatment)
Producing consensus regulatory networks by running inference across multiple random seeds
Cross-checking GRN results by comparing GRNBoost2 and GENIE3 outputs on the same dataset
For downstream regulon identification and activity scoring, use arboreto output with pySCENIC
For single-cell preprocessing (QC, normalization, clustering) before GRN inference, use scanpy-scrna-seq
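For the consensus use case, one simple aggregation is to pool the triplet tables from several runs (for example, repeated grnboost2 calls with different seed values) and keep the edges that recur. The helper below is hypothetical and not part of arboreto; the top_k cutoff and the frequency and mean-importance summary are assumptions chosen for illustration. It only needs pandas DataFrames with TF, target, and importance columns.

```python
import pandas as pd


def consensus_network(runs, top_k=None):
    """Hypothetical helper: pool (TF, target, importance) tables from repeated runs.

    runs: list of DataFrames with columns TF, target, importance.
    top_k: if set, only each run's top_k links by importance are kept.
    Returns one row per (TF, target) edge with:
      n_runs           number of runs in which the edge was kept
      frequency        n_runs / len(runs)
      mean_importance  mean importance over the runs that kept the edge
    """
    kept = []
    for run_id, df in enumerate(runs):
        df = df.sort_values("importance", ascending=False)
        if top_k is not None:
            df = df.head(top_k)
        kept.append(df.assign(run=run_id))
    edges = pd.concat(kept, ignore_index=True)
    summary = (
        edges.groupby(["TF", "target"])
        .agg(mean_importance=("importance", "mean"), n_runs=("run", "nunique"))
        .reset_index()
    )
    summary["frequency"] = summary["n_runs"] / len(runs)
    return summary.sort_values(
        ["frequency", "mean_importance"], ascending=False, ignore_index=True
    )


# Toy tables standing in for three runs with different seeds
run1 = pd.DataFrame({"TF": ["A", "A", "B"], "target": ["X", "Y", "X"], "importance": [5.0, 1.0, 3.0]})
run2 = pd.DataFrame({"TF": ["A", "B", "A"], "target": ["X", "X", "Y"], "importance": [4.0, 2.0, 0.5]})
run3 = pd.DataFrame({"TF": ["A", "B"], "target": ["X", "Y"], "importance": [6.0, 2.0]})

consensus = consensus_network([run1, run2, run3], top_k=2)
print(consensus)
```

The same helper can be pointed at one GRNBoost2 table and one GENIE3 table from the same dataset: edges with frequency 1.0 are those both algorithms place in their top_k. Importance scales may differ between the two algorithms, so frequency is the more meaningful column in that comparison.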

