arboreto-grn-inference


Arboreto GRN Inference

Overview

Arboreto infers gene regulatory networks (GRNs) from gene expression data using parallelized tree-based regression. For each target gene, it trains a regression model that uses all other genes (or a specified list of transcription factors, TFs) as features, and emits (TF, target, importance) triplets. It provides two interchangeable algorithms that share identical input and output formats: GRNBoost2 (gradient boosting, the faster option) and GENIE3 (Random Forest, the classic method). Computation is parallelized with Dask and scales from the cores of a laptop to HPC clusters.
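Arboreto's own entry points are `grnboost2` and `genie3` in `arboreto.algo`, called with an expression DataFrame and an optional `tf_names` list. The sketch below does not use arboreto. It re-creates the per-target decomposition described above with scikit-learn, serially and without Dask, to show where the TF-target-importance triplets come from. The function `infer_grn`, its `method` switch and every hyperparameter are hypothetical. The `"grnboost2"` branch is only a rough stand-in, because arboreto's GRNBoost2 uses its own settings (including early stopping) that are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def infer_grn(expression, tf_names=None, method="genie3", seed=0):
    """Hypothetical per-target tree regression in the spirit of GENIE3/GRNBoost2.

    expression: DataFrame with rows = observations (cells or samples), columns = genes.
    tf_names: candidate regulators; defaults to every gene in `expression`.
    Returns a DataFrame with columns TF, target, importance.
    """
    genes = list(expression.columns)
    candidates = tf_names if tf_names is not None else genes
    tfs = [g for g in candidates if g in expression.columns]
    rows = []
    for target in genes:
        # A gene is never used as a feature to predict itself.
        regulators = [tf for tf in tfs if tf != target]
        if not regulators:
            continue
        X = expression[regulators].to_numpy()
        y = expression[target].to_numpy()
        if method == "genie3":
            # Random Forest, as in GENIE3 (hyperparameters are illustrative).
            model = RandomForestRegressor(
                n_estimators=200, max_features="sqrt", random_state=seed
            )
        elif method == "grnboost2":
            # Stochastic gradient boosting as a loose GRNBoost2 stand-in.
            model = GradientBoostingRegressor(
                n_estimators=200, learning_rate=0.05, subsample=0.9,
                max_features=0.5, random_state=seed,
            )
        else:
            raise ValueError(f"unknown method: {method}")
        model.fit(X, y)
        # Impurity-based feature importances become the edge weights.
        for tf, imp in zip(regulators, model.feature_importances_):
            rows.append((tf, target, float(imp)))
    net = pd.DataFrame(rows, columns=["TF", "target", "importance"])
    return net.sort_values("importance", ascending=False).reset_index(drop=True)


# Synthetic demo: TF1 drives geneA, TF2 represses geneB, TF3 is pure noise.
rng = np.random.default_rng(0)
n = 200
expr = pd.DataFrame(rng.normal(size=(n, 3)), columns=["TF1", "TF2", "TF3"])
expr["geneA"] = 2.0 * expr["TF1"] + 0.1 * rng.normal(size=n)
expr["geneB"] = -1.5 * expr["TF2"] + 0.1 * rng.normal(size=n)

net = infer_grn(expr, tf_names=["TF1", "TF2", "TF3"], method="genie3")
print(net[net["target"] == "geneA"])
```

Because both branches return the same three-column table, downstream code does not need to know which model produced it, which mirrors the interchangeability of the two arboreto algorithms.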

When to Use

  • Inferring transcription factor-to-target gene regulatory relationships from bulk RNA-seq expression data
  • Building gene regulatory networks from single-cell RNA-seq count matrices (cells as rows, genes as columns)
  • Generating the adjacency matrix (Step 1) of the pySCENIC regulatory analysis pipeline
  • Comparing regulatory network structure across experimental conditions (e.g., control vs treatment)
  • Producing consensus regulatory networks by running inference across multiple random seeds
  • Validating GRN results by comparing GRNBoost2 and GENIE3 outputs on the same dataset
  • For downstream regulon identification and activity scoring, use arboreto output with pySCENIC
  • For single-cell preprocessing (QC, normalization, clustering) before GRN inference, use scanpy-scrna-seq
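Several of the use cases above (consensus across random seeds, condition-to-condition comparison, GRNBoost2 vs GENIE3 validation) reduce to simple operations on TF/target/importance tables. This is a minimal pandas sketch, assuming each run is a DataFrame with those three columns, as arboreto returns. The helper names `consensus_network` and `top_k_jaccard`, the rule that an edge missing from a run counts as importance 0, and top-k Jaccard as the overlap metric are illustrative choices, not part of arboreto or pySCENIC.

```python
import pandas as pd


def consensus_network(networks):
    """Hypothetical helper: average edge importances over repeated runs.

    An edge absent from a run contributes importance 0 to the mean;
    'frequency' is the fraction of runs that reported the edge.
    """
    n_runs = len(networks)
    stacked = pd.concat(networks, ignore_index=True)
    grouped = stacked.groupby(["TF", "target"])["importance"]
    consensus = pd.DataFrame({
        "importance": grouped.sum() / n_runs,
        "frequency": grouped.size() / n_runs,
    }).reset_index()
    return consensus.sort_values("importance", ascending=False).reset_index(drop=True)


def top_k_jaccard(net_a, net_b, k=100):
    """Hypothetical helper: Jaccard overlap of the k strongest (TF, target) edges."""
    def top_edges(net):
        top = net.nlargest(k, "importance")
        return set(zip(top["TF"], top["target"]))

    a, b = top_edges(net_a), top_edges(net_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy edge lists standing in for two runs with different seeds.
run1 = pd.DataFrame({"TF": ["TF1", "TF1", "TF2"], "target": ["A", "B", "A"],
                     "importance": [3.0, 1.0, 2.0]})
run2 = pd.DataFrame({"TF": ["TF1", "TF2"], "target": ["A", "A"],
                     "importance": [5.0, 1.0]})

cons = consensus_network([run1, run2])
print(cons)
print(top_k_jaccard(run1, run2, k=3))
```

Comparing top-ranked edge sets rather than raw importance values sidesteps any difference in importance scale between runs or algorithms; averaging raw importances, as in `consensus_network`, is only meaningful when all runs come from the same algorithm.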

Prerequisites


More from jaechang-hits/sciagent-skills

First Seen
Mar 16, 2026