Nextflow — Scalable Scientific Workflow Engine

Overview

Nextflow implements a dataflow programming model in which processes (typically containerized execution units) consume and emit data through channels (asynchronous queues). This design enables implicit parallelization: a process runs as soon as its input channels have data, with no manual dependency management. Nextflow orchestrates processes across local machines, HPC clusters (SLURM, SGE, PBS), and cloud platforms (AWS Batch, Google Cloud Life Sciences, Azure Batch), and switching between them requires only a different configuration profile. The nf-core community provides 100+ validated Nextflow pipelines (RNA-seq, WGS, ChIP-seq, scRNA-seq) that follow best practices and include automated testing.
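
The process-and-channel model described above can be illustrated with a minimal DSL2 sketch. This is not taken from any nf-core pipeline: the process name COUNT_LINES, the params.reads glob, and the ubuntu:22.04 image are hypothetical placeholders, and the script assumes a recent Nextflow release in which DSL2 is the default.

```nextflow
// main.nf: minimal DSL2 sketch (all names and paths are illustrative assumptions)

// Hypothetical input glob; override with --reads on the command line
params.reads = 'data/*.fastq'

process COUNT_LINES {
    // Hypothetical image; only used when a container engine is enabled
    container 'ubuntu:22.04'

    input:
    path reads

    output:
    path "${reads.baseName}.count"

    script:
    """
    wc -l < ${reads} > ${reads.baseName}.count
    """
}

workflow {
    // Each file matched by the glob becomes one item in the channel
    reads_ch = channel.fromPath(params.reads)

    // One task is launched per channel item; independent tasks can run in parallel
    COUNT_LINES(reads_ch)

    // Print the name of the count file emitted by each task
    COUNT_LINES.out.view { f -> "count file: ${f.name}" }
}
```

No explicit loop or dependency declaration appears in the workflow block: passing the channel to the process is what tells Nextflow to schedule one task per input file.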

When to Use

Building containerized bioinformatics pipelines that must run on HPC, AWS, and local environments without code changes
Using nf-core community pipelines (nf-core/rnaseq, nf-core/sarek, nf-core/chipseq) out of the box
Processing thousands of samples with implicit parallelization across a SLURM cluster
Writing pipelines where each step runs inside a Docker or Singularity container for reproducibility
Monitoring pipeline execution and resuming from checkpoints after failures with -resume
Use Snakemake instead for Python-native rule-based workflows where Python integration is prioritized
Use WDL/Cromwell instead for clinical genomics pipelines that require CWL/WDL standards compliance
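
Several of the use cases above depend on switching execution environments without touching pipeline code. A rough sketch of how that might look in a nextflow.config file follows. The profile names other than standard, the queue names, the S3 bucket, and the region are hypothetical and would need to match your own site or cloud account.

```nextflow
// nextflow.config: profile sketch (queue, bucket, and region values are hypothetical)
profiles {
    // Used when no -profile option is given: run locally with Docker
    standard {
        process.executor = 'local'
        docker.enabled   = true
    }

    // HPC: submit each task as a SLURM job and run it in Singularity
    slurm {
        process.executor    = 'slurm'
        process.queue       = 'general'            // hypothetical partition name
        singularity.enabled = true
    }

    // Cloud: submit each task to AWS Batch with an S3 work directory
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'        // hypothetical Batch job queue
        workDir          = 's3://my-bucket/work'   // hypothetical bucket
        aws.region       = 'us-east-1'             // hypothetical region
    }
}
```

With a file like this in place, the same pipeline could be launched as nextflow run main.nf -profile slurm, and rerun after a failure with -resume added so that previously completed tasks are reused from the cache rather than executed again.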
