Snakemake — Python Workflow Engine

Overview

Snakemake is a Python-based workflow management system that scales analyses from a laptop to HPC clusters and the cloud. Workflows are defined as rules with explicit input/output file dependencies; Snakemake resolves the execution order automatically and runs independent steps in parallel. Rules can call shell commands, Python/R/Julia scripts, or inline Python. Per-rule conda or Singularity environments pin each step's software, which makes workflows much easier to reproduce. Snakemake is widely used in bioinformatics for NGS, genome assembly, and variant-calling pipelines.
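To make the idea of resolving execution order from file dependencies concrete, here is a minimal sketch in plain Python. It is not Snakemake's implementation or API; the `RULES` mapping, the file names, and the `execution_batches` helper are all hypothetical, and the standard-library `graphlib` module stands in for the scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each output file maps to the input files it needs.
RULES = {
    "aligned.bam": ["sample.fastq", "genome.fa"],
    "sorted.bam": ["aligned.bam"],
    "calls.vcf": ["sorted.bam", "genome.fa"],
    "stats.txt": ["sorted.bam"],
}


def execution_batches(rules):
    """Group outputs into batches that respect the file dependencies.

    Outputs in the same batch do not depend on each other, so a
    scheduler could build them in parallel.
    """
    sorter = TopologicalSorter(rules)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Source files (not produced by any rule) need no job.
        jobs = [name for name in ready if name in rules]
        if jobs:
            batches.append(jobs)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for step, jobs in enumerate(execution_batches(RULES), start=1):
        print(step, jobs)
```

In this toy graph, `calls.vcf` and `stats.txt` land in the same batch because both depend only on `sorted.bam` and source files, which is the situation in which independent steps can run in parallel.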

When to Use

Building reproducible multi-step bioinformatics pipelines (align → sort → call variants → annotate)
Scaling the same workflow from local development to a SLURM cluster without code changes
Processing multiple samples identically using wildcard-based rules
Managing dependencies automatically, so that only steps whose inputs have changed are rerun
Deploying per-rule conda or Singularity environments for tool isolation
Generating visual DAGs and dry-run previews before committing computational resources
Use Nextflow instead when you need Groovy DSL + dataflow channels, or when leveraging the nf-core community pipeline library
For simple shell loops, use bash scripts; Snakemake is worth the overhead only for 3+ sequential steps with branching
Use Prefect or Airflow instead for data engineering workflows with dynamic task graphs or time-based scheduling
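Two of the use cases above, wildcard-based rules and rerunning only stale steps, can be sketched in plain Python. This is a toy illustration under stated assumptions, not Snakemake code: `needs_rerun` and `stale_targets` are hypothetical helpers, the `{sample}` placeholder is filled with `str.format`, and staleness is judged by file modification time alone, whereas Snakemake's own rerun logic considers more than timestamps.

```python
import os


def needs_rerun(output, inputs):
    """Return True if the output is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def stale_targets(output_pattern, input_pattern, samples):
    """Fill the {sample} placeholder for each sample and list stale outputs.

    Both patterns are hypothetical path templates such as
    "mapped/{sample}.bam" and "reads/{sample}.fastq".
    """
    stale = []
    for sample in samples:
        output = output_pattern.format(sample=sample)
        source = input_pattern.format(sample=sample)
        if needs_rerun(output, [source]):
            stale.append(output)
    return stale
```

Calling `stale_targets("mapped/{sample}.bam", "reads/{sample}.fastq", ["a", "b"])` would return only the outputs that are missing or older than their input, which mirrors the idea of one rule applied identically to every sample while skipping work that is already up to date.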
