# Torch Pipeline Parallelism

## Overview

This skill provides guidance for implementing pipeline parallelism in PyTorch for distributed model training. Pipeline parallelism partitions a model across multiple devices/ranks, where each rank processes a subset of layers and communicates activations/gradients with neighboring ranks.
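As a concrete starting point, here is a minimal sketch of that layout, assuming a process group has already been initialized with `torch.distributed.init_process_group` and using a hypothetical 8-layer MLP as the model; `build_stage` and `forward_stage` are illustrative names, not a library API.

```python
import torch
import torch.nn as nn
import torch.distributed as dist

def build_stage(rank: int, world_size: int) -> nn.Module:
    """Give each rank a contiguous slice of the full model's layers."""
    # Hypothetical 8-layer MLP standing in for a real model.
    layers = [m for _ in range(4) for m in (nn.Linear(512, 512), nn.ReLU())]
    per_rank = len(layers) // world_size
    return nn.Sequential(*layers[rank * per_rank:(rank + 1) * per_rank])

def forward_stage(stage: nn.Module, x: torch.Tensor,
                  rank: int, world_size: int) -> torch.Tensor:
    """Run one stage; on rank > 0, x must be a correctly shaped recv buffer."""
    if rank > 0:
        dist.recv(x, src=rank - 1)        # activations from the previous rank
    out = stage(x)
    if rank < world_size - 1:
        dist.send(out, dst=rank + 1)      # activations to the next rank
    return out
```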

## Key Concepts

### Pipeline Parallelism Patterns

- AFAB (All-Forward-All-Backward): Run the forward pass for every microbatch first, caching activations, then run all of the backward passes. This is the GPipe-style schedule and the simplest to implement, but peak activation memory grows with the number of microbatches (see the sketch after this list).
- 1F1B (One-Forward-One-Backward): Interleave forward and backward passes once the pipeline is full, freeing cached activations sooner for better memory efficiency at the cost of a more complex schedule.
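
A minimal AFAB sketch for a single stage, reusing the `stage`/`rank`/`world_size` setup from the sketch above; `shape` is the (assumed fixed) activation shape, `microbatches` holds the inputs on the first rank, and the last stage uses a placeholder loss:

```python
import torch
import torch.distributed as dist

def afab_step(stage, rank, world_size, n_micro, shape, microbatches=None):
    cached = []  # (input, output) pairs kept alive for the backward phase

    # Phase 1: all forwards, one microbatch at a time.
    for i in range(n_micro):
        if rank == 0:
            x = microbatches[i]
        else:
            x = torch.empty(shape)            # pre-allocated recv buffer
            dist.recv(x, src=rank - 1)        # activations from previous stage
            x.requires_grad_(True)            # so x.grad exists after backward
        out = stage(x)
        if rank < world_size - 1:
            dist.send(out.detach(), dst=rank + 1)
        cached.append((x, out))

    # Phase 2: all backwards, in reverse microbatch order.
    for x, out in reversed(cached):
        if rank == world_size - 1:
            out.sum().backward()              # placeholder loss
        else:
            grad = torch.empty_like(out)
            dist.recv(grad, src=rank + 1)     # dL/d(out) from the next stage
            out.backward(grad)
        if rank > 0:
            dist.send(x.grad, dst=rank - 1)   # dL/d(x) to the previous stage
```

1F1B issues the same sends and receives, but reorders them so that each rank alternates one forward with one backward once the pipeline is full, capping cached activations at roughly the pipeline depth rather than the microbatch count.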

### Critical Components

1. Model Partitioning: Divide the model's layers across ranks
2. Activation Communication: Send/receive hidden states between neighboring ranks
3. Gradient Communication: Send/receive gradients during the backward pass
4. Activation Caching: Store activations needed to compute the backward pass (all four appear together in the sketch below)
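
Recent PyTorch releases package all four components behind `torch.distributed.pipelining`. The sketch below follows its documented GPipe-style usage, though exact signatures have shifted between versions, so treat it as an outline and check your release's docs; `build_stage` is the hypothetical helper from the first sketch.

```python
import torch
import torch.distributed as dist
from torch.distributed.pipelining import PipelineStage, ScheduleGPipe

dist.init_process_group()                     # backend inferred / configured
rank, world = dist.get_rank(), dist.get_world_size()
device = torch.device("cuda", rank % torch.cuda.device_count())

# Wrap this rank's slice of the model as a pipeline stage.
stage_module = build_stage(rank, world).to(device)
stage = PipelineStage(stage_module, rank, world, device)

# ScheduleGPipe implements AFAB; Schedule1F1B is the 1F1B counterpart.
schedule = ScheduleGPipe(stage, n_microbatches=4,
                         loss_fn=torch.nn.functional.mse_loss)

x = torch.randn(32, 512, device=device)      # placeholder inputs
y = torch.randn(32, 512, device=device)      # placeholder targets
if rank == 0:
    schedule.step(x)                          # first stage feeds inputs
elif rank == world - 1:
    schedule.step(target=y)                   # last stage computes the loss
else:
    schedule.step()                           # middle stages just relay
```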