nemo-mbridge-perf-activation-recompute
Activation Recompute
Stable docs: @docs/training/activation-recomputation.md
Card: @skills/nemo-mbridge-perf-activation-recompute/card.yaml
What It Is
Activation recompute trades GPU compute for memory by discarding intermediate activations during the forward pass and recomputing them during backward. Megatron Bridge supports two granularities:
| Granularity | What you specify | What gets recomputed | Memory savings | Compute cost |
|---|---|---|---|---|
| selective | recompute_modules list (e.g. core_attn, mlp) | specific submodules within each layer | moderate (module-dependent) | low to high |
| full | recompute_num_layers + recompute_method | entire transformer layers (N layers) | strongest | highest |
Note: Megatron Core (MCore) calls these "selective" (submodule-level) and "full" (layer-level).
"Full" means recomputing whole layers, not the whole model; you still choose
how many layers via recompute_num_layers.
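
As a rough illustration of how the two granularities pair with their settings, the sketch below checks that each mode carries the fields listed in the table. `RecomputeSettings` and its `granularity` field are hypothetical names invented for this example and are not the Megatron Bridge API; only `recompute_modules`, `recompute_num_layers`, and `recompute_method` come from the table. The value `"uniform"` for `recompute_method` is an assumption, so check the stable docs for the accepted values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecomputeSettings:
    """Hypothetical container (not a Megatron Bridge class) mirroring the table's fields."""

    granularity: Optional[str] = None  # "selective", "full", or None to disable recompute
    recompute_modules: List[str] = field(default_factory=list)
    recompute_num_layers: Optional[int] = None
    recompute_method: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the fields do not match the chosen granularity."""
        if self.granularity is None:
            return  # recompute disabled, nothing to check
        if self.granularity == "selective":
            # Selective mode is driven by a list of submodule names.
            if not self.recompute_modules:
                raise ValueError("selective recompute needs a non-empty recompute_modules list")
        elif self.granularity == "full":
            # Full mode recomputes whole layers, so it needs a layer count and a method.
            if self.recompute_num_layers is None or self.recompute_num_layers < 1:
                raise ValueError("full recompute needs recompute_num_layers >= 1")
            if not self.recompute_method:
                raise ValueError("full recompute needs recompute_method")
        else:
            raise ValueError(f"unknown granularity: {self.granularity!r}")


# Selective: recompute only the core attention submodule in each layer.
selective = RecomputeSettings(granularity="selective", recompute_modules=["core_attn"])

# Full: recompute 4 whole layers; "uniform" is an assumed method value.
full = RecomputeSettings(granularity="full", recompute_num_layers=4, recompute_method="uniform")

selective.validate()
full.validate()
```

A check like this can catch a mismatched combination, such as full granularity with no layer count, before a training job is launched.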