paddle-distributed
Paddle distributed training, SOT dynamic-to-static conversion, and Python/C++ interop
Distributed paradigm cheat sheet
| 范式 | 核心思想 | 通信原语 |
|---|---|---|
| Data Parallel | Replicate the model, split the data, AllReduce the gradients | AllReduce |
| Group Sharded (ZeRO) | Stage 1 shards optimizer states / Stage 2 also shards gradients / Stage 3 also shards weights | Broadcast, ReduceScatter, AllGather |
| Model Parallel (Tensor) | Column Parallel splits weight columns / Row Parallel splits weight rows | AllReduce / AllGather |
| Pipeline Parallel | F-then-B / 1F1B interleaved forward and backward | Send / Recv (P2P) |
| Sequence Parallel | Split LayerNorm/Dropout along the sequence dimension | AllGather / ReduceScatter |
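The communication column of the table can be made concrete with a pure-Python sketch of what each collective computes. No real communication happens here; "ranks" are just lists, and the function names mirror the primitives rather than any Paddle API.

```python
# Pure-Python semantics of the three collectives in the table above.
# Each "rank" is modeled as a list of floats; no actual networking involved.

def all_reduce(per_rank):
    """Every rank ends up with the elementwise sum over all ranks."""
    summed = [sum(vals) for vals in zip(*per_rank)]
    return [list(summed) for _ in per_rank]

def reduce_scatter(per_rank):
    """Sum across ranks, then each rank keeps only its own shard."""
    n = len(per_rank)
    summed = [sum(vals) for vals in zip(*per_rank)]
    shard = len(summed) // n
    return [summed[r * shard:(r + 1) * shard] for r in range(n)]

def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

# Data parallel: 2 ranks hold different local gradients; AllReduce syncs them.
grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(all_reduce(grads))      # both ranks get [6.0, 8.0, 10.0, 12.0]
# ZeRO Stage 2: ReduceScatter leaves each rank one gradient shard...
print(reduce_scatter(grads))  # rank 0: [6.0, 8.0], rank 1: [10.0, 12.0]
# ...and AllGather reassembles sharded tensors when a full copy is needed.
print(all_gather(reduce_scatter(grads)))
```

Note that ReduceScatter followed by AllGather is equivalent to AllReduce, which is why ZeRO can replace one fused AllReduce with the two cheaper halves around its sharded update.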
Three programming paradigms: manual (fleet.meta_parallel), semi-automatic dynamic graph (ProcessMesh + shard_tensor), and semi-automatic static graph (auto_parallel.Engine).
SOT architecture cheat sheet
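The ZeRO staging in the Group Sharded row above can be checked with back-of-envelope arithmetic. The sketch below uses the standard mixed-precision Adam accounting from the ZeRO paper (2P bytes fp16 weights + 2P fp16 gradients + 12P fp32 optimizer states per P parameters); the function name and exact byte counts are illustrative assumptions, not a Paddle API.

```python
def zero_bytes_per_rank(num_params, num_ranks, stage):
    """Approximate per-rank memory (bytes) for mixed-precision Adam.

    Accounting (per parameter): fp16 weights 2B, fp16 grads 2B,
    fp32 optimizer states 12B (master weight + momentum + variance).
    Illustrative only -- real frameworks add buffers and fragmentation.
    """
    P, N = num_params, num_ranks
    weights, grads, opt = 2 * P, 2 * P, 12 * P
    if stage == 0:                       # plain data parallel: all replicated
        return weights + grads + opt
    if stage == 1:                       # shard optimizer states only
        return weights + grads + opt / N
    if stage == 2:                       # ... also shard gradients
        return weights + (grads + opt) / N
    if stage == 3:                       # ... also shard weights
        return (weights + grads + opt) / N
    raise ValueError(f"unknown ZeRO stage: {stage}")

# Example: a 7B-parameter model across 8 ranks, reported in GiB.
for s in range(4):
    print(f"stage {s}: {zero_bytes_per_rank(7e9, 8, s) / 2**30:.1f} GiB")
```

The key takeaway matches the table: Stage 1 removes most of the optimizer-state cost, Stage 2 additionally divides gradients by the rank count, and Stage 3 makes the whole footprint scale as 1/N.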
Python Frame
│
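SOT starts from an intercepted Python frame and simulates its bytecode instruction by instruction to build a graph, falling back to the original frame when it hits something it cannot trace. A minimal flavor of the "walk the bytecode" step, using only the stdlib `dis` module (this is an illustration of the idea, not Paddle's actual implementation):

```python
# Inspect a function's bytecode the way a frame-level tracer must:
# one instruction at a time, with opnames driving the simulation.
import dis

def f(x, y):
    return x * y + 1

ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)  # e.g. LOAD_FAST ... BINARY_OP ... RETURN_VALUE (varies by version)
```

A real tracer maintains a simulated value stack keyed on these opnames, recording tensor ops into a graph and raising a "break graph" event on unsupported instructions.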