nemo-mbridge-perf-parallelism-strategies
# Parallelism Strategy Selection Skill
For stable background on each parallelism type, see:
- @docs/parallelisms.md
- @skills/nemo-mbridge-perf-parallelism-strategies/card.yaml
## Decision by Model Size

### Dense models
| Model size (parameters) | GPU count | Recommended starting point |
|---|---|---|
| < 1B | 1-8 | DP only |
| 1-10B | 8-16 | TP=2-4 + DP |
| 10-70B | 16-64 | TP=4-8 + PP=2-4 + DP |
| 70-175B | 64-256 | TP=8 + PP=4-8 + DP |
| 175-500B | 256-1024 | TP=8 + PP=8-16 + CP=2 + DP |
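The table can be read as a lookup from parameter count to a (TP, PP, CP) starting point, with DP absorbing whatever GPUs remain. Below is a minimal Python sketch. It assumes the usual Megatron-style constraint for dense models, GPU count = TP × PP × CP × DP. The `ParallelLayout` type and `starting_layout` function are hypothetical illustrations, not part of Megatron Bridge. Where the table gives a range (e.g. TP=2-4), the sketch picks the lower end. Because adjacent rows share a boundary (10B appears in two rows), the sketch treats each row's upper bound as exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for one parallelism layout."""
    tp: int  # tensor parallel size
    pp: int  # pipeline parallel size
    cp: int  # context parallel size
    dp: int  # data parallel size, derived from the GPU count


# (exclusive upper bound in billions of parameters, (TP, PP, CP)) per table row.
# Where the table gives a range, the lower end is used (an assumption).
_DENSE_BUCKETS = [
    (1, (1, 1, 1)),    # < 1B: DP only
    (10, (2, 1, 1)),   # 1-10B: TP=2-4 + DP
    (70, (4, 2, 1)),   # 10-70B: TP=4-8 + PP=2-4 + DP
    (175, (8, 4, 1)),  # 70-175B: TP=8 + PP=4-8 + DP
    (500, (8, 8, 2)),  # 175-500B: TP=8 + PP=8-16 + CP=2 + DP
]


def starting_layout(params_billions: float, num_gpus: int) -> ParallelLayout:
    """Pick a starting layout for a dense model; DP absorbs the remaining GPUs."""
    for upper, (tp, pp, cp) in _DENSE_BUCKETS:
        if params_billions < upper:
            model_parallel = tp * pp * cp  # GPUs holding one model replica
            if num_gpus % model_parallel != 0:
                raise ValueError(
                    f"{num_gpus} GPUs is not divisible by TP*PP*CP={model_parallel}"
                )
            return ParallelLayout(tp, pp, cp, num_gpus // model_parallel)
    raise ValueError("the table only covers dense models below 500B parameters")


print(starting_layout(7, 16))  # ParallelLayout(tp=2, pp=1, cp=1, dp=8)
```

Under the exclusive-bound assumption, `starting_layout(70, 128)` falls into the 70-175B row and yields TP=8, PP=4, CP=1, DP=4. Moving TP or PP up within the table's range shrinks DP by the same factor, so the GPU count must stay divisible by the new model-parallel product.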