moe-training

Originally from ovachiever/droid-tings

MoE Training: Mixture of Experts

When to Use This Skill

Use MoE Training when you need to:

  • Train larger models with limited compute (5× cost reduction vs dense models)
  • Scale model capacity without proportional compute increase
  • Achieve better performance per compute budget than dense models
  • Specialize experts for different domains/tasks/languages
  • Reduce inference cost with sparse activation (only ~13B of 47B parameters active per token in Mixtral 8x7B)
  • Implement SOTA models like Mixtral 8x7B, DeepSeek-V3, Switch Transformers
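The sparse-activation point above can be checked with simple arithmetic: with Mixtral-style top-2 routing, each token only pays for the shared layers plus 2 of 8 expert FFNs. Using the approximate parameter counts from the list (the exact figures are illustrative):

```python
# Per-token active-parameter fraction for a Mixtral-style MoE
# (8 experts, top-2 routing). Counts are approximate, from the
# bullet list above, not exact Mixtral parameter tallies.
total_params = 47e9    # ~47B total parameters
active_params = 13e9   # ~13B active per token
fraction = active_params / total_params
print(f"{fraction:.0%} of parameters active per token")  # → 28% of parameters active per token
```

This is why MoE models can match much larger dense models on quality while keeping per-token compute close to that of a ~13B dense model.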

Notable MoE Models: Mixtral 8x7B (Mistral AI), DeepSeek-V3, Switch Transformers (Google), GLaM (Google), NLLB-MoE (Meta)

Installation

```shell
# DeepSpeed with MoE support (quote the spec so the shell
# does not interpret ">" as output redirection)
pip install "deepspeed>=0.6.0"
```
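DeepSpeed supplies the MoE layer itself; what that layer does conceptually can be sketched in plain NumPy. The sketch below shows top-k gating with renormalized gate weights. Names like `moe_forward`, the toy `tanh` experts, and the random shapes are illustrative assumptions, not DeepSpeed's API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, gate_w, expert_ws, k=2):
    """Top-k gated MoE layer: each token is routed to its k
    highest-scoring experts, and their outputs are combined with
    gate probabilities renormalized over the selected experts."""
    probs = softmax(x @ gate_w)                 # (tokens, num_experts)
    topk = np.argsort(probs, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = probs[t, sel] / probs[t, sel].sum() # renormalize over top-k
        for j, e in enumerate(sel):
            # Toy expert: a single tanh projection stands in for an FFN.
            out[t] += w[j] * np.tanh(x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
tokens, d, n_experts = 4, 8, 8
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d))
y = moe_forward(x, gate_w, expert_ws, k=2)     # only 2 of 8 experts run per token
```

In a real model this dense loop is replaced by batched expert dispatch (all-to-all communication under expert parallelism), which is exactly what DeepSpeed's MoE layer implements.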