uv-moe-training


MoE Training: Mixture of Experts

When to Use This Skill

Use MoE Training when you need to:

Train larger models on a limited compute budget (up to roughly 5× lower training cost than a dense model of similar quality)
Scale model capacity without proportional compute increase
Achieve better performance per compute budget than dense models
Specialize experts for different domains/tasks/languages
Reduce per-token inference compute through sparse activation (Mixtral 8x7B uses about 13B of its roughly 47B parameters per token)
Implement state-of-the-art models such as Mixtral 8x7B, DeepSeek-V3, and Switch Transformers
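To make the "capacity without proportional compute" point concrete, the sketch below counts total and per-token active parameters for a Mixtral-style decoder. It assumes the published Mixtral 8x7B configuration (hidden size 4096, expert FFN size 14336, 32 layers, 32 query and 8 key/value heads, 32,000-token vocabulary, 8 SwiGLU experts with top-2 routing, untied input/output embeddings, no biases). The helper name `mixtral_like_param_counts` is hypothetical.

```python
def mixtral_like_param_counts(
    hidden: int = 4096,
    ffn: int = 14336,
    layers: int = 32,
    heads: int = 32,
    kv_heads: int = 8,
    vocab: int = 32000,
    num_experts: int = 8,
    top_k: int = 2,
) -> tuple[int, int]:
    """Hypothetical helper: (total_params, active_params_per_token).

    Defaults assume Mixtral 8x7B's published config; biases are not counted.
    """
    head_dim = hidden // heads
    # Attention: Q and O are hidden x hidden; K and V are smaller under
    # grouped-query attention (kv_heads * head_dim columns each).
    attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    # One SwiGLU expert = three hidden x ffn projections (gate, up, down).
    expert = 3 * hidden * ffn
    router = hidden * num_experts  # linear gate producing one logit per expert
    norms = 2 * hidden             # two RMSNorm weight vectors per layer
    # Parameters every token uses: attention, router, norms, final norm,
    # and the untied embedding + LM head matrices.
    shared = layers * (attn + router + norms) + hidden + 2 * vocab * hidden
    total = shared + layers * num_experts * expert  # all experts are stored
    active = shared + layers * top_k * expert       # only top_k run per token
    return total, active


total, active = mixtral_like_param_counts()
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
# prints: total ≈ 46.7B, active ≈ 12.9B
```

Only the expert term scales with `num_experts`, while per-token compute scales with `top_k`, which is why adding experts grows capacity far faster than it grows FLOPs per token.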

Notable MoE Models: Mixtral 8x7B (Mistral AI), DeepSeek-V3, Switch Transformers (Google), GLaM (Google), NLLB-MoE (Meta)

Installation

# DeepSpeed with MoE support (quote the spec so the shell does not treat ">" as a redirect)
pip install "deepspeed>=0.6.0"
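Before wiring up an MoE layer, it can help to see what its gating step computes. The following is a framework-free sketch of top-k routing, assuming a linear router, a softmax over experts, and renormalization of the selected experts' weights (the Mixtral-style top-2 scheme). It is not DeepSpeed's implementation: DeepSpeed's gate also enforces per-expert capacity and produces an auxiliary load-balancing loss, both omitted here. The names `top_k_route` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(
    token: list[float], router_w: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Hypothetical: pick k experts for one token.

    router_w holds one weight row (length = token dim) per expert.
    Returns (expert_id, weight) pairs; weights are renormalized to sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_w]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    token: list[float],
    router_w: list[list[float]],
    experts: list[Callable[[list[float]], list[float]]],
    k: int = 2,
) -> list[float]:
    """Hypothetical: weighted sum of only the k selected experts' outputs.

    Unselected experts are never called, which is the source of the
    per-token compute savings.
    """
    out = [0.0] * len(token)
    for i, weight in top_k_route(token, router_w, k):
        y = experts[i](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out
```

With 8 experts and `k=2`, each token invokes two expert networks regardless of how many experts the layer holds; in real training a load-balancing term is usually needed so the router does not collapse onto a few experts.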


First Seen: Feb 27, 2026


uv-moe-training — uv-xiao/pkbllm