miles: Enterprise-Grade RL for Large-Scale Model Training
miles is a high-performance, enterprise-ready reinforcement learning (RL) framework for post-training large-scale models. Built as a production fork of slime, it targets three hard problems: training stability for Mixture-of-Experts (MoE) models, low-precision training, and alignment between the training and inference engines.
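Train-inference alignment means the trainer and the rollout engine should assign the same log-probabilities to the same tokens. If they disagree, "on-policy" updates are silently off-policy. The sketch below illustrates the kind of check involved. `logprob_mismatch` is a hypothetical helper, not part of the miles API. It assumes you already have per-token log-probs from both sides as arrays. It compares raw float32 bit patterns for the strict "bitwise identical" test. It also reports the per-token importance ratio, which is exactly 1.0 when the two sides agree.

```python
import numpy as np


def logprob_mismatch(train_logprobs, rollout_logprobs) -> dict:
    """Compare per-token log-probs from the trainer and the rollout engine.

    Hypothetical helper for illustration only; not a miles API.
    """
    a = np.ascontiguousarray(train_logprobs, dtype=np.float32)
    b = np.ascontiguousarray(rollout_logprobs, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Strict check: identical float32 bit patterns, no tolerance.
    bitwise_equal = bool(np.array_equal(a.view(np.uint32), b.view(np.uint32)))
    # Per-token importance ratio pi_train / pi_rollout; 1.0 everywhere when aligned.
    ratio = np.exp(a - b)
    return {
        "bitwise_equal": bitwise_equal,
        "max_abs_diff": float(np.max(np.abs(a - b))) if a.size else 0.0,
        "max_ratio_dev": float(np.max(np.abs(ratio - 1.0))) if a.size else 0.0,
    }


lp = np.array([-0.5, -1.25, -2.0], dtype=np.float32)
print(logprob_mismatch(lp, lp.copy()))                # aligned: bitwise_equal is True
print(logprob_mismatch(lp, lp + np.float32(1e-6)))    # tiny drift: bitwise_equal is False
```

A tolerance-based comparison such as `np.allclose` would accept the second case. A bitwise check is stricter by design, because small numeric drift between kernels can still accumulate over long RL runs.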
When to Use miles
Choose miles when you need:
- Training of 1TB+ MoE models (e.g., DeepSeek V3, Qwen3-MoE)
- FP8 or INT4 quantization-aware training
- Bitwise-identical alignment between training and inference
- Speculative RL for maximum throughput
- Production stability with enterprise support
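Quantization-aware training (QAT) usually works by "fake quantizing" weights in the forward pass. Each weight is rounded to the low-precision grid and then dequantized, so the model learns to tolerate the rounding error. The sketch below shows symmetric per-group INT4 fake quantization in NumPy. The symmetric scheme and the group size of 32 are assumptions chosen for illustration, not miles's actual configuration. It covers only the forward pass. In a real QAT setup, the backward pass typically passes gradients straight through the rounding (a straight-through estimator).

```python
import numpy as np


def fake_quant_int4(w, group_size: int = 32) -> np.ndarray:
    """Symmetric per-group INT4 fake quantization (quantize, then dequantize).

    Illustrative sketch; the scheme and group size are assumptions.
    """
    arr = np.asarray(w, dtype=np.float32)
    if arr.size % group_size != 0:
        raise ValueError("number of elements must be divisible by group_size")
    flat = arr.reshape(-1, group_size)
    # One scale per group so the largest magnitude maps to the INT4 limit 7.
    max_abs = np.max(np.abs(flat), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    # Integer codes in the signed 4-bit range [-8, 7]; np.round rounds half to even.
    q = np.clip(np.round(flat / scale), -8, 7)
    # Dequantize back to float32 with the original shape.
    return (q * scale).reshape(arr.shape).astype(np.float32)


w = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
w_hat = fake_quant_int4(w)
print(np.max(np.abs(w_hat - w)))  # bounded by half a quantization step (scale / 2)
```

Per-group scales keep one outlier weight from inflating the step size for an entire tensor. This is a common design choice for INT4, where only 16 levels are available.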
Consider alternatives when:
- You want the research-grade original → use slime
- You need flexible backend swapping → use verl
- You want PyTorch-native abstractions → use torchforge