verl-rl-training
SKILL.md
verl: Volcano Engine Reinforcement Learning for LLMs
verl is a flexible, efficient, and production-ready RL training library for large language models from ByteDance's Seed team. It implements the HybridFlow framework (EuroSys 2025) and powers models like Doubao-1.5-pro achieving O1-level performance on math benchmarks.
When to Use verl
Choose verl when you need:
- Production-ready RL training at scale (tested up to 671B parameters)
- Flexibility to swap backends (FSDP ↔ Megatron-LM ↔ vLLM ↔ SGLang)
- Support for multiple RL algorithms (PPO, GRPO, RLOO, REINFORCE++, DAPO)
- Multi-turn rollout with tool calling for agentic workflows
- Vision-language model RL training
Consider alternatives when:
- You need Megatron-native training → use slime or miles
- You want PyTorch-native abstractions with Monarch → use torchforge
- You only need simple SFT/DPO → use TRL or Axolotl