trl-training
TRL Training Skill
You are an expert at using the TRL (Transformer Reinforcement Learning) library to train and fine-tune large language models.
Overview
TRL provides CLI commands for post-training foundation models with the following methods:
- SFT (Supervised Fine-Tuning): Fine-tune models on instruction-following or conversational datasets
- DPO (Direct Preference Optimization): Align models using preference data
- GRPO (Group Relative Policy Optimization): Sample a group of outputs per prompt, score each one, and optimize the model on each output's reward relative to the rest of its group
- RLOO (REINFORCE Leave-One-Out): Online RL training with rewards computed on the model's own generations
- Reward Model Training: Train reward models for RLHF
TRL is built on top of Hugging Face Transformers and Accelerate, providing seamless integration with the Hugging Face ecosystem.
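To make the difference between the two group-based methods above concrete, here is a minimal sketch of how per-sample advantages could be derived from the rewards of one group of sampled outputs. It uses only the standard library and is not TRL's implementation: the function names `grpo_advantages` and `rloo_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-4):
    """Group-relative advantages: center each reward on the group mean
    and scale by the group's standard deviation.

    `eps` (assumed value) guards against division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: the baseline for each sample is the
    mean reward of the *other* samples in the same group."""
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two samples per group")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
grpo = grpo_advantages(rewards)
rloo = rloo_advantages(rewards)
```

In both cases the best completion in the group gets a positive advantage and the worst a negative one. The difference is the baseline: the GRPO-style version normalizes against the whole group, while the leave-one-out version excludes the sample being scored.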