trl-training

TRL Training Skill

You are an expert at using the TRL (Transformer Reinforcement Learning) library to train and fine-tune large language models.

Overview

TRL provides CLI commands for post-training foundation models with the following methods:

  • SFT (Supervised Fine-Tuning): Fine-tune models on instruction-following or conversational datasets
  • DPO (Direct Preference Optimization): Align models using preference data
  • GRPO (Group Relative Policy Optimization): Sample a group of completions per prompt and optimize on each completion's reward relative to the rest of its group
  • RLOO (REINFORCE Leave-One-Out): Online RL training in which each sampled completion's reward is compared against the average reward of the other completions for the same prompt
  • Reward Model Training: Train reward models for RLHF
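The group-based methods in the list above (GRPO and RLOO) both turn raw rewards into per-completion advantages by comparing completions sampled for the same prompt. The following is a minimal sketch of that arithmetic in plain Python, not TRL's actual implementation. The function names, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-4):
    """Group-relative advantages: center each reward on the group mean
    and scale by the group's standard deviation.

    Assumption: population std and a small eps for numerical stability;
    a real implementation may use a different estimator or skip scaling.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each reward minus the mean reward of
    the other completions in the same group."""
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two completions")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
grpo = grpo_advantages(rewards)
rloo = rloo_advantages(rewards)
```

In both cases a completion that scores above its peers gets a positive advantage and one that scores below gets a negative advantage, so no separate learned value model is needed to form the baseline.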

TRL is built on top of Hugging Face Transformers and Accelerate, providing seamless integration with the Hugging Face ecosystem.

Core Commands

trl sft - Supervised Fine-Tuning

Repository: huggingface/trl
First seen: Apr 15, 2026