hugging-face-model-trainer
TRL Training on Hugging Face Jobs
Overview
Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods:
- SFT (Supervised Fine-Tuning) - Standard instruction tuning
- DPO (Direct Preference Optimization) - Alignment from preference data
- GRPO (Group Relative Policy Optimization) - Online RL training
- Reward Modeling - Train reward models for RLHF
For detailed TRL method documentation:
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
# etc.
More from sickn33/antigravity-awesome-skills
docker-expert
You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.
15.0Knodejs-best-practices
Node.js development principles and decision-making. Framework selection, async patterns, security, and architecture. Teaches thinking, not copying.
11.2Ktypescript-expert
TypeScript and JavaScript expert with deep knowledge of type-level programming, performance optimization, monorepo management, migration strategies, and modern tooling.
8.3Kapi-security-best-practices
Implement secure API design patterns including authentication, authorization, input validation, rate limiting, and protection against common API vulnerabilities
7.0Kclean-code
This skill embodies the principles of \"Clean Code\" by Robert C. Martin (Uncle Bob). Use it to transform \"code that works\" into \"code that is clean.\"
6.5Knextjs-best-practices
Next.js App Router principles. Server Components, data fetching, routing patterns.
5.1K