unsloth
Overview
Unsloth (https://unsloth.ai) is an open-source library that fine-tunes open LLMs ~2x faster with ~70% less VRAM than vanilla Hugging Face + Flash Attention 2, using hand-written Triton kernels for the QLoRA/LoRA forward and backward passes. It wraps around transformers, peft, and TRL's trainers (SFTTrainer, GRPOTrainer, DPOTrainer, KTOTrainer, ORPOTrainer) rather than replacing them — so existing HF code ports over easily.
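Because Unsloth wraps TRL rather than replacing it, a standard SFT script changes only in how the model is loaded and LoRA-ified. A minimal QLoRA sketch, assuming a CUDA GPU and recent unsloth/trl versions — the model id, dataset, and hyperparameters are illustrative placeholders, not recommendations:

```python
from unsloth import FastLanguageModel  # must be imported before transformers/trl

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load a 4-bit base model (QLoRA) with Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder; any HF model id
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; these target modules are the usual choice for
# Llama-style architectures.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# From here on it is plain TRL: the trainer doesn't know Unsloth exists.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("yahma/alpaca-cleaned", split="train"),
    args=SFTConfig(max_steps=60,
                   per_device_train_batch_size=2,
                   output_dir="outputs"),
)
trainer.train()
```

The only Unsloth-specific lines are the two `FastLanguageModel` calls; everything below them is unmodified TRL.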
Use this skill when:
- Fine-tuning any LLM on a single GPU (or two) and you care about speed or VRAM.
- Fine-tuning an embedding or reranker model (BGE-M3, EmbeddingGemma, Qwen3-Embedding, MiniLM, MPNet, ModernBERT).
- Doing RL on reasoning models (GRPO) or preference optimization (DPO/ORPO/KTO).
- Exporting a fine-tuned model to GGUF for Ollama/llama.cpp or to vLLM/SGLang for serving.
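For the export case, Unsloth ships a GGUF helper so the merge-then-convert dance through llama.cpp is one call. A sketch, assuming a trained adapter saved in a local directory (the paths are placeholders, and `q4_k_m` is just one common quantization choice):

```python
from unsloth import FastLanguageModel

# Reload the fine-tuned adapter (placeholder path from a previous run).
model, tokenizer = FastLanguageModel.from_pretrained("lora_model")

# Merge the LoRA weights and write a quantized GGUF file that
# llama.cpp and Ollama can load directly.
model.save_pretrained_gguf("model_gguf", tokenizer,
                           quantization_method="q4_k_m")
```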
Do not reach for Unsloth when the workload demands first-class multi-GPU training; Unsloth works with accelerate/DeepSpeed but multi-GPU is still maturing. Axolotl or torchtune are better defaults there.
The three load-bearing rules
These three rules show up in every Unsloth script, and failing to follow them produces silent, hard-to-debug failures:
- Import `unsloth` first, before `transformers`, `peft`, `trl`, or `sentence_transformers`. Unsloth monkey-patches those libraries on import; the patches only apply to modules not yet imported.
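Because the patches silently no-op for modules that are already loaded, a script can fail closed instead of quietly training an unpatched model. A minimal guard sketch — the function is hypothetical, not part of Unsloth's API:

```python
import sys

# Libraries Unsloth patches on import; if any of these is already in
# sys.modules before `import unsloth` runs, the patches won't apply.
PATCHED_LIBS = ("transformers", "peft", "trl", "sentence_transformers")

def check_unsloth_import_order():
    """Raise ImportError if a patched library was imported before unsloth.

    Illustrative guard, not part of unsloth. Returns the (empty) list of
    too-early imports when the order is safe.
    """
    loaded_early = [m for m in PATCHED_LIBS
                    if m in sys.modules and "unsloth" not in sys.modules]
    if loaded_early:
        raise ImportError(
            "Import unsloth before: " + ", ".join(loaded_early))
    return loaded_early
```

Calling this at the top of a training script turns the silent no-op into a loud, immediate error.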