unsloth
Overview
Unsloth (https://unsloth.ai) is an open-source library that fine-tunes open LLMs ~2x faster with ~70% less VRAM than vanilla Hugging Face + Flash Attention 2, using hand-written Triton kernels for the QLoRA/LoRA forward and backward passes. It wraps around transformers, peft, and TRL's trainers (SFTTrainer, GRPOTrainer, DPOTrainer, KTOTrainer, ORPOTrainer) rather than replacing them — so existing HF code ports over easily.
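Because Unsloth wraps TRL rather than replacing it, a standard SFT script changes only in how the model is loaded and LoRA-ified. A minimal QLoRA sketch, assuming a CUDA GPU and recent unsloth/trl versions — the model id, dataset, and hyperparameters are illustrative placeholders, not recommendations:

```python
from unsloth import FastLanguageModel  # must be imported before transformers/trl

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load a 4-bit base model (QLoRA) with Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder; any HF model id
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; these target modules are the usual choice for
# Llama-style architectures.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# From here on it is plain TRL: the trainer doesn't know Unsloth exists.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("yahma/alpaca-cleaned", split="train"),
    args=SFTConfig(max_steps=60,
                   per_device_train_batch_size=2,
                   output_dir="outputs"),
)
trainer.train()
```

The only Unsloth-specific lines are the two `FastLanguageModel` calls; everything below them is unmodified TRL.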
Use this skill when:
- Fine-tuning any LLM on a single GPU (or two) and you care about speed or VRAM.
- Fine-tuning an embedding or reranker model (BGE-M3, EmbeddingGemma, Qwen3-Embedding, MiniLM, MPNet, ModernBERT).
- Doing RL on reasoning models (GRPO) or preference optimization (DPO/ORPO/KTO).
- Exporting a fine-tuned model to GGUF for Ollama/llama.cpp or to vLLM/SGLang for serving.
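For the export case, Unsloth ships a GGUF helper so the merge-then-convert dance through llama.cpp is one call. A sketch, assuming a trained adapter saved in a local directory (the paths are placeholders, and `q4_k_m` is just one common quantization choice):

```python
from unsloth import FastLanguageModel

# Reload the fine-tuned adapter (placeholder path from a previous run).
model, tokenizer = FastLanguageModel.from_pretrained("lora_model")

# Merge the LoRA weights and write a quantized GGUF file that
# llama.cpp and Ollama can load directly.
model.save_pretrained_gguf("model_gguf", tokenizer,
                           quantization_method="q4_k_m")
```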
Do not reach for Unsloth when the workload demands first-class multi-GPU training; Unsloth works with accelerate/DeepSpeed but multi-GPU is still maturing. Axolotl or torchtune are better defaults there.
The three load-bearing rules
These three rules show up in every Unsloth script, and failing to follow them produces silent, hard-to-debug failures:
- Import `unsloth` first, before `transformers`, `peft`, `trl`, or `sentence_transformers`. Unsloth monkey-patches those libraries on import; the patches only apply to modules not yet imported.
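Because the patches silently no-op for modules that are already loaded, a script can fail closed instead of quietly training an unpatched model. A minimal guard sketch — the function is hypothetical, not part of Unsloth's API:

```python
import sys

# Libraries Unsloth patches on import; if any of these is already in
# sys.modules before `import unsloth` runs, the patches won't apply.
PATCHED_LIBS = ("transformers", "peft", "trl", "sentence_transformers")

def check_unsloth_import_order():
    """Raise ImportError if a patched library was imported before unsloth.

    Illustrative guard, not part of unsloth. Returns the (empty) list of
    too-early imports when the order is safe.
    """
    loaded_early = [m for m in PATCHED_LIBS
                    if m in sys.modules and "unsloth" not in sys.modules]
    if loaded_early:
        raise ImportError(
            "Import unsloth before: " + ", ".join(loaded_early))
    return loaded_early
```

Calling this at the top of a training script turns the silent no-op into a loud, immediate error.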