QLoRA: Quantized Low-Rank Adaptation
QLoRA enables fine-tuning of large language models on consumer GPUs by quantizing the frozen base model to 4 bits and training LoRA adapters on top of it. A 65B model can be fine-tuned on a single 48GB GPU while matching 16-bit fine-tuning performance.
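The 65B-on-48GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an illustration, not a measurement: `weight_memory_gb` is a hypothetical helper that counts only the bytes needed to store the base weights, and it ignores quantization constants, LoRA adapter weights, optimizer state, and activations, all of which add overhead in practice.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in GB (10^9 bytes).

    Hypothetical helper for illustration; real usage is higher because of
    quantization metadata, adapters, optimizer state, and activations.
    """
    return n_params * bits_per_param / 8 / 1e9


n_params = 65e9  # 65B-parameter model, as in the example above

fp16_gb = weight_memory_gb(n_params, 16)  # 16-bit weights
int4_gb = weight_memory_gb(n_params, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 130.0 GB, far beyond one 48GB GPU
print(f" 4-bit weights: {int4_gb:.1f} GB")  # 32.5 GB, leaves headroom on 48GB
```

Under these assumptions the 4-bit weights take roughly a quarter of the 16-bit footprint, which is what leaves room on a 48GB card for the adapters and training state.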
Prerequisites: This skill assumes familiarity with LoRA. See the lora skill for LoRA fundamentals (LoraConfig, target_modules, training patterns).
Table of Contents
- Core Innovations
- BitsAndBytesConfig Deep Dive
- Memory Requirements
- Complete Training Example
- Inference and Merging
- Troubleshooting
- Best Practices
- References