lora
Using LoRA for Fine-tuning
LoRA (Low-Rank Adaptation) enables efficient fine-tuning by freezing pretrained weights and injecting small trainable matrices into transformer layers. This reduces trainable parameters to ~0.1% of the original model while maintaining performance.
Table of Contents
- Core Concepts
- Basic Setup
- Configuration Parameters
- QLoRA (Quantized LoRA)
- Training Patterns
- Saving and Loading
- Merging Adapters
- Best Practices
- References
Core Concepts
How LoRA Works
More from itsmostafa/llm-engineering-skills
mlx
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
38prompt-engineering
Crafting effective prompts for LLMs. Use when designing prompts, improving output quality, structuring complex instructions, or debugging poor model responses.
19context-engineering
Strategies for managing LLM context windows effectively in AI agents. Use when building agents that handle long conversations, multi-step tasks, tool orchestration, or need to maintain coherence across extended interactions.
17pytorch
Building and training neural networks with PyTorch. Use when implementing deep learning models, training loops, data pipelines, model optimization with torch.compile, distributed training, or deploying PyTorch models.
17rlhf
Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.
16qlora
Memory-efficient fine-tuning with 4-bit quantization and LoRA adapters. Use when fine-tuning large models (7B+) on consumer GPUs, when VRAM is limited, or when standard LoRA still exceeds memory. Builds on the lora skill.
14