| name | unsloth-lora |
| description | Configuring and optimizing 16-bit Low-Rank Adaptation (LoRA) and Rank-Stabilized LoRA (rsLoRA) for efficient LLM fine-tuning. Triggers include lora, qlora, rslora, rank selection, lora_alpha, lora_dropout, and target_modules. |
Overview
Unsloth optimizes Low-Rank Adaptation (LoRA) by providing 16-bit trainable matrices that allow for efficient fine-tuning without updating all model weights. It supports standard LoRA and Rank-Stabilized LoRA (rsLoRA), utilizing specialized kernels to accelerate training and reduce memory overhead.
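The low-rank update itself is easy to see in code. Below is a minimal sketch of the LoRA mechanism (illustrative shapes and initialization, not Unsloth's fused kernels): the frozen base weight is untouched, and only two small 16-bit matrices are trained.

```python
import torch

d_in, d_out, r, alpha = 4096, 4096, 16, 16
W = torch.randn(d_out, d_in)                             # frozen base weight (not trained)
A = torch.randn(r, d_in, dtype=torch.bfloat16) * 0.01    # trainable 16-bit low-rank factor
B = torch.zeros(d_out, r, dtype=torch.bfloat16)          # trainable 16-bit factor, zero-init

def lora_forward(x):
    scale = alpha / r                                         # standard LoRA scaling
    base = x @ W.T                                            # frozen path
    update = (x.to(A.dtype) @ A.T @ B.T).to(x.dtype) * scale  # low-rank 16-bit path
    return base + update

x = torch.randn(2, d_in)
print(lora_forward(x).shape)  # torch.Size([2, 4096])
```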
When to Use
- When fine-tuning large language models on consumer-grade or limited GPU hardware.
- When aiming to match full fine-tuning performance with significantly lower VRAM usage.
- When specialized scaling (rsLoRA) is required for higher rank stability.
Decision Tree
- Need to update all weights?
  - Yes: Use [[unsloth-fft]].
  - No: Proceed to LoRA.
- Using high rank (r > 64)?
  - Yes: Enable `use_rslora = True` for sqrt(r) scaling (see the scaling comparison after this list).
  - No: Use standard LoRA.
- Maximizing speed?
  - Yes: Set `lora_dropout = 0` to enable internal kernel optimizations.
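To make the r > 64 branch concrete, here is a quick comparison of the effective scaling under standard LoRA (lora_alpha / r) and rsLoRA (lora_alpha / sqrt(r)); alpha = 32 is an arbitrary example value.

```python
import math

alpha = 32
for r in (16, 64, 128, 256):
    standard = alpha / r              # standard LoRA scaling
    rslora = alpha / math.sqrt(r)     # rank-stabilized scaling
    print(f"r={r:4d}  standard={standard:.3f}  rsLoRA={rslora:.3f}")
# At r=256 the standard scaling has shrunk to 0.125 while rsLoRA keeps it at 2.0,
# so high-rank adapters are not silently attenuated.
```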
Workflows
Optimizing LoRA Architecture
- Target all 7 major linear layers (q, k, v, o, gate, up, down) to match full fine-tuning performance.
- Initialize rank (r) between 16 and 32 for general tasks, or up to 128 for complex domain adaptation.
- Set `lora_alpha` equal to r or 2*r to keep learning aggressive while remaining numerically stable (a configuration sketch follows this list).
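A configuration sketch following this guidance, assuming the usual Unsloth loading flow; the model name and hyperparameter values are examples, not requirements.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b",   # example model
    max_seq_length=2048,
    load_in_4bit=False,                # 16-bit LoRA; set True for QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # 16-32 for general tasks, up to 128 for complex domain adaptation
    lora_alpha=16,         # equal to r (or 2*r for more aggressive learning)
    lora_dropout=0,        # 0 enables Unsloth's fast kernel path
    target_modules=[       # all 7 major linear layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```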
Configuring Rank-Stabilized LoRA (rsLoRA)
- Set `use_rslora = True` in `get_peft_model` to enable sqrt(r) scaling (see the sketch after this list).
- Increase rank (r) without the typical instability risks associated with high-alpha standard LoRA.
- Monitor training loss to ensure the model captures underlying patterns without memorization.
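A sketch of the rsLoRA variant, assuming `model` was loaded as in the previous example; only the rank, alpha, and rsLoRA flag change.

```python
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=128,                 # high rank is viable because scaling becomes lora_alpha / sqrt(r)
    lora_alpha=128,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_rslora=True,       # switch scaling from lora_alpha / r to lora_alpha / sqrt(r)
)
```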
Non-Obvious Insights
- Setting `lora_dropout` to 0 is not just a parameter choice; it explicitly triggers internal Unsloth kernel-level optimizations that significantly speed up the training loop.
- Unsloth includes a custom gradient accumulation fix that ensures results are mathematically identical regardless of the batch size and accumulation step combination.
- For verifying weight updates, MD5 checksums or absolute difference sums are more reliable than `np.allclose()` because LoRA induces subtle Gaussian-distributed changes (a verification sketch follows this list).
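A verification sketch for the last point: compare an MD5 checksum and a summed absolute difference before and after training instead of relying on `np.allclose()`. The snapshot/compare helpers below are hypothetical, not part of Unsloth or PEFT.

```python
import hashlib
import numpy as np

def snapshot(weight):
    """Return (md5 hex digest, float32 copy) for a weight tensor."""
    arr = weight.detach().cpu().float().numpy()
    return hashlib.md5(arr.tobytes()).hexdigest(), arr

def compare(md5_before, w_before, md5_after, w_after):
    """Report whether the weights changed and by how much in total."""
    changed = md5_before != md5_after
    abs_diff = np.abs(w_after - w_before).sum()
    # np.allclose() can miss the subtle, Gaussian-distributed deltas LoRA
    # produces; a checksum mismatch and the summed |delta| do not.
    print(f"checksums differ: {changed}, sum |delta| = {abs_diff:.6f}")
```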
Evidence
- "LoRA: Fine-tunes small, trainable matrices in 16-bit without updating all model weights." Source
- "For optimal performance, LoRA should be applied to all major linear layers: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj." Source
- "Set use_rslora = True... the effective scaling becomes lora_alpha / sqrt(r) instead of the standard lora_alpha / r." Source
Scripts
- `scripts/unsloth-lora_tool.py`: Python utility for configuring LoRA parameters in the Unsloth framework.
- `scripts/unsloth-lora_tool.js`: JavaScript helper for generating LoRA configuration objects.
Dependencies
- unsloth
- torch
- peft
- bitsandbytes
References
- [[references/README.md]]