| name | peft |
| description | Parameter-efficient fine-tuning with LoRA and Unsloth. Covers LoraConfig, target module selection, QLoRA for 4-bit training, adapter merging, and Unsloth optimizations for 2x faster training. |
Parameter-Efficient Fine-Tuning (PEFT)
Overview
PEFT methods like LoRA train only a small set of adapter parameters instead of all model weights. Trainable parameters typically drop below 1% of the total, which slashes optimizer and gradient memory while keeping quality close to full fine-tuning.
Quick Reference
| Method | Memory | Speed | Quality |
|---|---|---|---|
| Full Fine-tune | High | Slow | Best |
| LoRA | Low | Fast | Very Good |
| QLoRA | Very Low | Fast | Good |
| Unsloth | Very Low | 2x Faster | Good |
LoRA Concepts
How LoRA Works
Original weight matrix W (frozen): d x k
LoRA adapters A and B: d x r, r x k (where r << min(d,k))
Forward pass:
output = x @ W + x @ A @ B * (alpha / r)
Trainable params: r * (d + k) (instead of d * k)
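For intuition, here is a minimal PyTorch sketch of that forward pass. The LoRALinear class, dimensions, and init values are illustrative only (this is not the peft implementation); note that B starts at zero so the adapter contributes nothing before training.
import torch
import torch.nn as nn
class LoRALinear(nn.Module):
    def __init__(self, d, k, r=8, alpha=16):
        super().__init__()
        self.W = nn.Linear(d, k, bias=False)             # frozen base weight (d -> k)
        self.W.weight.requires_grad = False
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # d x r
        self.B = nn.Parameter(torch.zeros(r, k))         # r x k, zero init => no change at start
        self.scale = alpha / r
    def forward(self, x):
        return self.W(x) + (x @ self.A @ self.B) * self.scale
layer = LoRALinear(d=4096, k=4096, r=8)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])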
Memory Savings
def lora_savings(d, k, r):
    """Percentage reduction in trainable parameters for a rank-r adapter."""
    original = d * k       # parameters in the full weight matrix
    lora = r * (d + k)     # parameters in A (d x r) plus B (r x k)
    reduction = (1 - lora / original) * 100
    return reduction
# Example: 4096 x 4096 matrix with rank 8
print(f"Memory reduction: {lora_savings(4096, 4096, 8):.1f}%")
# Output: ~99.6% reduction
Basic LoRA Setup
Configure LoRA
from peft import LoraConfig, get_peft_model, TaskType
lora_config = LoraConfig(
r=8, # Rank (capacity)
lora_alpha=16, # Scaling factor
target_modules=["q_proj", "v_proj"], # Which layers
lora_dropout=0.05, # Regularization
bias="none", # Don't train biases
task_type=TaskType.CAUSAL_LM # Task type
)
Apply to Model
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
device_map="auto"
)
model = get_peft_model(model, lora_config)
# Check trainable parameters
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 1,100,048,384 || trainable%: 0.38%
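If you want to verify those numbers yourself, the same counts can be derived from named_parameters(); only the injected lora_A / lora_B weights should require gradients.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
print([n for n, p in model.named_parameters() if p.requires_grad][:4])  # lora_A / lora_B entries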
LoRA Parameters
Key Parameters
| Parameter | Values | Effect |
|---|---|---|
| r | 4, 8, 16, 32 | Adapter capacity |
| lora_alpha | r to 2*r | Scaling (higher = stronger) |
| target_modules | List | Which layers to adapt |
| lora_dropout | 0.0-0.1 | Regularization |
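Because the adapter output is multiplied by lora_alpha / r (see the forward pass above), the two are usually tuned together; a quick illustration with hypothetical value pairs:
for r, alpha in [(8, 16), (16, 32), (16, 16)]:
    print(f"r={r:<3} lora_alpha={alpha:<3} effective scale = {alpha / r}")
# (8, 16) and (16, 32) keep the same scale of 2.0; (16, 16) halves it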
Target Modules
# Common target modules for different models
# LLaMA / Mistral / TinyLlama
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]
# GPT-2
target_modules = ["c_attn", "c_proj"]
# BLOOM
target_modules = ["query_key_value", "dense"]
# All linear layers (most aggressive)
target_modules = "all-linear"
Rank Selection Guide
| Rank (r) | Use Case |
|---|---|
| 4 | Simple tasks, small datasets |
| 8 | General purpose (recommended) |
| 16 | Complex tasks, more capacity |
| 32+ | Near full fine-tune quality |
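To make the rank guide concrete, here is the adapter parameter count per rank for a single projection matrix; the 2048 x 2048 size is an assumption, roughly a TinyLlama attention projection.
d = k = 2048  # assumed projection size
for r in [4, 8, 16, 32]:
    adapter = r * (d + k)
    print(f"r={r:<3} adapter params: {adapter:,} ({100 * adapter / (d * k):.2f}% of the full matrix)")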
QLoRA (Quantized LoRA)
Setup
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch
# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True
)
# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
quantization_config=quantization_config,
device_map="auto"
)
# Prepare for k-bit training (important!)
model = prepare_model_for_kbit_training(model)
# Add LoRA adapters
lora_config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
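As a sanity check that quantization took effect, you can print the weight footprint. get_memory_footprint() is a standard transformers method; on a PeftModel the call is forwarded to the wrapped base model.
print(f"Weight footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
# A 1.1B model in 4-bit should sit well under the ~2.2 GB an fp16 copy would need;
# activations, gradients, and adapter optimizer states come on top of this.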
Training with PEFT
Using SFTTrainer
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("timdettmers/openassistant-guanaco")
sft_config = SFTConfig(
output_dir="./lora_checkpoints",
num_train_epochs=3,
per_device_train_batch_size=4,
learning_rate=2e-4, # Higher LR for LoRA
logging_steps=10,
save_steps=500,
max_seq_length=512,
gradient_accumulation_steps=4,
)
trainer = SFTTrainer(
model=model,
args=sft_config,
train_dataset=dataset["train"],
tokenizer=tokenizer,
dataset_text_field="text",
peft_config=lora_config,  # Pass LoRA config (omit if the model is already a PeftModel)
)
trainer.train()
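Note that the optimizer only steps every gradient_accumulation_steps batches, so the effective batch size above is per_device_train_batch_size x gradient_accumulation_steps:
effective_batch = 4 * 4  # per_device_train_batch_size * gradient_accumulation_steps
print(f"Effective batch size per device: {effective_batch}")
# After training, save the adapters with trainer.save_model() or model.save_pretrained()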
Unsloth (2x Faster Training)
Setup
from unsloth import FastLanguageModel
# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
"unsloth/tinyllama-chat-bnb-4bit", # Pre-quantized
max_seq_length=2048,
dtype=None, # Auto-detect
load_in_4bit=True,
)
# Add LoRA with Unsloth
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing=True,
random_state=42,
)
Train with Unsloth
import torch
from trl import SFTTrainer, SFTConfig
sft_config = SFTConfig(
output_dir="./unsloth_output",
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
warmup_steps=5,
max_steps=100,
learning_rate=2e-4,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
logging_steps=1,
optim="adamw_8bit", # Memory-efficient optimizer
weight_decay=0.01,
lr_scheduler_type="linear",
seed=42,
)
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=2048,
args=sft_config,
)
trainer.train()
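After training, Unsloth can switch the model into its faster generation path with FastLanguageModel.for_inference; the prompt and token budget below are just examples.
FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference mode
inputs = tokenizer("What is Python?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))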
Save and Load Adapters
Save Adapters Only
# Save just the LoRA weights (small!)
model.save_pretrained("./lora_adapters")
Load Adapters
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "./lora_adapters")
Merge Adapters into Base Model
# Merge LoRA weights into base model (for deployment)
merged_model = model.merge_and_unload()
# Save merged model
merged_model.save_pretrained("./merged_model")
Inference with Adapters
from peft import PeftModel
# Load base + adapters
base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base_model, "./lora_adapters")
# Generate
model.eval()
inputs = tokenizer("What is Python?", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
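To compare against the base model without reloading anything, PeftModel can temporarily switch the adapters off; disable_adapter() is a context manager in peft.
with model.disable_adapter():
    base_outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(base_outputs[0]))  # base model's answer, adapters ignored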
Comparison: Full vs LoRA vs QLoRA
| Aspect | Full Fine-tune | LoRA | QLoRA |
|---|---|---|---|
| Trainable % | 100% | ~0.1-1% | ~0.1-1% |
| Memory | 4x model | ~1.2x model | ~0.5x model |
| Training speed | Slow | Fast | Fast |
| Quality | Best | Very Good | Good |
| 7B model | 28GB+ | ~16GB | ~6GB |
Troubleshooting
Out of Memory
Fix:
# Use gradient checkpointing
model.gradient_checkpointing_enable()
# Use smaller batch with accumulation
per_device_train_batch_size=1
gradient_accumulation_steps=8
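Two more levers that usually help are a paged 8-bit optimizer and a shorter sequence length; a sketch of a memory-leaning config (values are illustrative):
sft_config = SFTConfig(
    output_dir="./lora_checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",   # pages optimizer state to CPU under memory pressure
    max_seq_length=256,         # shorter sequences cut activation memory
)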
Poor Quality
Fix:
- Increase r (rank)
- Add more target modules
- Train longer
- Check data quality
NaN Loss
Fix:
- Lower learning rate
- Use gradient clipping (see the sketch below)
- Check for data issues
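The learning-rate and clipping fixes map directly onto training-config fields; a sketch with illustrative values:
sft_config = SFTConfig(
    output_dir="./lora_checkpoints",
    learning_rate=5e-5,   # lower than the usual 2e-4 LoRA starting point
    max_grad_norm=0.3,    # gradient clipping threshold
    bf16=True,            # bf16 is generally more numerically stable than fp16
)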
When to Use This Skill
Use when:
- GPU memory is limited
- Fine-tuning large models (7B+)
- Need fast training iterations
- Want to swap adapters for different tasks
Cross-References
- bazzite-ai-jupyter:finetuning - Full fine-tuning basics
- bazzite-ai-jupyter:quantization - Quantization for QLoRA
- bazzite-ai-jupyter:transformers - Target module selection