| name | finetuning |
| description | Model fine-tuning with PyTorch and HuggingFace Trainer. Covers dataset preparation, tokenization, training loops, TrainingArguments, SFTTrainer for instruction tuning, evaluation, and checkpoint management. Includes Unsloth recommendations. |
Model Fine-Tuning
Overview
Fine-tuning adapts a pre-trained LLM to specific tasks by training on task-specific data. This skill covers both manual PyTorch training and HuggingFace's high-level Trainer API.
Recommended: For 2x faster training with less memory, use Unsloth (see bazzite-ai-jupyter:sft).
Quick Reference
| Approach | Use Case | Speed |
|---|---|---|
| Unsloth + SFTTrainer | Recommended default | 2x faster |
| PyTorch Manual | Full control, custom training | Baseline |
| HuggingFace Trainer | Standard training, less code | Fast |
| SFTTrainer | Instruction/chat fine-tuning | Fast |
Method Comparison
| Method | Learning Rate | Use Case |
|---|---|---|
| SFT | 2e-4 | Instruction tuning (first step) |
| GRPO | 1e-5 | RL with rewards |
| DPO | 5e-6 | Preference learning |
| RLOO | 1e-5 | RL with lower variance |
| Reward | 1e-5 | Reward model training |
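The starting learning rates above can be kept in a small lookup for programmatic sweeps (an illustrative sketch only; the method keys mirror the table, not an existing API):
# Typical starting learning rates per method (from the table above)
METHOD_LR = {
    "sft": 2e-4,
    "grpo": 1e-5,
    "dpo": 5e-6,
    "rloo": 1e-5,
    "reward": 1e-5,
}

def pick_lr(method: str) -> float:
    # Look up a reasonable default; tune further per model size and dataset
    return METHOD_LR[method.lower()]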
Unsloth Quickstart (Recommended)
# CRITICAL: Import unsloth FIRST
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
from trl import SFTTrainer, SFTConfig
# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)
# Apply LoRA
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
# Train (dataset: a HF Dataset with a "text" column; see Dataset Preparation below)
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=dataset,
    args=SFTConfig(
        output_dir="./output",
        max_steps=100,
        learning_rate=2e-4,
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
    ),
)
trainer.train()
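After training, a quick generation check can be run with Unsloth's inference mode (a minimal sketch; the prompt text is illustrative):
# Switch the Unsloth model into its optimized inference mode and sample a reply
FastLanguageModel.for_inference(model)
inputs = tokenizer("### Human: What is Python?\n### Assistant:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))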
See bazzite-ai-jupyter:sft for complete Unsloth patterns.
Dataset Preparation
Load from HuggingFace Hub
from datasets import load_dataset
dataset = load_dataset("timdettmers/openassistant-guanaco")
train_data = dataset["train"]
val_data = dataset["test"]
print(f"Training samples: {len(train_data)}")
print(f"Validation samples: {len(val_data)}")
Data Format
# Example conversation format
example = train_data[0]
print(example["text"])
# Output:
# ### Human: What is Python?
# ### Assistant: Python is a programming language...
Create Prompt Template
def build_prompt(instruction, response=None):
    prompt = f"### Human: {instruction}\n### Assistant:"
    if response:
        prompt += f" {response}"
    return prompt
# For training
train_prompt = build_prompt("What is AI?", "AI is artificial intelligence.")
# For inference
inference_prompt = build_prompt("What is AI?")
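If your raw data keeps separate instruction/response columns instead of a ready-made text field, the template can be applied with dataset.map (a sketch; the instruction and response column names are assumptions about your data, not columns of the guanaco dataset above):
# Hypothetical columns "instruction"/"response"; produces the "text" field used below
def to_text(example):
    return {"text": build_prompt(example["instruction"], example["response"])}

formatted_train = train_data.map(to_text)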
Tokenization
Setup Tokenizer
from transformers import AutoTokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Ensure pad token exists
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
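A quick round trip confirms the tokenizer and pad token are wired up correctly (sketch):
# Sanity check: encode/decode a sample prompt and confirm the pad token id
sample = tokenizer("### Human: Hi\n### Assistant:")
print(sample["input_ids"])
print(tokenizer.decode(sample["input_ids"]))
print("pad token:", tokenizer.pad_token, tokenizer.pad_token_id)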
Tokenize Dataset
def tokenize_function(examples):
    # No return_tensors here: datasets.map works with lists;
    # set_format("torch") below converts to tensors for the DataLoader.
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=512,
    )
tokenized_train = train_data.map(
    tokenize_function,
    batched=True,
    remove_columns=train_data.column_names
)
tokenized_val = val_data.map(
    tokenize_function,
    batched=True,
    remove_columns=val_data.column_names
)
tokenized_train.set_format("torch")
tokenized_val.set_format("torch")
PyTorch Training (Manual)
Setup Model
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)
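It helps to confirm what was actually loaded before training; a short sketch using standard transformers helpers:
# Report parameter count and approximate memory used by the loaded weights
print(f"Parameters: {model.num_parameters() / 1e9:.2f}B")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")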
Training Configuration
from dataclasses import dataclass
@dataclass
class TrainConfig:
    batch_size: int = 4
    learning_rate: float = 2e-5
    num_epochs: int = 3
    max_length: int = 512
    warmup_steps: int = 100
    weight_decay: float = 0.01
    output_dir: str = "./checkpoints"
cfg = TrainConfig()
DataLoader
from torch.utils.data import DataLoader
train_loader = DataLoader(
    tokenized_train,
    batch_size=cfg.batch_size,
    shuffle=True
)
# Validation loader, used by the evaluation function later on
val_loader = DataLoader(
    tokenized_val,
    batch_size=cfg.batch_size
)
Optimizer and Scheduler
from transformers import get_linear_schedule_with_warmup
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=cfg.learning_rate,
    weight_decay=cfg.weight_decay
)
total_steps = len(train_loader) * cfg.num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=cfg.warmup_steps,
    num_training_steps=total_steps
)
Training Loop
from tqdm.auto import tqdm
model.train()
device = next(model.parameters()).device
for epoch in range(cfg.num_epochs):
    total_loss = 0
    progress = tqdm(train_loader, desc=f"Epoch {epoch+1}")
    for batch in progress:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # don't compute loss on padding tokens
        optimizer.zero_grad()
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )
        loss = outputs.loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        total_loss += loss.item()
        progress.set_postfix({"loss": loss.item()})
    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1} - Average Loss: {avg_loss:.4f}")
    # Save checkpoint
    model.save_pretrained(f"{cfg.output_dir}/epoch_{epoch+1}")
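If cfg.batch_size is capped by GPU memory, gradient accumulation reaches a larger effective batch size. A minimal sketch of the inner loop with an assumed accum_steps setting (not part of TrainConfig above); the scheduler's num_training_steps would shrink accordingly:
# Accumulate gradients over several micro-batches before stepping the optimizer
accum_steps = 4  # effective batch size = cfg.batch_size * accum_steps
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = input_ids.clone()
    labels[attention_mask == 0] = -100  # ignore padding in the loss
    loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss
    (loss / accum_steps).backward()  # scale so the summed gradient matches one large batch
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()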
HuggingFace Trainer
TrainingArguments
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir="./checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    evaluation_strategy="steps",
    eval_steps=500,
    load_best_model_at_end=True,
    fp16=True,  # Mixed precision
)
Create Trainer
from transformers import DataCollatorForLanguageModeling

# Causal-LM collator copies input_ids into labels so the Trainer can compute a loss
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
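Since load_best_model_at_end=True is set above, early stopping can be layered on with the standard EarlyStoppingCallback. A sketch, assuming metric_for_best_model="eval_loss" is also added to TrainingArguments; the patience value is illustrative:
from transformers import EarlyStoppingCallback

# Stop training after 3 evaluations without improvement on the chosen metric
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    tokenizer=tokenizer,
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)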
Train and Evaluate
# Train
train_result = trainer.train()
# Save
trainer.save_model("./final_model")
tokenizer.save_pretrained("./final_model")
# Evaluate
metrics = trainer.evaluate()
print(metrics)
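Trainer reports the evaluation loss under the eval_loss key, so the same perplexity calculation used later in this skill applies here (sketch):
import math

# Perplexity from the Trainer's evaluation loss
print(f"Perplexity: {math.exp(metrics['eval_loss']):.2f}")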
SFTTrainer (Instruction Tuning)
Setup
from trl import SFTTrainer, SFTConfig
sft_config = SFTConfig(
    output_dir="./sft_checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    logging_steps=10,
    save_steps=500,
    max_seq_length=512,
    packing=False,  # Don't pack multiple samples into one sequence
    dataset_text_field="text",  # Column with the training text
)
Train with SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_data,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./sft_model")
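If the dataset had kept separate instruction/response columns instead of a single text column, a formatting function could be used in place of dataset_text_field. A sketch only: the column names are assumptions, and the batched list-of-strings signature shown here is the one older TRL releases document (newer releases also accept a per-example function):
# Build the "### Human / ### Assistant" text on the fly for each batch of examples
def formatting_func(examples):
    return [build_prompt(ins, res)
            for ins, res in zip(examples["instruction"], examples["response"])]

trainer = SFTTrainer(
    model=model,
    args=sft_config,           # with dataset_text_field removed
    train_dataset=train_data,  # assumed to have "instruction"/"response" columns
    tokenizer=tokenizer,
    formatting_func=formatting_func,
)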
Evaluation
Evaluation Function
def evaluate(model, dataloader):
    model.eval()
    device = next(model.parameters()).device
    total_loss = 0
    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = input_ids.clone()
            labels[attention_mask == 0] = -100  # ignore padding in the loss
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            total_loss += outputs.loss.item()
    return total_loss / len(dataloader)
Perplexity
import math
eval_loss = evaluate(model, val_loader)
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")
Inference with Fine-Tuned Model
def generate_response(model, tokenizer, prompt, max_new_tokens=128):
    model.eval()
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Test
prompt = build_prompt("What is machine learning?")
response = generate_response(model, tokenizer, prompt)
print(response)
Checkpointing
Save Checkpoint
# Save model and tokenizer
model.save_pretrained("./checkpoint")
tokenizer.save_pretrained("./checkpoint")
Load Checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("./checkpoint")
tokenizer = AutoTokenizer.from_pretrained("./checkpoint")
Resume Training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    data_collator=data_collator,
)
# Resume from a Trainer checkpoint directory (e.g. "./checkpoints/checkpoint-500"), which
# also stores optimizer/scheduler state; a plain save_pretrained() folder is not enough.
# Passing True resumes from the latest checkpoint inside output_dir.
trainer.train(resume_from_checkpoint=True)
Hyperparameters Guide
| Parameter | Typical Values | Notes |
|---|---|---|
| learning_rate | 1e-5 to 5e-5 | Lower for larger models |
| batch_size | 4, 8, 16 | Limited by GPU memory |
| epochs | 1-5 | More for smaller datasets |
| warmup_steps | 5-10% of total | Stabilizes early training |
| weight_decay | 0.01-0.1 | Regularization |
| max_length | 512, 1024, 2048 | Context window |
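For example, the warmup rule of thumb translates directly into code (sketch, reusing the manual-training variables defined above):
# Warm up over roughly 5% of all optimizer steps
total_steps = len(train_loader) * cfg.num_epochs
warmup_steps = max(1, int(0.05 * total_steps))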
When to Use This Skill
Use when:
- Adapting LLM to specific domain/task
- Improving model performance on your data
- Creating instruction-following models
- Need full control over training process
Cross-References
- bazzite-ai-jupyter:sft - Unsloth-optimized SFT (recommended)
- bazzite-ai-jupyter:grpo - RL with reward functions
- bazzite-ai-jupyter:dpo - Preference learning
- bazzite-ai-jupyter:rloo - RL with lower variance
- bazzite-ai-jupyter:quantization - Memory-efficient training
- bazzite-ai-jupyter:peft - Parameter-efficient fine-tuning
- bazzite-ai-jupyter:qlora - Advanced QLoRA experiments
- bazzite-ai-jupyter:inference - Fast inference patterns
- bazzite-ai-jupyter:transformers - Architecture understanding