name: unsloth-models
description: Guidance on selecting and configuring supported model architectures like Llama 4, DeepSeek-R1, and Qwen3. Triggers: llama 4, deepseek-r1, qwen3, gemma 3, model selection, instruct vs base.
Overview
Unsloth supports a wide range of state-of-the-art model architectures, providing pre-quantized Hub variants and optimized kernels for models like Llama 4, DeepSeek-R1, and Qwen3. Selecting the right variant (Instruct vs Base) is critical for training success.
When to Use
- When starting a new fine-tuning project and deciding on a base architecture.
- When running reasoning-heavy models (e.g., DeepSeek-R1 distills) on consumer hardware.
- When performing continued pre-training on domain-specific data.
Decision Tree
- Is the task conversational or instruction-following?
  - Yes: use 'Instruct' variants.
- Is the task raw knowledge injection or domain pre-training?
  - Yes: use 'Base' variants.
- Is reasoning/logic a priority?
  - Yes: select DeepSeek-R1 distills or similar reasoning-tuned architectures (see the mapping sketch after this list).
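As a concrete illustration of the decision tree, a minimal helper could map each branch to an example pre-quantized Hub repo ID. The repo names below are assumptions for illustration only; confirm the exact IDs on the Hugging Face Hub.

```python
# Illustrative mapping from the decision tree above to example Hub repo IDs.
# Repo names are assumptions -- verify the current IDs on the Hugging Face Hub.
RECOMMENDED_MODELS = {
    "conversational":     "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",  # Instruct variant
    "domain_pretraining": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",           # Base variant
    "reasoning":          "unsloth/DeepSeek-R1-Distill-Llama-8B",         # R1 distill
}

def pick_model(task: str) -> str:
    """Return an example repo ID for a task category from the decision tree."""
    return RECOMMENDED_MODELS[task]

print(pick_model("reasoning"))
```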
Workflows
Selecting the Right Model
- Use 'Instruct' models for conversational tasks or when data is limited.
- Use 'Base' models for domain-specific knowledge injection or raw text pre-training.
- Select 'unsloth-bnb-4bit' variants to leverage pre-calculated quantization statistics (a loading sketch follows this list).
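A minimal loading sketch, assuming a pre-quantized 'unsloth-bnb-4bit' Instruct variant; the repo ID and LoRA hyperparameters are examples, not a definitive recipe.

```python
from unsloth import FastLanguageModel

# Load a pre-quantized 'unsloth-bnb-4bit' Instruct variant. These Hub uploads
# bundle tokenizer fixes and pre-computed quantization statistics; the repo ID
# below is an example -- confirm the current name on the Hub.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```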
Fine-tuning DeepSeek-R1 Distills
- Load the specific distilled variant (e.g., DeepSeek-R1-Distill-Llama-8B).
- Use datasets including reasoning paths (Chain of Thought) to preserve logic capabilities.
- Apply the optimized Llama 3.1 kernels during the SFT or DPO pipeline (a training sketch follows this list).
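A hedged SFT sketch for an R1 distill, assuming a hypothetical chain-of-thought dataset ("my-org/cot-reasoning-traces") with a "conversations" column of role/content messages; the dataset name, repo ID, and hyperparameters are placeholders.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the distilled R1 variant (repo ID is an example; confirm on the Hub).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=4096,   # chain-of-thought traces are long
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Hypothetical dataset whose "conversations" column keeps the full reasoning path.
dataset = load_dataset("my-org/cot-reasoning-traces", split="train")

def to_text(example):
    # Keep the reasoning trace in the training text so logic capability is preserved.
    return {"text": tokenizer.apply_chat_template(example["conversations"], tokenize=False)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,   # older trl versions take tokenizer= instead
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=4096,
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```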
Non-Obvious Insights
- Unsloth publishes pre-quantized releases of distilled reasoning models (such as the DeepSeek-R1 distills) that are optimized to fit on consumer hardware while retaining most of the original reasoning performance.
- Choosing the 'unsloth-bnb-4bit' variants on the Hub is not just about speed; these variants include critical tokenizer fixes and pre-calculated quantization statistics that ensure better training stability.
- New reasoning models can be trained or fine-tuned using GRPO (Group Relative Policy Optimization) natively within the Unsloth framework, bypassing the need for complex PPO setups (see the sketch after this list).
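A minimal GRPO sketch using trl's GRPOTrainer on an Unsloth-loaded model, under stated assumptions: the repo ID, the prompt dataset ("my-org/reasoning-prompts"), and the reward function are placeholders, and the reward signature shown assumes a plain prompt-string dataset.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",  # example repo ID
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Hypothetical prompt-only dataset with a "prompt" column.
dataset = load_dataset("my-org/reasoning-prompts", split="train")

def format_reward(completions, **kwargs):
    # Placeholder reward: favour completions that emit an explicit answer tag.
    # Real GRPO runs usually score completions against verifiable answers.
    return [1.0 if "</answer>" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=GRPOConfig(
        num_generations=4,          # group size for the relative advantage
        max_completion_length=512,
        per_device_train_batch_size=4,
    ),
    train_dataset=dataset,
)
trainer.train()
```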
Evidence
- "Llama 4 by Meta, including Scout & Maverick are now supported." Source
- "Instruct versions are used for inference or fine-tuning, while Base models are usually used for continued pre-training." Source
Scripts
scripts/unsloth-models_tool.py: Script to list and download recommended Unsloth-optimized models.
scripts/unsloth-models_tool.js: Comparison helper for model parameter sizes.
Dependencies
- unsloth
- huggingface_hub
References
- [[references/README.md]]