| name | training-hub |
| description | Fine-tune LLMs using Red Hat training-hub library with SFT, LoRA, and OSFT algorithms. Use when preparing JSONL datasets, running training jobs, configuring hardware, scaling to clusters, evaluating models, or deploying with vLLM. |
Training Hub
Red Hat's unified library for LLM post-training: SFT, LoRA, and OSFT (continual learning).
Quick Reference
| Task | Command |
|---|---|
| Recommend config | python scripts/recommend_config.py --model <model> --hardware <hw> |
| Estimate memory | python scripts/estimate_memory.py --model <model> --method sft --hardware h100 |
| Validate dataset | python scripts/validate_dataset.py data.jsonl |
| Full fine-tuning | from training_hub import sft |
| LoRA training | from training_hub import lora_sft |
| OSFT (continual) | from training_hub import osft |
Installation
pip install training-hub # Basic
pip install training-hub[lora] # LoRA with Unsloth (2x faster)
pip install training-hub[cuda] --no-build-isolation # CUDA support
Get Started Fast
# Get optimal config for your hardware
python scripts/recommend_config.py \
--model meta-llama/Llama-3.1-8B-Instruct \
--hardware rtx-5090
Data Format
Training data must be JSONL with message structure:
{"messages": [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}]}
Validate before training:
python scripts/validate_dataset.py ./training_data.jsonl
For data preparation details, see DATA-FORMATS.md.
Training Methods
Supervised Fine-Tuning (SFT)
Full-parameter fine-tuning. Requires significant VRAM.
from training_hub import sft
result = sft(
model_path="Qwen/Qwen2.5-7B-Instruct",
data_path="./training_data.jsonl",
ckpt_output_dir="./checkpoints",
num_epochs=3,
effective_batch_size=8,
learning_rate=2e-5,
max_seq_len=2048,
max_tokens_per_gpu=45000,
)
LoRA Fine-Tuning
Memory-efficient adaptation (up to 2x faster, 70% less VRAM):
from training_hub import lora_sft
result = lora_sft(
model_path="Qwen/Qwen2.5-7B-Instruct",
data_path="./training_data.jsonl",
ckpt_output_dir="./outputs",
lora_r=16,
lora_alpha=32,
num_epochs=3,
learning_rate=2e-4,
)
QLoRA (4-bit): Add load_in_4bit=True for large models on limited VRAM.
OSFT (Continual Learning)
Adapt without catastrophic forgetting:
from training_hub import osft
result = osft(
model_path="meta-llama/Llama-3.1-8B-Instruct",
data_path="./domain_data.jsonl",
ckpt_output_dir="./checkpoints",
unfreeze_rank_ratio=0.25,
effective_batch_size=16,
learning_rate=2e-5,
)
For all parameters, see ALGORITHMS.md.
Hardware Support
| Hardware | VRAM | Best For |
|---|---|---|
| RTX 5090 | 32GB | 8B LoRA, 70B QLoRA |
| DGX Spark | 128GB | 70B SFT |
| H100 | 80GB | 14B SFT, 70B LoRA |
| 8×H100 | 640GB | 70B SFT |
# Check if your config fits
python scripts/estimate_memory.py \
--model meta-llama/Llama-3.1-70B-Instruct \
--method lora \
--hardware h100 \
--num-gpus 8
For hardware-specific configs, see HARDWARE.md.
Scaling
Multi-GPU:
result = sft(..., nproc_per_node=8)
Multi-node:
result = sft(..., nnodes=2, node_rank=0, nproc_per_node=8, rdzv_endpoint="0.0.0.0:29500")
For Slurm, Kubernetes, and datacenter deployments, see SCALE.md.
Algorithm Selection
| Scenario | Method |
|---|---|
| First-time fine-tuning, large dataset | SFT |
| Memory constrained | LoRA |
| Very large model (70B+), limited VRAM | LoRA + QLoRA |
| Preserve existing capabilities | OSFT |
| Domain adaptation, small dataset | OSFT |
Documentation
| Topic | File |
|---|---|
| Hardware profiles & configs | HARDWARE.md |
| All algorithm parameters | ALGORITHMS.md |
| Data formats & conversion | DATA-FORMATS.md |
| Datacenter & cluster setup | SCALE.md |
| Model evaluation | EVALUATION.md |
| vLLM inference & serving | INFERENCE.md |
| Advanced techniques | ADVANCED.md |
| Model-specific configs | MODELS.md |
| Troubleshooting | TROUBLESHOOTING.md |
| Distributed training | DISTRIBUTED.md |
Utility Scripts
| Script | Purpose |
|---|---|
recommend_config.py |
Generate optimal config for model + hardware |
estimate_memory.py |
Estimate GPU memory requirements |
validate_dataset.py |
Validate JSONL dataset format |
convert_to_jsonl.py |
Convert CSV, Alpaca, ShareGPT to JSONL |
Troubleshooting
CUDA OOM: Reduce max_tokens_per_gpu, use LoRA + QLoRA, or add GPUs
Dataset errors: Run python scripts/validate_dataset.py first
LoRA multi-GPU: Requires torchrun --nproc-per-node=N script.py
Training diverges: Lower learning_rate (try 1e-5 for SFT, 1e-4 for LoRA)
For more, see TROUBLESHOOTING.md.