---
name: slipstream-finetune
description: Finetune LLMs to speak Slipstream natively - complete guide with GLM-4-9B
---
# Slipstream Finetuning Guide
Train LLMs to communicate using the Slipstream protocol natively. This guide covers dataset generation, model finetuning, and releasing on HuggingFace.
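For reference, Slipstream messages are single-line wire-format strings. The shape below is taken from the system prompt in the Ollama section, and the example lines reappear later in this guide:

```text
SLIP v1 <src> <dst> <anchor> [payload...]

SLIP v1 agent backend RequestReview auth_code
SLIP v1 agent alice RequestReview api_code
```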
## Recommended Model: GLM-4-9B-0414

**Why GLM-4-9B-0414?**
- MIT licensed (can release finetuned weights commercially)
- 9B parameters - good balance of capability and trainability
- Specifically optimized for function calling and agentic tasks
- Excellent instruction following
## Quick Start
### 1. Generate a High-Quality Dataset
**Option A: Template-based (fast, free)**

```bash
python -m slipcore.finetune -n 1000 -f sharegpt -o slipstream_train.jsonl
```
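For orientation, a generated ShareGPT record should look roughly like the following (shown pretty-printed; the file itself is JSONL with one object per line, and the exact system prompt and payload strings here are illustrative, inferred from the formatter and expected output in later steps):

```json
{
  "conversations": [
    {"from": "system", "value": "You communicate using the Slipstream protocol."},
    {"from": "human", "value": "Tell the backend team to review the authentication code"},
    {"from": "gpt", "value": "SLIP v1 agent backend RequestReview auth_code"}
  ]
}
```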
**Option B: LLM-enhanced (higher quality, requires API)**

```bash
# Using Claude API (recommended for quality)
export ANTHROPIC_API_KEY="your-key"
python -m slipcore.finetune_llm -n 1000 --provider anthropic -o slipstream_train.jsonl
# Using OpenAI (good quality, widely available)
export OPENAI_API_KEY="your-key"
python -m slipcore.finetune_llm -n 1000 --provider openai --model gpt-4o-mini -o slipstream_train.jsonl
# Using Together.ai (cheaper, good for large datasets)
export TOGETHER_API_KEY="your-key"
python -m slipcore.finetune_llm -n 2000 --provider together --model meta-llama/Llama-3.3-70B-Instruct-Turbo -o slipstream_train.jsonl
# Using DeepSeek (very cheap, good quality)
export DEEPSEEK_API_KEY="your-key"
python -m slipcore.finetune_llm -n 2000 --provider deepseek -o slipstream_train.jsonl
```
**Cost estimates for 1000 examples:**

| Provider | Model | Approx. cost (USD) |
|---|---|---|
| Anthropic | claude-sonnet-4-20250514 | ~$0.50 |
| OpenAI | gpt-4o-mini | ~$0.15 |
| Together | Llama-3.3-70B | ~$0.10 |
| DeepSeek | deepseek-chat | ~$0.02 |
### 2. Finetune GLM-4-9B with Unsloth

```python
from unsloth import FastLanguageModel
import torch
# Load GLM-4-9B-0414
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="THUDM/GLM-4-9B-0414",
max_seq_length=2048,
dtype=None, # Auto-detect
load_in_4bit=True, # QLoRA - fits in ~8GB VRAM
)
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
model,
r=16, # LoRA rank
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth",
random_state=42,
)
# Load dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="slipstream_train.jsonl", split="train")
# GLM-4 chat template; the [gMASK]<sop> prefix is emitted exactly once,
# even when an example carries no system message
def format_glm4(example):
    convs = example["conversations"]
    text = "[gMASK]<sop>"
    for conv in convs:
        if conv["from"] == "system":
            text += f"<|system|>\n{conv['value']}"
        elif conv["from"] == "human":
            text += f"<|user|>\n{conv['value']}"
        elif conv["from"] == "gpt":
            text += f"<|assistant|>\n{conv['value']}"
    return {"text": text}
dataset = dataset.map(format_glm4)
# Train
from trl import SFTTrainer
from transformers import TrainingArguments
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=2048,
args=TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
warmup_steps=10,
        max_steps=200,  # ~1.6 epochs of 1000 examples at effective batch size 8 (2 x 4)
learning_rate=2e-4,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
logging_steps=10,
output_dir="slipstream_glm4",
optim="adamw_8bit",
seed=42,
),
)
trainer.train()
# Save LoRA adapter
model.save_pretrained("slipstream_glm4_lora")
tokenizer.save_pretrained("slipstream_glm4_lora")
```
### 3. Test the Finetuned Model

```python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="slipstream_glm4_lora",
max_seq_length=2048,
dtype=None,
load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
# Test
prompt = """[gMASK]<sop><|system|>
You communicate using the Slipstream protocol.<|user|>
Tell the backend team to review the authentication code<|assistant|>
"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
# Expected output: SLIP v1 agent backend RequestReview auth_code
```
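To check outputs programmatically rather than by eye, here is a minimal sketch; the regex encodes only the `SLIP v1 <src> <dst> <anchor> [payload...]` shape from the Ollama system prompt, so tighten it to the real grammar as needed:

```python
import re

# Matches: SLIP v1 <src> <dst> <anchor> [payload...]
SLIP_RE = re.compile(r"^SLIP v1 (\S+) (\S+) (\S+)( .+)?$")

def is_valid_slip(line: str) -> bool:
    """True if a generated line parses as a SLIP v1 message."""
    return bool(SLIP_RE.match(line.strip()))

assert is_valid_slip("SLIP v1 agent backend RequestReview auth_code")
```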
### 4. Export and Release

**Option A: LoRA adapter only (~200MB)**
```python
model.push_to_hub("your-username/slipstream-glm4-9b-lora")
tokenizer.push_to_hub("your-username/slipstream-glm4-9b-lora")
```
**Option B: Merged full model (~18GB)**

```python
# Merge the LoRA adapter into the base model and upload 16-bit weights;
# Unsloth's helper handles the merge from the 4-bit-loaded base correctly
model.push_to_hub_merged(
    "your-username/slipstream-glm4-9b",
    tokenizer,
    save_method="merged_16bit",
)
```
**Option C: GGUF for Ollama/llama.cpp (~5-9GB)**

```python
# Save a quantized GGUF locally
model.save_pretrained_gguf(
    "slipstream_glm4_gguf",
    tokenizer,
    quantization_method="q4_k_m",  # good size/quality balance
)

# Or push GGUF files directly to HuggingFace
model.push_to_hub_gguf(
    "your-username/slipstream-glm4-9b-gguf",
    tokenizer,
    quantization_method=["q4_k_m", "q8_0"],  # multiple quants
)
```
### 5. Release the Dataset

**HuggingFace Datasets:**

```python
from datasets import Dataset
import json
# Load your generated data
with open("slipstream_train.jsonl") as f:
data = [json.loads(line) for line in f]
dataset = Dataset.from_list(data)
dataset.push_to_hub("your-username/slipstream-training-data")
```

**Kaggle:**

```bash
kaggle datasets create -p ./data -u   # -u makes the dataset public
```

**Zenodo** (for academic citation): upload via https://zenodo.org/deposit/new
## Alternative Models
| Model | Size | License | Notes |
|---|---|---|---|
| GLM-4-9B-0414 | 9B | MIT | Best for agentic, function calling |
| Qwen2.5-7B-Instruct | 7B | Apache 2.0 | Strong general purpose |
| Llama-3.1-8B-Instruct | 8B | Llama 3.1 | Most popular, good baseline |
| Mistral-7B-Instruct-v0.3 | 7B | Apache 2.0 | Fast, efficient |
| Phi-3-medium | 14B | MIT | Larger but very capable |
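Any of these drops into the same recipe by changing `model_name` in the Unsloth loader; just remember that each family has its own chat template, so the `format_glm4` formatter above must be swapped for the matching one. A sketch (the HF repo id is an assumption, verify it on the Hub):

```python
from unsloth import FastLanguageModel

# Hypothetical swap: Qwen2.5-7B-Instruct instead of GLM-4-9B
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",  # assumed repo id
    max_seq_length=2048,
    load_in_4bit=True,
)
```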
## Training Tips
- Dataset size: 500-2000 examples is usually sufficient
- Quality > Quantity: LLM-generated data beats templates
- Epochs: 1-2 epochs, watch for overfitting
- Learning rate: 2e-4 for small models, 1e-4 for larger
- Validation: Hold out 10% for testing generalization (see the split sketch below)
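A minimal sketch of that 10% holdout with the `datasets` library; the resulting `eval_ds` can be passed to `SFTTrainer` as `eval_dataset`:

```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="slipstream_train.jsonl", split="train")

# Hold out 10% for validation
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```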
## Using with Ollama

After creating the GGUF:

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./slipstream_glm4_gguf/slipstream-glm4-9b-Q4_K_M.gguf
SYSTEM "You communicate using the Slipstream protocol (SLIP). Always respond with SLIP wire format: SLIP v1 <src> <dst> <anchor> [payload...]"
TEMPLATE """[gMASK]<sop><|system|>
{{ .System }}<|user|>
{{ .Prompt }}<|assistant|>
{{ .Response }}"""
EOF
# Create and run
ollama create slipstream -f Modelfile
ollama run slipstream "Tell alice to review the API code"
# -> SLIP v1 agent alice RequestReview api_code
```
## Cost Summary
| Component | Free Option | Paid Option |
|---|---|---|
| Dataset | Template generator | Claude/OpenAI API (~$0.50) |
| Training | Google Colab free | Colab Pro ($10/mo) or local GPU |
| Hosting | HuggingFace free | - |
| Inference | Ollama local | Together/Fireworks API |
Total cost to release a finetuned Slipstream model: $0–$15