| Field | Value |
| --- | --- |
| name | ollama-setup |
| description | Auto-configure Ollama when user needs local LLM deployment, free AI alternatives, or wants to eliminate OpenAI/Anthropic API costs. Trigger phrases: "install ollama", "local AI", "free LLM", "self-hosted AI", "replace OpenAI", "no API costs" |
| allowed-tools | Read, Write, Bash |
| version | 1.0.0 |
| license | MIT |
| author | Jeremy Longshore <jeremy@intentsolutions.io> |
# Ollama Setup Skill

## Purpose
Automatically detect when users need local AI deployment and guide them through Ollama installation. Eliminates paid API dependencies.
## Activation Triggers
- "install ollama"
- "local AI models"
- "free alternative to OpenAI"
- "self-hosted LLM"
- "run AI locally"
- "eliminate API costs"
- "privacy-first AI"
- "offline AI models"
## Skill Workflow

### 1. Detect Need for Local AI
When the user mentions:
- High API costs
- Privacy concerns
- Offline requirements
- OpenAI/Anthropic alternatives
- Self-hosted infrastructure
→ Activate this skill
### 2. Assess System Requirements

```bash
# Check OS
uname -s

# Check available memory
free -h     # Linux
vm_stat     # macOS

# Check GPU
nvidia-smi                            # NVIDIA
system_profiler SPDisplaysDataType    # macOS
```
### 3. Recommend Appropriate Models

**8GB RAM:**
- llama3.2:3b (~2GB)
- mistral:7b (~4GB)
- llama3.1:8b (~5GB)

**16GB RAM:**
- codellama:13b (~7GB)
- phi3:14b (~8GB)

**32GB+ RAM:**
- mixtral:8x7b (~26GB)
- codellama:34b (~19GB)
- llama3.1:70b (~40GB; needs roughly 48GB RAM)
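
One way to turn the tiers above into a default choice is to read total RAM and map it to a model tag. A minimal sketch, assuming a Linux or macOS host where these `sysconf` names are available; `suggest_model` is a hypothetical helper, not part of Ollama:

```python
import os

def total_ram_gb() -> float:
    """Rough total RAM in GB via POSIX sysconf (Linux/macOS)."""
    pages = os.sysconf("SC_PHYS_PAGES")
    page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size / (1024 ** 3)

def suggest_model(ram_gb: float) -> str:
    """Map detected RAM to a reasonable default model tag (tiers as above)."""
    if ram_gb >= 32:
        return "mixtral:8x7b"
    if ram_gb >= 16:
        return "codellama:13b"
    return "llama3.2:3b"

if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"Detected ~{ram:.0f} GB RAM; try: ollama pull {suggest_model(ram)}")
```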
### 4. Installation Process

**macOS:**

```bash
brew install ollama
brew services start ollama
ollama pull llama3.2
```

**Linux:**

```bash
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama
ollama pull llama3.2
```

**Docker:**

```bash
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

docker exec -it ollama ollama pull llama3.2
```
### 5. Verify Installation

```bash
ollama list
ollama run llama3.2 "Say hello"
curl http://localhost:11434/api/tags
```
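
To verify the HTTP API without installing any client library, a stdlib-only sketch that lists installed models and runs one non-streaming generation (assumes the default port 11434 and that `llama3.2` has already been pulled):

```python
import json
import urllib.request

BASE = "http://localhost:11434"

# 1. List installed models
with urllib.request.urlopen(f"{BASE}/api/tags") as resp:
    tags = json.load(resp)
print("Models:", [m["name"] for m in tags.get("models", [])])

# 2. Run a single non-streaming generation
payload = json.dumps({
    "model": "llama3.2",
    "prompt": "Say hello",
    "stream": False,
}).encode()
req = urllib.request.Request(
    f"{BASE}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```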
### 6. Integration Examples

**Python:**

```python
import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response['message']['content'])
```
**Node.js** (ES module):

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.message.content)
```
**cURL:**

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'
```
## Common Use Cases

### Replace OpenAI

```python
# Before (paid)
from openai import OpenAI
client = OpenAI(api_key="...")

# After (free)
import ollama
response = ollama.chat(model='llama3.2', ...)
```
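
Ollama also exposes an OpenAI-compatible endpoint at `/v1`, so code already written against the OpenAI SDK can often be pointed at the local server instead of being rewritten. A minimal sketch; the `api_key` value is a placeholder, since Ollama ignores it but the SDK requires a non-empty string:

```python
from openai import OpenAI

# Point the existing OpenAI client at the local Ollama server
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; not checked by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```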
### Replace Anthropic Claude

```python
# Before (paid)
from anthropic import Anthropic
client = Anthropic(api_key="...")

# After (free)
import ollama
response = ollama.chat(model='mistral', ...)
```
### Code Generation

```bash
ollama pull codellama
ollama run codellama "Write a Python REST API"
```
## Cost Comparison

**OpenAI GPT-4:**
- Input: $0.03 / 1K tokens
- Output: $0.06 / 1K tokens
- 1M tokens/month = $30-60

**Ollama:**
- Setup: free
- Usage: $0 (runs on hardware you already own)
- Savings: $30-60/month ✓
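
To make the comparison concrete for a specific workload, a small sketch that estimates the hosted-API bill being avoided, using the GPT-4 prices quoted above; the token volumes are illustrative:

```python
# Rough monthly cost of a hosted API vs. local Ollama ($0 per token)
INPUT_PRICE_PER_1K = 0.03   # USD, GPT-4 input price from the table above
OUTPUT_PRICE_PER_1K = 0.06  # USD, GPT-4 output price from the table above

def monthly_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend for the given token volumes."""
    return ((input_tokens / 1000) * INPUT_PRICE_PER_1K
            + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K)

# Example: 600K input + 400K output tokens per month (~$42)
print(f"Hosted API: ${monthly_api_cost(600_000, 400_000):.2f}/month")
print("Ollama:     $0.00/month (hardware you already own)")
```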
## Performance Expectations

**With GPU (NVIDIA / Apple Metal):**
- 7B models: 50-100 tokens/sec
- 13B models: 30-60 tokens/sec
- 34B models: 20-40 tokens/sec

**CPU only:**
- 7B models: 10-20 tokens/sec
- 13B models: 5-10 tokens/sec
- 34B models: 2-5 tokens/sec
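
To see where your own hardware falls in these ranges, the non-streaming `/api/generate` response reports token counts and timings that can be turned into a tokens/sec figure. A minimal sketch, assuming `llama3.2` is pulled and the server is on the default port (interactively, `ollama run llama3.2 --verbose` prints similar stats):

```python
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.2",
    "prompt": "Explain what a hash map is in two sentences.",
    "stream": False,
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
tokens = stats["eval_count"]
seconds = stats["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```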
## Troubleshooting

### Out of Memory

```bash
# Use a smaller model (default Ollama tags are already quantized)
ollama pull llama3.2:3b
```
### Slow Performance

```bash
# Use a smaller model
ollama pull mistral:7b    # faster than 70B
```

### Model Not Found

```bash
# Pull the model first
ollama pull llama3.2
ollama list    # verify
```
## Success Metrics

After the skill runs, the user should have:
- ✅ Ollama installed and running
- ✅ At least one model downloaded
- ✅ Successful test inference
- ✅ Integration code examples
- ✅ Zero ongoing API costs
## Privacy Benefits
- Data never leaves local machine
- No API keys required
- No usage tracking
- Simplifies GDPR/HIPAA compliance (data stays on local infrastructure)
- Offline capable
## When NOT to Use This Skill
- User needs latest GPT-4 specifically
- User has <8GB RAM
- User needs real-time updates (like web search)
- User wants Claude Code itself (requires Anthropic API)
## Related Skills

- `local-llm-wrapper` - Generic local LLM integration
- `ai-sdk-agents` - AI SDK with Ollama support
- `privacy-first-ai` - Privacy-focused AI workflows