
SKILL.md

name: ollama-setup
description: Auto-configure Ollama when user needs local LLM deployment, free AI alternatives, or wants to eliminate OpenAI/Anthropic API costs. Trigger phrases: "install ollama", "local AI", "free LLM", "self-hosted AI", "replace OpenAI", "no API costs"
allowed-tools: Read, Write, Bash
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>

Ollama Setup Skill

Purpose

Automatically detect when a user needs local AI deployment and guide them through Ollama installation, eliminating paid API dependencies.

Activation Triggers

  • "install ollama"
  • "local AI models"
  • "free alternative to OpenAI"
  • "self-hosted LLM"
  • "run AI locally"
  • "eliminate API costs"
  • "privacy-first AI"
  • "offline AI models"

Skill Workflow

1. Detect Need for Local AI

When the user mentions:

  • High API costs
  • Privacy concerns
  • Offline requirements
  • OpenAI/Anthropic alternatives
  • Self-hosted infrastructure

→ Activate this skill

2. Assess System Requirements

# Check OS
uname -s

# Check available memory
free -h  # Linux
vm_stat  # macOS

# Check GPU
nvidia-smi  # NVIDIA
system_profiler SPDisplaysDataType  # macOS
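
To script the same checks cross-platform, here is a small Python sketch (it assumes the third-party psutil package, installed with pip install psutil; GPU detection is left to the vendor tools above):

import platform
import psutil  # third-party: pip install psutil

# Report OS, architecture, and total RAM to pick a model tier
ram_gb = psutil.virtual_memory().total / 1e9
print(f"OS:  {platform.system()} ({platform.machine()})")
print(f"RAM: {ram_gb:.1f} GB")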

3. Recommend Appropriate Models

8GB RAM:

  • llama3.1:8b (~5GB)
  • mistral:7b (~4GB)

16GB RAM:

  • codellama:13b (~7GB)
  • phi3:14b (~8GB)

32GB+ RAM:

  • codellama:34b (~19GB)
  • mixtral:8x7b (~26GB)
  • llama3.1:70b (~40GB; comfortable only at 48GB+ or with GPU offload)
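
As a rough sketch, the tiers above can be encoded as a lookup (the model names are the illustrative picks from this list, not an exhaustive catalog):

def recommend_models(ram_gb: float) -> list[str]:
    # Tiers mirror the list above; leave headroom for the OS and KV cache
    if ram_gb >= 32:
        return ["codellama:34b", "mixtral:8x7b"]
    if ram_gb >= 16:
        return ["codellama:13b", "phi3:14b"]
    if ram_gb >= 8:
        return ["llama3.1:8b", "mistral:7b"]
    return []  # under 8GB, local LLMs are impractical (see "When NOT to Use")

print(recommend_models(16))  # ['codellama:13b', 'phi3:14b']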

4. Installation Process

macOS:

brew install ollama
brew services start ollama
ollama pull llama3.2

Linux:

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama
ollama pull llama3.2

Docker:

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

docker exec -it ollama ollama pull llama3.2

5. Verify Installation

ollama list
ollama run llama3.2 "Say hello"
curl http://localhost:11434/api/tags
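
The same verification can be done programmatically against the REST API (a sketch using only the Python standard library; Ollama listens on port 11434 by default):

import json
import urllib.request

# /api/tags lists locally available models; a response at all means the server is up
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    print(model["name"])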

6. Integration Examples

Python:

import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response['message']['content'])

Node.js:

import ollama from 'ollama'  // npm install ollama; run as an ES module

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.message.content)

cURL:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'

Common Use Cases

Replace OpenAI

# Before (paid)
from openai import OpenAI
client = OpenAI(api_key="...")

# After (free)
import ollama
response = ollama.chat(model='llama3.2',
                       messages=[{'role': 'user', 'content': 'Hello!'}])
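
Alternatively, Ollama exposes an OpenAI-compatible endpoint under /v1, so existing OpenAI client code can often be repointed rather than rewritten (a sketch; the api_key argument is required by the client but ignored by Ollama):

from openai import OpenAI

# Point the official OpenAI client at the local Ollama server
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)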

Replace Anthropic Claude

# Before (paid)
from anthropic import Anthropic
client = Anthropic(api_key="...")

# After (free)
import ollama
response = ollama.chat(model='mistral',
                       messages=[{'role': 'user', 'content': 'Hello!'}])

Code Generation

ollama pull codellama
ollama run codellama "Write a Python REST API"

Cost Comparison

OpenAI GPT-4:

  • Input: $0.03/1K tokens
  • Output: $0.06/1K tokens
  • 1M tokens/month ≈ $30-60, depending on the input/output mix

Ollama:

  • Setup: Free
  • Usage: $0 beyond electricity (runs on hardware you already own)
  • Savings: $30-60/month ✓
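
The monthly figure is simple arithmetic on the per-token rates (a sketch using the GPT-4 prices quoted above):

# 1M tokens at GPT-4 rates: all-input is the floor, all-output the ceiling
tokens = 1_000_000
floor = tokens / 1_000 * 0.03    # $30 if every token were input
ceiling = tokens / 1_000 * 0.06  # $60 if every token were output
print(f"API cost: ${floor:.0f}-${ceiling:.0f}/month vs. $0 with Ollama")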

Performance Expectations

With GPU (NVIDIA/Metal):

  • 7B models: 50-100 tokens/sec
  • 13B models: 30-60 tokens/sec
  • 34B models: 20-40 tokens/sec

CPU Only:

  • 7B models: 10-20 tokens/sec
  • 13B models: 5-10 tokens/sec
  • 34B models: 2-5 tokens/sec
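
To check where a given machine lands, Ollama reports generation counters in each completed response; here is a sketch with the Python client (it assumes the eval_count and eval_duration fields are present, as they are once generation finishes):

import ollama

# eval_count = tokens generated; eval_duration is reported in nanoseconds
resp = ollama.chat(model="llama3.2",
                   messages=[{"role": "user", "content": "Write a haiku."}])
tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/sec")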

Troubleshooting

Out of Memory

# Default tags are already 4-bit quantized; drop to a smaller model
ollama pull llama3.2:1b  # 1B parameters, ~1.3GB

Slow Performance

# Use smaller model
ollama pull mistral:7b  # Faster than 70B

Model Not Found

# Pull model first
ollama pull llama3.2
ollama list  # Verify
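
A small wrapper can make the "model not found" case self-healing by pulling on first use (a sketch with the Python client; ollama.pull blocks until the download completes):

import ollama

def chat_autopull(model: str, prompt: str) -> str:
    # Retry once after pulling if the model isn't available locally
    try:
        resp = ollama.chat(model=model,
                           messages=[{"role": "user", "content": prompt}])
    except ollama.ResponseError:
        ollama.pull(model)
        resp = ollama.chat(model=model,
                           messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]

print(chat_autopull("llama3.2", "Say hello"))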

Success Metrics

After the skill runs, the user should have:

  • ✅ Ollama installed and running
  • ✅ At least one model downloaded
  • ✅ Successful test inference
  • ✅ Integration code examples
  • ✅ Zero ongoing API costs

Privacy Benefits

  • Data never leaves local machine
  • No API keys required
  • No usage tracking
  • Keeps data on-premises, which supports GDPR/HIPAA compliance
  • Offline capable

When NOT to Use This Skill

  • User needs latest GPT-4 specifically
  • User has <8GB RAM
  • User needs real-time updates (like web search)
  • User wants Claude Code itself (requires Anthropic API)

Related Skills

  • local-llm-wrapper - Generic local LLM integration
  • ai-sdk-agents - AI SDK with Ollama support
  • privacy-first-ai - Privacy-focused AI workflows