Ollama API Documentation

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reviewing its instructions before using it.

SKILL.md

name: ollama
description: Ollama API Documentation

Ollama Skill

Comprehensive assistance with Ollama development: the local runtime for running large language models and interacting with them programmatically.

When to Use This Skill

This skill should be triggered when:

  • Running local AI models with Ollama
  • Building applications that interact with Ollama's API
  • Implementing chat completions, embeddings, or streaming responses
  • Setting up Ollama authentication or cloud models
  • Configuring Ollama server (environment variables, ports, proxies)
  • Using Ollama with OpenAI-compatible libraries
  • Troubleshooting Ollama installations or GPU compatibility
  • Implementing tool calling, structured outputs, or vision capabilities
  • Working with Ollama in Docker or behind proxies
  • Creating, copying, pushing, or managing Ollama models

Quick Reference

1. Basic Chat Completion (cURL)

Generate a simple chat response:

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'

2. Simple Text Generation (cURL)

Generate a text response from a prompt:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
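
If you prefer the official Python client over raw cURL, the equivalent call looks like this (a sketch assuming pip install ollama and a locally pulled gemma3 model):

# Sketch using the official ollama-python client.
from ollama import generate

response = generate(model="gemma3", prompt="Why is the sky blue?")
print(response["response"])  # the generated text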

3. Python Chat with OpenAI Library

Use Ollama with the OpenAI Python library:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)

4. Vision Model (Image Analysis)

Ask questions about images:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)

5. Generate Embeddings

Create vector embeddings for text:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)

6. Structured Outputs (JSON Schema)

Get structured JSON responses:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)

7. JavaScript/TypeScript Chat

Use Ollama with the OpenAI JavaScript library:

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama",  // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});

8. Authentication for Cloud Models

Sign in to use cloud models:

# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud

Or use API keys for direct cloud access:

export OLLAMA_API_KEY=your_api_key

curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'

9. Configure Ollama Server

Set environment variables for server configuration:

macOS:

# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Restart Ollama application

Linux (systemd):

# Edit service
systemctl edit ollama.service

# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama

Windows:

1. Quit Ollama from task bar
2. Search "environment variables" in Settings
3. Edit or create OLLAMA_HOST variable
4. Set value: 0.0.0.0:11434
5. Restart Ollama from Start menu

10. Check Model GPU Loading

Verify if your model is using GPU:

ollama ps

Output shows:

  • 100% GPU - Fully loaded on GPU
  • 100% CPU - Fully loaded in system memory
  • 48%/52% CPU/GPU - Split between both

Key Concepts

Base URLs

  • Local API (default): http://localhost:11434/api
  • Cloud API: https://ollama.com/api
  • OpenAI Compatible: /v1/ endpoints for OpenAI libraries (see the quick check below)
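
Both API styles are served by the same local daemon. A quick way to confirm each layer is reachable on a default install:

# Native API: report the server version
curl http://localhost:11434/api/version

# OpenAI-compatible layer: list available models
curl http://localhost:11434/v1/models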

Authentication

  • Local: No authentication required for http://localhost:11434
  • Cloud Models: Require signing in (ollama signin) or an API key
  • API Keys: For programmatic access to https://ollama.com/api

Models

  • Local Models: Run on your machine (e.g., gemma3, llama3.2, qwen3)
  • Cloud Models: Suffix -cloud (e.g., gpt-oss:120b-cloud, qwen3-coder:480b-cloud)
  • Vision Models: Support image inputs (e.g., llava)

Common Environment Variables

  • OLLAMA_HOST - Change bind address (default: 127.0.0.1:11434)
  • OLLAMA_CONTEXT_LENGTH - Context window size (default: 2048 tokens)
  • OLLAMA_MODELS - Model storage directory
  • OLLAMA_ORIGINS - Allow additional web origins for CORS
  • HTTPS_PROXY - Proxy server for model downloads

Error Handling

Status Codes:

  • 200 - Success
  • 400 - Bad Request (invalid parameters)
  • 404 - Not Found (model doesn't exist)
  • 429 - Too Many Requests (rate limit)
  • 500 - Internal Server Error
  • 502 - Bad Gateway (cloud model unreachable)

Error Format:

{
  "error": "the model failed to generate a response"
}
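
A minimal error-handling sketch in Python using the requests package (the model name and prompt are placeholders):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3", "prompt": "Why is the sky blue?", "stream": False},
)
if resp.status_code == 200:
    print(resp.json()["response"])
elif resp.status_code == 404:
    print("Model not found - pull it first with: ollama pull gemma3")
else:
    # Error responses carry a JSON body with an "error" field
    print(f"Request failed ({resp.status_code}): {resp.json().get('error')}")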

Streaming vs Non-Streaming

  • Streaming (default): Returns response chunks as JSON objects (NDJSON); see the sketch after this list
  • Non-Streaming: Set "stream": false to get complete response in one object
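
A sketch of consuming the default NDJSON stream from /api/chat with the requests package; each line is a standalone JSON object, and the final one carries "done": true:

import json
import requests

with requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "gemma3", "messages": [{"role": "user", "content": "Hi"}]},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk["message"]["content"], end="", flush=True)
        if chunk.get("done"):
            break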

Reference Files

This skill includes comprehensive documentation in references/:

  • llms-txt.md - Complete API reference covering:

    • All API endpoints (/api/generate, /api/chat, /api/embed, etc.)
    • Authentication methods (signin, API keys)
    • Error handling and status codes
    • OpenAI compatibility layer
    • Cloud models usage
    • Streaming responses
    • Configuration and environment variables
  • llms.md - Documentation index listing all available topics:

    • API reference (version, model details, chat, generate, embeddings)
    • Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
    • CLI reference
    • Cloud integration
    • Platform-specific guides (Linux, macOS, Windows, Docker)
    • IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)

Use the reference files when you need:

  • Detailed API parameter specifications
  • Complete endpoint documentation
  • Advanced configuration options
  • Platform-specific setup instructions
  • Integration guides for specific tools

Working with This Skill

For Beginners

Start with these common patterns:

  1. Simple generation: Use /api/generate endpoint with a prompt
  2. Chat interface: Use /api/chat with messages array
  3. OpenAI compatibility: Use OpenAI libraries with base_url='http://localhost:11434/v1/'
  4. Check GPU usage: Run ollama ps to verify model loading

Read the "Introduction" and "Quickstart" sections of llms-txt.md for foundational concepts.

For Intermediate Users

Focus on:

  • Embeddings for semantic search and RAG applications
  • Structured outputs with JSON schema validation
  • Vision models for image analysis
  • Streaming for real-time response generation
  • Authentication for cloud models

Check the specific API endpoints in llms-txt.md for detailed parameter options.

For Advanced Users

Explore:

  • Tool calling for function execution (see the sketch after this list)
  • Custom model creation with Modelfiles
  • Server configuration with environment variables
  • Proxy setup for network-restricted environments
  • Docker deployment with custom configurations
  • Performance optimization with GPU settings
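
A sketch of tool calling through the OpenAI-compatible endpoint; the get_weather schema below is a hypothetical example for illustration, and the model must support tool calling (e.g., llama3.1):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hypothetical tool schema, not part of Ollama itself
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the call details
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)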

Refer to platform-specific sections in llms.md and configuration details in llms-txt.md.

Common Use Cases

Building a chatbot:

  1. Use /api/chat endpoint
  2. Maintain message history in your application (see the sketch below)
  3. Stream responses for better UX
  4. Handle errors gracefully
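
A minimal sketch of such a loop, assuming the official ollama-python package (pip install ollama):

from ollama import chat

# Append each turn to the history so the model keeps context
history = []
while True:
    user_input = input("You: ")
    history.append({"role": "user", "content": user_input})
    response = chat(model="llama3.2", messages=history)
    reply = response["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print("Assistant:", reply)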

Creating embeddings for search:

  1. Use /api/embed endpoint
  2. Store embeddings in vector database
  3. Perform similarity search (see the sketch below)
  4. Implement RAG (Retrieval Augmented Generation)
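
A sketch of the embed-and-compare step against the native /api/embed endpoint, with cosine similarity standing in for the vector database (assumes numpy and requests are installed):

import numpy as np
import requests

docs = ["why is the sky blue?", "why is the grass green?"]
query = "what makes the sky look blue?"

def embed(texts):
    resp = requests.post(
        "http://localhost:11434/api/embed",
        json={"model": "all-minilm", "input": texts},
    )
    return np.array(resp.json()["embeddings"])

doc_vecs = embed(docs)
query_vec = embed([query])[0]

# Cosine similarity: dot products of length-normalized vectors
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])  # best-matching document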

Running behind a firewall:

  1. Set HTTPS_PROXY environment variable
  2. Configure the proxy in Docker if containerized (see the example below)
  3. Ensure certificates are trusted
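
For the Docker case, the proxy is passed as an environment variable when starting the container (proxy.example.com is a placeholder; the volume and port flags follow Ollama's standard Docker invocation):

docker run -d \
  -e HTTPS_PROXY=https://proxy.example.com \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama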

Using cloud models:

  1. Run ollama signin once
  2. Pull cloud models with -cloud suffix
  3. Use same API endpoints as local models

Troubleshooting

Model Not Loading on GPU

Check:

ollama ps

Solutions:

  • Verify GPU compatibility in documentation
  • Check CUDA/ROCm installation
  • Review available VRAM
  • Try smaller model variants

Cannot Access Ollama Remotely

Problem: Ollama only accessible from localhost

Solution:

# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"

See "How do I configure Ollama server?" in llms-txt.md for platform-specific instructions.

Proxy Issues

Problem: Cannot download models behind proxy

Solution:

# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama

See "How do I use Ollama behind a proxy?" in llms-txt.md.

CORS Errors in Browser

Problem: Browser extension or web app cannot access Ollama

Solution:

# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"

See "How can I allow additional web origins?" in llms-txt.md.

Resources

Official Documentation

  • Ollama documentation: https://docs.ollama.com
  • Ollama GitHub repository: https://github.com/ollama/ollama

Official Libraries

  • ollama-python: https://github.com/ollama/ollama-python
  • ollama-js: https://github.com/ollama/ollama-js

Community

  • Discord: https://discord.gg/ollama
  • Reddit: https://www.reddit.com/r/ollama

Notes

  • This skill was generated from official Ollama documentation
  • All examples are tested and working with Ollama's API
  • Code samples include proper language detection for syntax highlighting
  • Reference files preserve structure from official docs with working links
  • OpenAI compatibility means most OpenAI code works with minimal changes

Quick Command Reference

# CLI Commands
ollama signin       # Sign in to ollama.com
ollama run gemma3   # Run a model interactively
ollama pull gemma3  # Download a model
ollama ps           # List running models
ollama list         # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"