---
name: openai
description: OpenAI compatibility layer for Ollama. Use the official OpenAI Python library to interact with Ollama, enabling easy migration from OpenAI and compatibility with LangChain, LlamaIndex, and other OpenAI-based tools.
---

Ollama OpenAI Compatibility

Overview

Ollama provides an OpenAI-compatible API at /v1/* endpoints. This allows using the official openai Python library with Ollama, enabling:

  • Migration - Drop-in replacement for OpenAI API
  • Tool ecosystem - Works with LangChain, LlamaIndex, etc.
  • Familiar interface - Standard OpenAI patterns

Quick Reference

Endpoint              Method  Purpose
/v1/models            GET     List models
/v1/completions       POST    Text generation
/v1/chat/completions  POST    Chat completion
/v1/embeddings        POST    Generate embeddings
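
Because these are ordinary HTTP endpoints, they can be sanity-checked without the openai library at all. A minimal sketch using requests, assuming Ollama is running on localhost:11434 and llama3.2:latest has been pulled:

import requests

# POST an OpenAI-shaped chat payload directly to the /v1 endpoint
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.2:latest",
        "messages": [{"role": "user", "content": "Say hi in three words."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])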

Limitations

The OpenAI compatibility layer does not support:

  • Show model details (/api/show)
  • List running models (/api/ps)
  • Copy model (/api/copy)
  • Delete model (/api/delete)

Use bazzite-ai-ollama:api or bazzite-ai-ollama:python for these operations.
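
The native endpoints stay available alongside /v1, so one process can use both layers. A sketch, assuming the same OLLAMA_HOST convention as in Setup below, that fetches model metadata from the native /api/show endpoint:

import os
import requests

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")

# /api/show is a native Ollama endpoint, not part of the /v1 layer
# (recent Ollama versions take {"model": ...}; older ones used {"name": ...})
resp = requests.post(
    f"{OLLAMA_HOST}/api/show",
    json={"model": "llama3.2:latest"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json().get("details", {}))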

Setup

import os
from openai import OpenAI

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")

client = OpenAI(
    base_url=f"{OLLAMA_HOST}/v1",
    api_key="ollama"  # Required by library but ignored by Ollama
)

List Models

models = client.models.list()

for model in models.data:
    print(f"  - {model.id}")

Text Completions

response = client.completions.create(
    model="llama3.2:latest",
    prompt="Why is the sky blue? Answer in one sentence.",
    max_tokens=100
)

print(response.choices[0].text)
print(f"Tokens used: {response.usage.completion_tokens}")

Chat Completion

Single Turn

response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in one sentence."}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Multi-Turn Conversation

messages = [
    {"role": "system", "content": "You are a helpful math tutor."}
]

# Turn 1
messages.append({"role": "user", "content": "What is 2 + 2?"})
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=messages,
    max_tokens=50
)
assistant_msg = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_msg})
print(f"User: What is 2 + 2?")
print(f"Assistant: {assistant_msg}")

# Turn 2
messages.append({"role": "user", "content": "And what is that multiplied by 3?"})
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=messages,
    max_tokens=50
)
print(f"User: And what is that multiplied by 3?")
print(f"Assistant: {response.choices[0].message.content}")

Streaming

stream = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
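
When the complete reply is also needed afterwards (for logging, or to append to a conversation history), accumulate the deltas while printing:

parts = []
stream = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        parts.append(delta)
        print(delta, end="", flush=True)
print()  # final newline after the stream ends
full_text = "".join(parts)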

Generate Embeddings

response = client.embeddings.create(
    model="llama3.2:latest",
    input="Ollama makes running LLMs locally easy."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Error Handling

from openai import APIConnectionError, APIStatusError

try:
    response = client.chat.completions.create(
        model="invalid-model",
        messages=[{"role": "user", "content": "Hello"}]
    )
except APIConnectionError:
    print("Error: could not reach the Ollama server")
except APIStatusError as e:
    # Raised for non-2xx responses, e.g. 404 for an unknown model
    print(f"Error {e.status_code}: {e.message}")

Migration from OpenAI

Before (OpenAI)

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

After (Ollama)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = client.chat.completions.create(
    model="llama3.2:latest",  # Change model name
    messages=[{"role": "user", "content": "Hello!"}]
)
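
Because only base_url, api_key, and the model name differ, the backend can be chosen at runtime via environment variables; LLM_BASE_URL and LLM_MODEL are hypothetical variable names, not read by the openai library itself:

import os
from openai import OpenAI

# Defaults target local Ollama; point LLM_BASE_URL at https://api.openai.com/v1
# and set a real OPENAI_API_KEY to switch back to OpenAI.
client = OpenAI(
    base_url=os.getenv("LLM_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "ollama"),
)
MODEL = os.getenv("LLM_MODEL", "llama3.2:latest")

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Hello!"}],
)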

LangChain Integration

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
    model="llama3.2:latest"
)

response = llm.invoke("What is Python?")
print(response.content)
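
The same llm object composes with LangChain's runnables; a short sketch using a prompt template (langchain-core ships as a dependency of langchain-openai):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain = prompt | llm  # LCEL: pipe the prompt into the Ollama-backed model

result = chain.invoke({"topic": "Python decorators"})
print(result.content)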

LlamaIndex Integration

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:11434/v1",
    api_key="ollama",
    model="llama3.2:latest"
)

response = llm.complete("What is Python?")
print(response.text)
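
Note: some LlamaIndex versions validate model names against OpenAI's own catalog and reject names like llama3.2:latest. If that happens, the OpenAILike class from the llama-index-llms-openai-like package is the usual workaround; a sketch under that assumption:

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:11434/v1",
    api_key="ollama",
    model="llama3.2:latest",
    is_chat_model=True,  # treat the /v1 endpoint as a chat-completions API
)

response = llm.complete("What is Python?")
print(response.text)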

Connection Health Check

import os
import requests

def check_ollama_health(model="llama3.2:latest"):
    """Check if the Ollama server is running and the model is available."""
    OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
    try:
        response = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
        if response.status_code == 200:
            models = response.json()
            model_names = [m.get("name", "") for m in models.get("models", [])]
            return True, model in model_names
        return False, False
    except requests.exceptions.RequestException:
        return False, False

server_ok, model_ok = check_ollama_health()
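
A typical pattern is to run this check before constructing the client and fail fast with an actionable message:

if not server_ok:
    raise SystemExit("Ollama server is not reachable; is `ollama serve` running?")
if not model_ok:
    raise SystemExit("Model not found; pull it with `ollama pull llama3.2`.")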

When to Use This Skill

Use when:

  • Migrating from OpenAI to local LLMs
  • Using LangChain, LlamaIndex, or other OpenAI-based tools
  • Preferring the familiar OpenAI client interface
  • Building applications that may switch between OpenAI and Ollama

Cross-References

  • bazzite-ai-ollama:python - Native Ollama library (more features)
  • bazzite-ai-ollama:api - Direct REST API access