Claude Code Plugins

Community-maintained marketplace

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: ollama
description: Ollama LLM inference server management via Podman Quadlet. Single-instance design with GPU acceleration for running local LLMs. Use when users need to configure Ollama, pull models, run inference, or manage the Ollama server.

Ollama - Local LLM Inference Server

Overview

The ujust ollama command manages the Ollama LLM inference server using Podman Quadlet containers. It provides a single-instance server for running local LLMs with GPU acceleration.

Key Concept: Unlike Jupyter, Ollama uses a single-instance design because GPU memory is shared across all loaded models. The API is accessible at port 11434.
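
If the server is already running, a quick way to confirm the API is reachable (assuming the default port) is:

# Check that the Ollama API responds on the default port
curl http://localhost:11434/api/version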

Quick Reference

| Action  | Command                                                    | Description              |
|---------|------------------------------------------------------------|--------------------------|
| Config  | ujust ollama config [PORT] [GPU_TYPE] [IMAGE] [WORKSPACE]  | Configure server         |
| Start   | ujust ollama start                                         | Start server             |
| Stop    | ujust ollama stop                                          | Stop server              |
| Restart | ujust ollama restart                                       | Restart server           |
| Logs    | ujust ollama logs [LINES]                                  | View logs                |
| Status  | ujust ollama status                                        | Show server status       |
| Pull    | ujust ollama pull <MODEL>                                  | Download a model         |
| List    | ujust ollama list                                          | List installed models    |
| Run     | ujust ollama run <MODEL> [PROMPT]                          | Run a model              |
| Shell   | ujust ollama shell [CMD]                                   | Open container shell     |
| Delete  | ujust ollama delete                                        | Remove server and images |

Parameters

Config Parameters

ujust ollama config [PORT] [GPU_TYPE] [IMAGE] [WORKSPACE]

| Parameter | Default | Description                              |
|-----------|---------|------------------------------------------|
| PORT      | 11434   | API port                                 |
| GPU_TYPE  | auto    | GPU type: nvidia, amd, intel, none, auto |
| IMAGE     | stable  | Container image or tag                   |
| WORKSPACE | (empty) | Optional additional mount to /workspace  |

Configuration

# Default: Port 11434, auto-detect GPU
ujust ollama config

# Custom port with NVIDIA GPU
ujust ollama config 11435 nvidia

# CPU only
ujust ollama config 11434 none

# With workspace mount
ujust ollama config 11434 nvidia stable /home/user/projects

# Custom image
ujust ollama config 11434 nvidia "ghcr.io/custom/ollama:v1" /projects

Update Existing Configuration

Running config when already configured will update the existing configuration, preserving values not explicitly changed.
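
For example, assuming the server was first configured with an NVIDIA GPU, re-running config with only a new port should keep the earlier GPU and image settings:

# Previously: ujust ollama config 11434 nvidia
# Change only the port; GPU and image settings are preserved
ujust ollama config 11435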

Shell Access

# Interactive bash shell
ujust ollama shell

# Run specific command
ujust ollama shell "nvidia-smi"
ujust ollama shell "df -h"

Model Management

Pull Models

# Download popular models
ujust ollama pull llama3.2
ujust ollama pull codellama
ujust ollama pull mistral
ujust ollama pull phi3

# Specific sizes/versions
ujust ollama pull llama3.2:3b
ujust ollama pull llama3.1:70b

List Models

ujust ollama list

Output:

NAME              SIZE      MODIFIED
llama3.2:latest   2.0 GB    2 hours ago
codellama:latest  3.8 GB    1 day ago

Run Models

# Interactive chat
ujust ollama run llama3.2

# Single prompt
ujust ollama run llama3.2 "Explain quantum computing"

# Code generation
ujust ollama run codellama "Write a Python function to sort a list"

API Access

Default Endpoint

http://localhost:11434

API Examples

# Generate completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?"
}'

# Chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello!"}]
}'

# List models
curl http://localhost:11434/api/tags
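
By default, /api/generate and /api/chat stream the response as a series of JSON chunks. Passing "stream": false returns a single JSON object, which is usually easier to handle in scripts:

# Non-streaming completion (single JSON response)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'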

Integration with Tools

# Claude Code with Ollama
export OLLAMA_HOST=http://localhost:11434

# LangChain (Python)
from langchain_community.llms import Ollama
llm = Ollama(model="llama3.2", base_url="http://localhost:11434")
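
Recent Ollama releases also expose an OpenAI-compatible API under /v1, so clients that speak the OpenAI chat format can usually be pointed at the local server:

# OpenAI-compatible chat endpoint (any API key a client requires is ignored by Ollama)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'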

Volume Mounts

| Container Path | Host Path | Purpose       |
|----------------|-----------|---------------|
| /root/.ollama  | ~/.ollama | Model storage |

Models are persisted in ~/.ollama and survive container restarts.
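
To see how much disk space downloaded models take on the host (assuming the default ~/.ollama mount shown above):

# Disk usage of the model store on the host
du -sh ~/.ollama/models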

Common Workflows

Initial Setup

# 1. Configure Ollama with GPU
ujust ollama config 11434 nvidia

# 2. Start the server
ujust ollama start

# 3. Pull a model
ujust ollama pull llama3.2

# 4. Test it
ujust ollama run llama3.2 "Hello!"

Development with Local LLM

# Start Ollama
ujust ollama start

# In your code, use:
# OLLAMA_HOST=http://localhost:11434
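
A minimal sketch of a script that respects OLLAMA_HOST and falls back to the local server (model and prompt are illustrative):

# Use OLLAMA_HOST if set, otherwise the default local endpoint
OLLAMA_HOST="${OLLAMA_HOST:-http://localhost:11434}"
curl -s "$OLLAMA_HOST/api/generate" -d '{
  "model": "llama3.2",
  "prompt": "Summarize this project in one sentence.",
  "stream": false
}'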

Model Comparison

# Pull multiple models
ujust ollama pull llama3.2
ujust ollama pull mistral
ujust ollama pull phi3

# Compare responses
ujust ollama run llama3.2 "Explain REST APIs"
ujust ollama run mistral "Explain REST APIs"
ujust ollama run phi3 "Explain REST APIs"

GPU Support

Automatic Detection

ujust ollama config  # Auto-detects GPU

Manual Selection

| GPU Type | Flag   | Acceleration          |
|----------|--------|-----------------------|
| NVIDIA   | nvidia | Full GPU acceleration |
| AMD      | amd    | ROCm acceleration     |
| Intel    | intel  | oneAPI acceleration   |
| None     | none   | CPU only (slower)     |

Check GPU Status

ujust ollama shell "nvidia-smi"  # NVIDIA
ujust ollama shell "rocm-smi"    # AMD

Model Size Guide

| Model        | Parameters | VRAM Needed | Quality               |
|--------------|------------|-------------|-----------------------|
| phi3         | 3.8B       | 4 GB        | Fast, basic           |
| llama3.2     | 3B         | 4 GB        | Balanced, lightweight |
| mistral      | 7B         | 8 GB        | Good coding           |
| codellama    | 7B         | 8 GB        | Code-focused          |
| llama3.1:70b | 70B        | 48 GB+      | Best quality          |
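
Before pulling a large model, it can help to check how much VRAM is actually free (NVIDIA example; this relies on nvidia-smi being available in the container, as in the GPU checks above):

# Total and free VRAM on an NVIDIA GPU
ujust ollama shell "nvidia-smi --query-gpu=memory.total,memory.free --format=csv"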

Troubleshooting

Server Won't Start

Check:

systemctl --user status ollama
ujust ollama logs 50

Common causes:

  • Port 11434 already in use (see the check below)
  • GPU driver issues
  • Image not pulled
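
To see whether another process is already bound to the port (11434 by default), a quick host-side check is:

# Show any listener on port 11434
ss -ltnp | grep 11434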

Model Loading Fails

Symptom: "out of memory" or slow loading

Cause: Model too large for GPU VRAM

Fix:

# Use smaller model
ujust ollama pull phi3  # Only 4GB VRAM

# Or use a quantized variant (check ollama.com/library for available tags)
ujust ollama pull llama3.1:8b-instruct-q4_0

GPU Not Used

Symptom: Inference very slow

Check:

ujust ollama status
ujust ollama shell "nvidia-smi"

Fix:

# Reconfigure with explicit GPU
ujust ollama delete
ujust ollama config 11434 nvidia

API Not Responding

Symptom: curl localhost:11434 fails

Check:

ujust ollama status
ujust ollama logs

Fix:

ujust ollama restart

Cross-References

When to Use This Skill

Use when the user asks about:

  • "install ollama", "setup local LLM", "run LLM locally"
  • "pull model", "download llama", "get mistral"
  • "ollama not working", "model won't load"
  • "ollama GPU", "ollama cuda", "ollama slow"
  • "ollama API", "integrate with ollama"