Claude Code Plugins

Community-maintained marketplace


Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please review the skill's instructions to verify it before use.

SKILL.md

name: localai
description: LocalAI local inference API management via Podman Quadlet. Provides an OpenAI-compatible API for local model inference with GPU acceleration. Use when users need to configure, start, or manage the LocalAI service.

LocalAI - Local AI Inference API

Overview

The localai command manages the LocalAI service using Podman Quadlet containers. It provides an OpenAI-compatible API for running AI models locally with GPU acceleration.

Key Features:

  • OpenAI-compatible API endpoints
  • GPU-specific container images (auto-selected)
  • Multiple GPU support (NVIDIA, AMD, Intel)
  • Cross-pod DNS via bazzite-ai network

Quick Reference

| Action  | Command               | Description        |
|---------|-----------------------|--------------------|
| Config  | ujust localai config  | Configure instance |
| Start   | ujust localai start   | Start service      |
| Stop    | ujust localai stop    | Stop service       |
| Restart | ujust localai restart | Restart service    |
| Logs    | ujust localai logs    | View logs          |
| Status  | ujust localai status  | Show status        |
| URL     | ujust localai url     | Show API URL       |
| List    | ujust localai list    | List instances     |
| Shell   | ujust localai shell   | Container shell    |
| Delete  | ujust localai delete  | Remove service     |

Named Parameters

All parameters use named syntax (e.g., PORT=8081):

| Parameter | Default       | Description                         |
|-----------|---------------|-------------------------------------|
| ACTION    | (menu)        | Action to perform                   |
| PORT      | 8080          | Host port for API                   |
| IMAGE     | (auto by GPU) | Container image                     |
| BIND      | 127.0.0.1     | Bind address (127.0.0.1 or 0.0.0.0) |
| INSTANCE  | 1             | Instance number or "all"            |
| LINES     | 50            | Log lines to show                   |
| CMD       | (empty)       | Shell command                       |
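
Parameters can be combined in a single invocation. For example (assuming, per the table above, that INSTANCE and LINES apply to the logs action):

# Show the last 100 log lines of instance 2
# (INSTANCE support for logs is assumed from the parameter table)
ujust localai logs INSTANCE=2 LINES=100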

GPU-Specific Images

LocalAI uses different container images optimized for each GPU type:

| GPU Type   | Image                                     | Auto-Selected? |
|------------|-------------------------------------------|----------------|
| CPU (none) | localai/localai:latest                    | Yes            |
| NVIDIA     | localai/localai:latest-gpu-nvidia-cuda-12 | Yes            |
| AMD        | localai/localai:latest-gpu-hipblas        | Yes            |
| Intel      | localai/localai:latest-gpu-intel          | Yes            |

The appropriate image is automatically selected based on detected GPU hardware.
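
To confirm which image was actually selected, one option is to inspect the running container with podman (this assumes the service is already running and the container name contains "localai"):

# List running containers and their images, filtered to LocalAI
podman ps --format "{{.Names}} {{.Image}}" | grep localai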

Configuration

# Default configuration (auto-detects GPU, port 8080)
ujust localai config

# Custom port
ujust localai config PORT=8081

# Network-wide access (0.0.0.0)
ujust localai config BIND=0.0.0.0

# Force CPU image (ignore GPU)
ujust localai config IMAGE=localai/localai:latest

# Combine parameters
ujust localai config PORT=8081 BIND=0.0.0.0

Update Existing Configuration

Running config on an already-configured instance updates its existing settings:

# Change only the bind address
ujust localai config BIND=0.0.0.0

# Update port without affecting other settings
ujust localai config PORT=8082
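
Configuration changes take effect after a restart (see Lifecycle Management below):

# Apply the updated settings, then confirm the resulting API URL
ujust localai restart
ujust localai url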

Lifecycle Management

# Start LocalAI
ujust localai start

# Stop service
ujust localai stop

# Restart (apply config changes)
ujust localai restart

# View logs (default 50 lines)
ujust localai logs

# View more logs
ujust localai logs LINES=200

# Check status
ujust localai status

# Show API URL
ujust localai url

Multi-Instance Support

# Start all instances
ujust localai start INSTANCE=all

# Stop specific instance
ujust localai stop INSTANCE=2

# Delete all instances
ujust localai delete INSTANCE=all
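
To run a second instance alongside the first, give it its own port. A sketch, assuming config and start accept INSTANCE the same way stop and delete do:

# Configure and start instance 2 on a separate port
# (INSTANCE support for config/start is assumed here)
ujust localai config INSTANCE=2 PORT=8081
ujust localai start INSTANCE=2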

Shell Access

# Interactive shell
ujust localai shell

# Run specific command
ujust localai shell CMD="ls -la /models"
ujust localai shell CMD="nvidia-smi"

Network Architecture

LocalAI uses the bazzite-ai bridge network for cross-container DNS:

+-------------------+     DNS      +-------------------+
|   Open WebUI      | -----------> |     LocalAI       |
|   (openwebui)     |              |    (localai)      |
|   Port 3000       |              |   Port 8080       |
+-------------------+              +-------------------+
         |                                  |
         +------ bazzite-ai network --------+
                         |
+-------------------+    |    +-------------------+
|     Ollama        |----+----+     Jupyter       |
|    (ollama)       |         |    (jupyter)      |
|   Port 11434      |         |   Port 8888       |
+-------------------+         +-------------------+

Cross-Pod DNS:

  • LocalAI accessible as http://localai:8080 from other containers
  • Can replace Ollama as backend for OpenWebUI
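
One way to verify cross-pod DNS is to query the localai name from a throwaway container attached to the same network; curlimages/curl is used here only as a convenient curl-capable image:

# Resolve and query LocalAI by its DNS name on the bazzite-ai network
podman run --rm --network bazzite-ai curlimages/curl -s http://localai:8080/v1/models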

API Endpoints (OpenAI-Compatible)

| Endpoint                 | Description           |
|--------------------------|-----------------------|
| /v1/models               | List available models |
| /v1/chat/completions     | Chat completions      |
| /v1/completions          | Text completions      |
| /v1/embeddings           | Generate embeddings   |
| /v1/images/generations   | Image generation      |
| /v1/audio/transcriptions | Speech-to-text        |

Example API Usage

# List models
curl http://localhost:8080/v1/models

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
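
Other endpoints follow the same request shape. An embeddings request, where "my-embedding-model" is a placeholder for whatever model you have loaded:

# Generate embeddings (model name is a placeholder)
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-embedding-model",
    "input": "Hello, world!"
  }'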

Model Storage

| Path                                | Description |
|-------------------------------------|-------------|
| ~/.config/localai/<INSTANCE>/models | Model files |

Models persist across container restarts. Each instance has isolated storage.

Loading Models

Place model files (GGUF, GGML) in the models directory:

# Copy a model
cp my-model.gguf ~/.config/localai/1/models/

# Or download directly
curl -L -o ~/.config/localai/1/models/model.gguf \
  https://huggingface.co/.../model.gguf
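
LocalAI can also load models described by a YAML definition placed in the same directory. A minimal sketch, assuming a weights file named model.gguf; consult the LocalAI documentation for the full schema:

# Create a minimal model definition (field set abbreviated)
cat > ~/.config/localai/1/models/my-model.yaml <<'EOF'
name: my-model
parameters:
  model: model.gguf
EOF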

Common Workflows

Initial Setup

# 1. Configure LocalAI (auto-detects GPU)
ujust localai config

# 2. Start the service
ujust localai start

# 3. Check the API
ujust localai url
# Output: http://127.0.0.1:8080

# 4. Test the API
curl http://localhost:8080/v1/models

Use with OpenWebUI

OpenWebUI can use LocalAI as an OpenAI-compatible backend:

# Start LocalAI
ujust localai start

# In OpenWebUI settings, add a connection:
# URL: http://localai:8080/v1  (cross-pod DNS, container to container)
# Or: http://host.containers.internal:8080/v1  (reaches the host's published port from inside a container)

Remote Access Setup

# Configure for network access
ujust localai config BIND=0.0.0.0

# Start the service
ujust localai start

# Or use Tailscale for secure access
ujust tailscale serve localai
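
To confirm the mapping, the tailscale CLI can report the active serve configuration (assuming the CLI is available on the host):

# Show active Tailscale serve mappings
tailscale serve status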

GPU Support

GPU is automatically detected and the appropriate image is selected:

| GPU Type | Detection  | Device Passthrough       |
|----------|------------|--------------------------|
| NVIDIA   | nvidia-smi | CDI (nvidia.com/gpu=all) |
| AMD      | lspci      | /dev/dri + /dev/kfd      |
| Intel    | lspci      | /dev/dri                 |

Check GPU in Container

# NVIDIA
ujust localai shell CMD="nvidia-smi"

# Check GPU environment
ujust localai shell CMD="env | grep -i gpu"

Troubleshooting

Service Won't Start

# Check status
ujust localai status

# View logs
ujust localai logs LINES=100

# Check that the image was pulled
podman images | grep localai

Common causes:

  • Port 8080 already in use (see the check after this list)
  • Container image not pulled
  • GPU driver issues
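
To rule out the most common cause, check whether another process already holds port 8080:

# Look for an existing listener on port 8080 (ss ships with iproute2)
ss -tlnp | grep 8080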

GPU Not Detected

NVIDIA:

# Check CDI configuration
nvidia-ctk cdi list

# Regenerate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

AMD:

# Check /dev/kfd exists
ls -la /dev/kfd

# Check ROCm
rocminfo
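
Rootless container access to /dev/kfd and /dev/dri usually also requires video/render group membership on the host; a quick check, assuming the standard group names:

# Verify the current user is in the video and render groups
groups | grep -E 'video|render'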

API Errors

# Test API endpoint
curl http://localhost:8080/v1/models

# Check logs for errors
ujust localai logs LINES=100

Clear Data and Start Fresh

# Delete everything
ujust localai delete INSTANCE=all

# Reconfigure
ujust localai config
ujust localai start

Cross-References

  • Network peers: ollama, openwebui, jupyter, comfyui (all use bazzite-ai network)
  • Alternative: ollama (simpler model management, different API)
  • Client: openwebui (can use LocalAI as backend)
  • Docs: LocalAI Documentation (https://localai.io)

When to Use This Skill

Use when the user asks about:

  • "install localai", "setup local inference", "openai-compatible api"
  • "configure localai", "change port", "gpu acceleration"
  • "localai not working", "api error", "model loading"
  • "localai logs", "debug localai"
  • "delete localai", "uninstall"