---
name: ollama-client
description: Phi-4 LLM interaction skill for generating text completions via the Ollama API. Use for all LLM inference tasks including section detection, summarization, recommendation generation, and quality evaluation.
---
# Ollama Client Skill

## Overview
This skill provides a Python wrapper for interacting with Ollama's REST API to generate text completions using the Phi-4 model (14B parameters, 16K context window). It handles timeouts, retries, and structured logging for all LLM operations.
## When to Use
Use this skill when you need to:
- Generate text completions from Phi-4
- Run prompts for clinical analysis tasks
- Generate JSON-structured outputs from LLM
- Handle LLM inference with timeout protection
## Installation

**IMPORTANT**: This skill has its own isolated virtual environment (`.venv`) managed by `uv`. Do NOT use system Python.
Initialize the skill's environment:
```bash
# From the skill directory
cd .agent/skills/ollama-client
uv sync  # Creates .venv and installs dependencies from pyproject.toml
```
Dependencies are in `pyproject.toml`:

- `requests` - HTTP client for the Ollama API
## Usage

**CRITICAL**: Always use `uv run` to execute code with this skill's `.venv`, NOT system Python.

### Basic Text Generation
```python
# From .agent/skills/ollama-client/ directory
# Run with: uv run python -c "..."
from ollama_client import OllamaClient

# Initialize client
client = OllamaClient(
    host="http://localhost:11434",  # Default from OLLAMA_HOST env var
    model="phi4:14b",               # Default from OLLAMA_MODEL env var
    timeout=300                     # 5 minutes default
)

# Generate completion
result = client.generate(
    prompt="Summarize the following clinical note: ...",
    temperature=0.1,            # Low temperature for deterministic outputs
    max_tokens=1000,            # Optional token limit
    stop_sequences=["END"]      # Optional stop sequences
)

print(result["response"])
print(f"Execution time: {result['execution_time_ms']}ms")
```
### With Environment Variables
```python
import os

from ollama_client import OllamaClient

# Set in .env or docker-compose.yml
os.environ['OLLAMA_HOST'] = 'http://localhost:11434'
os.environ['OLLAMA_MODEL'] = 'phi4:14b'

# Client uses env vars automatically
client = OllamaClient()
```
### Using from Another Module
When importing this skill from agents or other code:
```python
import sys
from pathlib import Path

# Add skill to path (use relative path from your location)
skill_path = Path(__file__).parent.parent.parent / ".agent/skills/ollama-client"
sys.path.insert(0, str(skill_path))

from ollama_client import OllamaClient

client = OllamaClient()
```
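If several agents need the same import, the path handling can be wrapped in a small helper. A sketch under the assumption that the repository root sits three `.parent` hops above the calling file; `load_ollama_client` is a hypothetical helper name, not part of the skill:

```python
import sys
from pathlib import Path


def load_ollama_client(repo_root: Path):
    """Import OllamaClient from the skill directory and return the class."""
    skill_path = repo_root / ".agent/skills/ollama-client"
    if str(skill_path) not in sys.path:
        sys.path.insert(0, str(skill_path))
    from ollama_client import OllamaClient
    return OllamaClient


# Example: resolve the repo root relative to the calling module (adjust the hops).
OllamaClient = load_ollama_client(Path(__file__).parent.parent.parent)
client = OllamaClient()
```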
### Health Check
```python
# Check if Ollama server is accessible
if client.is_available():
    print("Ollama server is healthy")
else:
    print("Ollama server unavailable")
```
## Configuration
Environment Variables:
- `OLLAMA_HOST`: Server URL (default: `http://localhost:11434`)
- `OLLAMA_MODEL`: Model name (default: `phi4:14b`)
Parameters:
- `temperature`: Sampling temperature (0.0-1.0, default: 0.1 for deterministic outputs)
- `max_tokens`: Maximum tokens to generate (optional)
- `stop_sequences`: List of strings to stop generation (optional)
- `timeout`: Request timeout in seconds (default: 300)
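For reference, these parameters correspond to fields of Ollama's `/api/generate` endpoint. Below is a minimal sketch of the equivalent raw request with `requests`, assuming the wrapper forwards `temperature`, `max_tokens`, and `stop_sequences` through Ollama's `options` object (as `temperature`, `num_predict`, and `stop`); the wrapper's actual request construction may differ, see `ollama_client.py`:

```python
import requests

payload = {
    "model": "phi4:14b",
    "prompt": "Summarize the following clinical note: ...",
    "stream": False,
    "options": {
        "temperature": 0.1,   # temperature
        "num_predict": 1000,  # max_tokens
        "stop": ["END"],      # stop_sequences
    },
}

resp = requests.post(
    "http://localhost:11434/api/generate",
    json=payload,
    timeout=300,  # timeout
)
resp.raise_for_status()
print(resp.json()["response"])
```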
## Error Handling
The skill raises exceptions for:
- Timeout: If request exceeds timeout duration
- Connection Error: If Ollama server is unreachable
- API Error: If Ollama returns an error response
All errors include execution time for debugging.
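Because the skill is built on `requests`, the underlying exception types are `requests.exceptions.Timeout` and `requests.exceptions.ConnectionError`. The sketch below catches those directly; if `ollama_client.py` re-raises its own error classes, adjust the `except` clauses accordingly:

```python
import logging

import requests

from ollama_client import OllamaClient

logger = logging.getLogger(__name__)
client = OllamaClient(timeout=60)

try:
    result = client.generate(prompt="Summarize: ...", temperature=0.1)
except requests.exceptions.Timeout:
    logger.error("LLM request timed out after 60s")
    raise
except requests.exceptions.ConnectionError:
    logger.error("Ollama server unreachable")
    raise
else:
    logger.info("LLM call completed in %sms", result["execution_time_ms"])
```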
## Best Practices
- Low Temperature: Use `temperature=0.1` for clinical tasks requiring consistency
- Timeouts: Set appropriate timeouts based on prompt complexity (simple: 60s, complex: 300s); see the sketch after this list
- Health Checks: Verify server availability before critical operations
- Error Logging: Always log errors with execution time for troubleshooting
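A sketch combining these practices at a single call site; the 2,000-character threshold for switching timeouts is an illustrative heuristic, not a rule defined by the skill:

```python
from ollama_client import OllamaClient

prompt = "Summarize the following clinical note: ..."

# Pick a timeout based on prompt complexity (simple: 60s, complex: 300s).
timeout = 60 if len(prompt) < 2000 else 300

client = OllamaClient(timeout=timeout)

# Verify availability before a critical operation.
if not client.is_available():
    raise RuntimeError("Ollama server unavailable; aborting before inference")

result = client.generate(prompt=prompt, temperature=0.1)
print(f"Completed in {result['execution_time_ms']}ms")
```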
## Integration with Agents
Agents use this skill for all LLM operations:
- ToC Subagent: Section topic segmentation
- Summary Subagent: Clinical entity extraction
- Recommendation Subagent: Treatment plan generation
- Evaluator Agent: Quality validation reasoning
## Implementation

See `ollama_client.py` for the full Python implementation.