| name | chat |
| description | Use when starting a new project with llmring, building an application using LLMs, making basic chat completions, or sending messages to OpenAI, Anthropic, Google, or Ollama - covers lockfile creation (MANDATORY first step), semantic alias usage, unified interface for all providers with consistent message structure and response handling |
Basic Chat Completions
Installation
# With uv (recommended)
uv add llmring
# With pip
pip install llmring
Provider SDKs (install what you need):
uv add "openai>=1.0"      # OpenAI
uv add "anthropic>=0.67"  # Anthropic
uv add google-genai       # Google Gemini
uv add "ollama>=0.4"      # Ollama
API Overview
This skill covers:
- LLMRing - Main service class
- LLMRequest - Request configuration
- LLMResponse - Response structure
- Message - Message format
- Resource management with context managers
Quick Start
FIRST: Create your lockfile (required for all real applications):
# Initialize lockfile
llmring lock init
# Check available models (get current names from registry):
llmring list --provider openai
llmring list --provider anthropic
# Bind aliases using CURRENT model names:
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
# Or use interactive configuration (recommended - knows current models):
llmring lock chat
⚠️ Important: Check llmring list for current model names. Models change over time (e.g., gemini-1.5-pro was superseded by gemini-2.5-pro).
THEN: Use in code:
from llmring import LLMRing, LLMRequest, Message
# Use context manager for automatic resource cleanup
async with LLMRing() as service:
    request = LLMRequest(
        model="summarizer",  # YOUR semantic alias (defined in llmring.lock)
        messages=[
            Message(role="system", content="You are a helpful assistant."),
            Message(role="user", content="Hello!")
        ]
    )
    response = await service.chat(request)
    print(response.content)
⚠️ Important: The bundled lockfile that ships with llmring is ONLY for running llmring lock chat. Real applications must create their own lockfile.
Complete API Documentation
LLMRing
Main service class that manages providers and routes requests.
Constructor:
LLMRing(
    origin: str = "llmring",
    registry_url: Optional[str] = None,
    lockfile_path: Optional[str] = None,
    server_url: Optional[str] = None,
    api_key: Optional[str] = None,
    log_metadata: bool = True,
    log_conversations: bool = False,
    alias_cache_size: int = 100,
    alias_cache_ttl: int = 3600
)
Parameters:
- origin (str, default: "llmring"): Origin identifier for tracking
- registry_url (str, optional): Custom registry URL for model information
- lockfile_path (str, optional): Path to lockfile for alias configuration
- server_url (str, optional): llmring-server URL for usage logging
- api_key (str, optional): API key for llmring-server
- log_metadata (bool, default: True): Enable logging of usage metadata (requires server_url)
- log_conversations (bool, default: False): Enable logging of full conversations (requires server_url)
- alias_cache_size (int, default: 100): Maximum cached alias resolutions
- alias_cache_ttl (int, default: 3600): Cache TTL in seconds
Example:
from llmring import LLMRing

# Basic initialization (uses environment variables for API keys)
async with LLMRing() as service:
    response = await service.chat(request)

# With custom lockfile
async with LLMRing(lockfile_path="./my-llmring.lock") as service:
    response = await service.chat(request)
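Usage logging is opt-in; a minimal sketch assuming you run your own llmring-server (the URL and key below are placeholders, and the parameters are the ones documented above):
# Placeholders: point these at your own llmring-server deployment
async with LLMRing(
    server_url="https://llmring-server.example.com",
    api_key="your-llmring-server-key",
    log_metadata=True,        # record token usage metadata
    log_conversations=False,  # do not store full message contents
) as service:
    response = await service.chat(request)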
LLMRing.chat()
Send a chat completion request and get a response.
Signature:
async def chat(
    request: LLMRequest,
    profile: Optional[str] = None
) -> LLMResponse
Parameters:
- request (LLMRequest): Request configuration with messages and parameters
- profile (str, optional): Profile name for environment-specific configuration (e.g., "dev", "prod")
Returns:
LLMResponse: Response with content, usage, and metadata
Raises:
- ProviderNotFoundError: If provider is not configured
- ModelNotFoundError: If model is not available
- ProviderAuthenticationError: If API key is invalid
- ProviderRateLimitError: If rate limit exceeded
Example:
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    request = LLMRequest(
        model="responder",  # Your alias for responses
        messages=[
            Message(role="user", content="What is 2+2?")
        ],
        temperature=0.7,
        max_tokens=100
    )
    response = await service.chat(request)

    print(f"Response: {response.content}")
    print(f"Tokens: {response.total_tokens}")
    print(f"Model: {response.model}")
LLMRequest
Configuration for a chat completion request.
Constructor:
LLMRequest(
    messages: List[Message],
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    reasoning_tokens: Optional[int] = None,
    response_format: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Union[str, Dict[str, Any]]] = None,
    cache: Optional[Dict[str, Any]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    json_response: Optional[bool] = None,
    extra_params: Dict[str, Any] = {}
)
Parameters:
- messages (List[Message], required): Conversation messages
- model (str, optional): Model alias (e.g., "fast") or provider:model reference (e.g., "openai:gpt-4o")
- temperature (float, optional): Sampling temperature (0.0-2.0). Higher = more random
- max_tokens (int, optional): Maximum tokens to generate
- reasoning_tokens (int, optional): Token budget for reasoning models (o1, etc.)
- response_format (dict, optional): Structured output format (see llmring-structured skill)
- tools (list, optional): Available functions (see llmring-tools skill)
- tool_choice (str/dict, optional): Tool selection strategy
- cache (dict, optional): Caching configuration
- metadata (dict, optional): Request metadata
- json_response (bool, optional): Request JSON format response
- extra_params (dict, default: {}): Provider-specific parameters
Example:
from llmring import LLMRequest, Message

# Simple request
request = LLMRequest(
    model="summarizer",  # Your domain-specific alias
    messages=[Message(role="user", content="Hello")]
)

# With parameters
request = LLMRequest(
    model="explainer",  # Another semantic alias you define
    messages=[
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="Explain quantum computing")
    ],
    temperature=0.3,
    max_tokens=500
)
Message
A message in a conversation.
Constructor:
Message(
    role: Literal["system", "user", "assistant", "tool"],
    content: Any,
    tool_calls: Optional[List[Dict[str, Any]]] = None,
    tool_call_id: Optional[str] = None,
    timestamp: Optional[datetime] = None,
    metadata: Optional[Dict[str, Any]] = None
)
Parameters:
- role (str, required): Message role - "system", "user", "assistant", or "tool"
- content (Any, required): Message content (string or structured content for multimodal)
- tool_calls (list, optional): Tool calls made by assistant
- tool_call_id (str, optional): ID for tool result messages
- timestamp (datetime, optional): Message timestamp
- metadata (dict, optional): Provider-specific metadata (e.g., cache_control for Anthropic)
Example:
from llmring import Message

# System message
system_msg = Message(
    role="system",
    content="You are a helpful assistant."
)

# User message
user_msg = Message(
    role="user",
    content="What is the capital of France?"
)

# Assistant response
assistant_msg = Message(
    role="assistant",
    content="The capital of France is Paris."
)

# Anthropic prompt caching
cached_msg = Message(
    role="system",
    content="Very long system prompt...",
    metadata={"cache_control": {"type": "ephemeral"}}
)
LLMResponse
Response from a chat completion.
Attributes:
- content (str): Generated text content
- model (str): Model that generated the response
- usage (dict, optional): Token usage statistics
- finish_reason (str, optional): Why generation stopped ("stop", "length", "tool_calls")
- tool_calls (list, optional): Tool calls made by model
- parsed (dict, optional): Parsed JSON when response_format used
Properties:
- total_tokens (int, optional): Total tokens used (prompt + completion)
Example:
response = await service.chat(request)
print(response.content) # "The capital is Paris."
print(response.model) # "anthropic:claude-sonnet-4-5-20250929"
print(response.total_tokens) # 45
print(response.finish_reason) # "stop"
print(response.usage) # {"prompt_tokens": 20, "completion_tokens": 25}
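Because usage and finish_reason are optional, guard before reading them; continuing the example above:
# usage and finish_reason may be None depending on the provider
if response.finish_reason == "length":
    print("Response was truncated - consider raising max_tokens")
if response.usage:
    print(f"Prompt tokens: {response.usage.get('prompt_tokens')}")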
Environment Setup
Required environment variables (set API keys for providers you want to use):
# Add to .env file or export
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...
OLLAMA_BASE_URL=http://localhost:11434 # Optional, default shown
LLMRing automatically initializes providers based on available API keys.
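To check which providers will be picked up in a given environment, inspect the variables before constructing the service; a minimal sketch using only the standard library:
import os

# Providers are initialized only when their API key is present
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_GEMINI_API_KEY"):
    print(f"{var}: {'set' if os.environ.get(var) else 'missing'}")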
Resource Management
Context Manager (Recommended)
Always use context manager for automatic cleanup:
from llmring import LLMRing, LLMRequest, Message

# Context manager handles cleanup automatically
async with LLMRing() as service:
    request = LLMRequest(
        model="chatbot",  # Your alias for conversational AI
        messages=[Message(role="user", content="Hello")]
    )
    response = await service.chat(request)
# Resources cleaned up when exiting context
Manual Cleanup
If you can't use context manager:
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # MUST call close()
Common Patterns
Multi-Turn Conversation
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    messages = [
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="What is Python?")
    ]

    # First turn
    request = LLMRequest(model="assistant", messages=messages)
    response = await service.chat(request)

    # Add assistant response to history
    messages.append(Message(role="assistant", content=response.content))

    # Second turn
    messages.append(Message(role="user", content="What about JavaScript?"))
    request = LLMRequest(model="assistant", messages=messages)
    response = await service.chat(request)
    print(response.content)
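The pattern above can be wrapped in a small helper that keeps the history in one place; a minimal sketch, assuming an "assistant" alias exists in your lockfile:
from llmring import LLMRing, LLMRequest, Message

async def ask(service: LLMRing, history: list[Message], text: str) -> str:
    """Send one user turn, record both sides in the history, return the reply."""
    history.append(Message(role="user", content=text))
    response = await service.chat(LLMRequest(model="assistant", messages=history))
    history.append(Message(role="assistant", content=response.content))
    return response.content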
Using Model Aliases
# Semantic aliases YOU define in your lockfile
request = LLMRequest(
    model="summarizer",  # Alias you configured for this task
    messages=[Message(role="user", content="Hello")]
)
# Use task-based names:
# model="code-reviewer" - For code review tasks
# model="sql-generator" - For generating SQL
# model="extractor" - For extracting structured data
# model="analyzer" - For analysis tasks
Using Direct Model References
# Direct provider:model format (escape hatch)
request = LLMRequest(
    model="anthropic:claude-sonnet-4-5-20250929",
    messages=[Message(role="user", content="Hello")]
)

# Or specific versions
request = LLMRequest(
    model="openai:gpt-4o",
    messages=[Message(role="user", content="Hello")]
)
Temperature Control
# Creative writing (higher temperature)
request = LLMRequest(
    model="creative-writer",  # Your alias for creative tasks
    messages=[Message(role="user", content="Write a poem")],
    temperature=1.2  # More random/creative
)

# Factual responses (lower temperature)
request = LLMRequest(
    model="factual-responder",  # Your alias for factual tasks
    messages=[Message(role="user", content="What is 2+2?")],
    temperature=0.2  # More deterministic
)
Token Limits
# Limit response length
request = LLMRequest(
    model="summarizer",  # Your summarization alias
    messages=[Message(role="user", content="Summarize this...")],
    max_tokens=100  # Cap at 100 tokens
)
Error Handling
from llmring import (
    LLMRing,
    LLMRequest,
    Message,
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError,
    ProviderNotFoundError
)

async with LLMRing() as service:
    try:
        request = LLMRequest(
            model="chatbot",  # Your conversational alias
            messages=[Message(role="user", content="Hello")]
        )
        response = await service.chat(request)
    except ProviderAuthenticationError:
        print("Invalid API key - check environment variables")
    except ModelNotFoundError as e:
        print(f"Model not available: {e}")
    except ProviderRateLimitError as e:
        print(f"Rate limited - retry after {e.retry_after}s")
    except ProviderTimeoutError:
        print("Request timed out")
    except ProviderNotFoundError:
        print("Provider not configured - check API keys")
Common Mistakes
Wrong: Forgetting Context Manager
# DON'T DO THIS - resources not cleaned up
service = LLMRing()
response = await service.chat(request)
# Forgot to call close()!
Right: Use Context Manager
# DO THIS - automatic cleanup
async with LLMRing() as service:
    response = await service.chat(request)
Wrong: Invalid Message Role
# DON'T DO THIS - invalid role
message = Message(role="admin", content="Hello")
Right: Use Valid Roles
# DO THIS - valid roles only
message = Message(role="user", content="Hello")
# Valid: "system", "user", "assistant", "tool"
Wrong: Missing Model
# DON'T DO THIS - no model specified and no lockfile
request = LLMRequest(
    messages=[Message(role="user", content="Hello")]
)
Right: Use Semantic Alias from Lockfile
# DO THIS - use your semantic alias
request = LLMRequest(
    model="chatbot",  # or "anthropic:claude-sonnet-4-5-20250929" for direct reference
    messages=[Message(role="user", content="Hello")]
)
Profiles: Environment-Specific Configuration
Use different models for different environments:
# Set profile via environment variable
# export LLMRING_PROFILE=dev
# Or in code
async with LLMRing() as service:
    # Uses 'dev' profile bindings (cheaper models)
    response = await service.chat(request, profile="dev")

    # Uses 'prod' profile bindings (higher quality)
    response = await service.chat(request, profile="prod")
See llmring-lockfile skill for full profile documentation.
Related Skills
- llmring-streaming - Stream responses for real-time output
- llmring-tools - Function calling and tool use
- llmring-structured - JSON schema for structured output
- llmring-lockfile - Configure aliases and profiles
- llmring-providers - Multi-provider patterns and raw SDK access
Provider Support
| Provider | Initialization | Example |
|---|---|---|
| OpenAI | Set OPENAI_API_KEY | model="openai:gpt-4o" |
| Anthropic | Set ANTHROPIC_API_KEY | model="anthropic:claude-sonnet-4-5-20250929" |
| Google | Set GOOGLE_GEMINI_API_KEY | model="google:gemini-2.5-pro" |
| Ollama | Runs automatically | model="ollama:llama3" |
All providers work with the same unified API - no code changes needed to switch providers.
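Because the request shape never changes, switching providers is just a different model string (or a re-bound alias); a minimal sketch using the models from the table above:
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Same request body; only the model reference changes
    for model in ("openai:gpt-4o", "anthropic:claude-sonnet-4-5-20250929"):
        request = LLMRequest(
            model=model,
            messages=[Message(role="user", content="Hello")]
        )
        response = await service.chat(request)
        print(model, "->", response.content)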