| name | chat |
| description | Use when starting a new project with llmring, building an application using LLMs, making basic chat completions, or sending messages to OpenAI, Anthropic, Google, or Ollama - covers lockfile creation (MANDATORY first step), semantic alias usage, unified interface for all providers with consistent message structure and response handling |
Basic Chat Completions
Installation
# With uv (recommended)
uv add llmring
# With pip
pip install llmring
Provider SDKs (install what you need):
uv add "openai>=1.0"      # OpenAI
uv add "anthropic>=0.67"  # Anthropic
uv add google-genai       # Google Gemini
uv add "ollama>=0.4"      # Ollama
API Overview
This skill covers:
- LLMRing - Main service class
- LLMRequest - Request configuration
- LLMResponse - Response structure
- Message - Message format
- Resource management with context managers
Quick Start
FIRST: Create your lockfile (required for all real applications):
# Initialize lockfile
llmring lock init
# Check available models (get current names from registry):
llmring list --provider openai
llmring list --provider anthropic
# Bind aliases using CURRENT model names:
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
# Or use interactive configuration (recommended - knows current models):
llmring lock chat
⚠️ Important: Check llmring list for current model names. Models change over time (e.g., gemini-1.5-pro → gemini-2.5-pro).
THEN: Use in code:
from llmring import LLMRing, LLMRequest, Message
# Use context manager for automatic resource cleanup
async with LLMRing() as service:
    request = LLMRequest(
        model="summarizer",  # YOUR semantic alias (defined in llmring.lock)
        messages=[
            Message(role="system", content="You are a helpful assistant."),
            Message(role="user", content="Hello!")
        ]
    )
    response = await service.chat(request)
    print(response.content)
⚠️ Important: The bundled lockfile that ships with llmring is ONLY for running llmring lock chat. Real applications must create their own lockfile.
Timeout Control
The library enforces a 60-second timeout by default. Override it when processing large documents, running expensive reasoning chains, or forwarding calls to slower local models.
async with LLMRing(timeout=300.0) as service:  # 5-minute default for every request in this context
    request = LLMRequest(
        model="summarizer",
        messages=[Message(role="user", content=huge_thread)],
        timeout=None,  # disable timeout for this request
    )
    response = await service.chat(request)
You can also set LLMRING_PROVIDER_TIMEOUT_S=120 in the environment to establish a default when you don't pass the constructor argument.
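For instance, a minimal sketch assuming the variable is read when the service is constructed (normally you'd export it in your shell or .env instead):
import os

os.environ["LLMRING_PROVIDER_TIMEOUT_S"] = "120"  # default when no timeout argument is given

async with LLMRing() as service:  # no timeout passed, so the 120s env default applies
    response = await service.chat(request)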
Complete API Documentation
LLMRing
Main service class that manages providers and routes requests.
Constructor:
LLMRing(
    origin: str = "llmring",
    registry_url: Optional[str] = None,
    lockfile_path: Optional[str] = None,
    server_url: Optional[str] = None,
    api_key: Optional[str] = None,
    log_metadata: bool = True,
    log_conversations: bool = False,
    alias_cache_size: int = 100,
    alias_cache_ttl: int = 3600,
    timeout: Optional[float] = 60.0
)
Parameters:
- origin (str, default: "llmring"): Origin identifier for tracking
- registry_url (str, optional): Custom registry URL for model information
- lockfile_path (str, optional): Path to lockfile for alias configuration
- server_url (str, optional): llmring-server URL for usage logging
- api_key (str, optional): API key for llmring-server
- log_metadata (bool, default: True): Enable logging of usage metadata (requires server_url)
- log_conversations (bool, default: False): Enable logging of full conversations (requires server_url)
- alias_cache_size (int, default: 100): Maximum cached alias resolutions
- alias_cache_ttl (int, default: 3600): Cache TTL in seconds
- timeout (float | None, default: 60.0): Default request timeout in seconds (None disables)
Example:
from llmring import LLMRing
# Basic initialization (uses environment variables for API keys)
async with LLMRing() as service:
    response = await service.chat(request)

# With custom lockfile
async with LLMRing(lockfile_path="./my-llmring.lock") as service:
    response = await service.chat(request)
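A hedged sketch combining the logging parameters documented above; the server URL and API key are placeholders for your own llmring-server deployment:
# With usage logging to llmring-server (URL and key are placeholders)
async with LLMRing(
    server_url="https://llmring-server.example.com",
    api_key="your-llmring-server-key",
    log_metadata=True,        # log token usage metadata
    log_conversations=False   # keep message contents private
) as service:
    response = await service.chat(request)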
LLMRing.chat()
Send a chat completion request and get a response.
Signature:
async def chat(
    request: LLMRequest,
    profile: Optional[str] = None
) -> LLMResponse
Parameters:
- request (LLMRequest): Request configuration with messages and parameters
- profile (str, optional): Profile name for environment-specific configuration (e.g., "dev", "prod")
Returns:
LLMResponse: Response with content, usage, and metadata
Raises:
- ProviderNotFoundError: If provider is not configured
- ModelNotFoundError: If model is not available
- ProviderAuthenticationError: If API key is invalid
- ProviderRateLimitError: If rate limit exceeded
Example:
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
    request = LLMRequest(
        model="responder",  # Your alias for responses
        messages=[
            Message(role="user", content="What is 2+2?")
        ],
        temperature=0.7,
        max_tokens=100
    )
    response = await service.chat(request)
    print(f"Response: {response.content}")
    print(f"Tokens: {response.total_tokens}")
    print(f"Model: {response.model}")
LLMRequest
Configuration for a chat completion request.
Constructor:
LLMRequest(
    messages: List[Message],
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    reasoning_tokens: Optional[int] = None,
    response_format: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Union[str, Dict[str, Any]]] = None,
    cache: Optional[Dict[str, Any]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    json_response: Optional[bool] = None,
    timeout: Optional[float] = None,
    extra_params: Dict[str, Any] = {}
)
Parameters:
- messages (List[Message], required): Conversation messages
- model (str, optional): Model alias (e.g., "fast") or provider:model reference (e.g., "openai:gpt-4o")
- temperature (float, optional): Sampling temperature (0.0-2.0). Higher = more random
- max_tokens (int, optional): Maximum tokens to generate
- reasoning_tokens (int, optional): Token budget for reasoning models (o1, etc.)
- response_format (dict, optional): Structured output format (see llmring-structured skill)
- tools (list, optional): Available functions (see llmring-tools skill)
- tool_choice (str/dict, optional): Tool selection strategy
- cache (dict, optional): Caching configuration
- metadata (dict, optional): Request metadata
- json_response (bool, optional): Request JSON format response
- timeout (float | None, optional): Override service-level timeout; None waits indefinitely
- extra_params (dict, default: {}): Provider-specific parameters
Example:
from llmring import LLMRequest, Message
# Simple request
request = LLMRequest(
    model="summarizer",  # Your domain-specific alias
    messages=[Message(role="user", content="Hello")]
)

# With parameters
request = LLMRequest(
    model="explainer",  # Another semantic alias you define
    messages=[
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="Explain quantum computing")
    ],
    temperature=0.3,
    max_tokens=500
)
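extra_params passes provider-specific options through untouched. A sketch assuming the underlying provider accepts a top_p parameter (the key is illustrative; consult your provider's documentation):
# Provider-specific options via extra_params (keys are provider-dependent)
request = LLMRequest(
    model="responder",
    messages=[Message(role="user", content="Hello")],
    extra_params={"top_p": 0.9}  # forwarded to the underlying provider SDK
)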
Message
A message in a conversation.
Constructor:
Message(
    role: Literal["system", "user", "assistant", "tool"],
    content: Any,
    tool_calls: Optional[List[Dict[str, Any]]] = None,
    tool_call_id: Optional[str] = None,
    timestamp: Optional[datetime] = None,
    metadata: Optional[Dict[str, Any]] = None
)
Parameters:
- role (str, required): Message role - "system", "user", "assistant", or "tool"
- content (Any, required): Message content (string or structured content for multimodal)
- tool_calls (list, optional): Tool calls made by assistant
- tool_call_id (str, optional): ID for tool result messages
- timestamp (datetime, optional): Message timestamp
- metadata (dict, optional): Provider-specific metadata (e.g., cache_control for Anthropic)
Example:
from llmring import Message
# System message
system_msg = Message(
    role="system",
    content="You are a helpful assistant."
)

# User message
user_msg = Message(
    role="user",
    content="What is the capital of France?"
)

# Assistant response
assistant_msg = Message(
    role="assistant",
    content="The capital of France is Paris."
)

# Anthropic prompt caching
cached_msg = Message(
    role="system",
    content="Very long system prompt...",
    metadata={"cache_control": {"type": "ephemeral"}}
)
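Because content also accepts structured content for multimodal input, here is a hedged sketch assuming an OpenAI-style parts list; the exact part schema is provider-dependent, so verify it against your provider's documentation:
# Multimodal user message (part structure is an assumption, not a guaranteed schema)
image_msg = Message(
    role="user",
    content=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
    ]
)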
LLMResponse
Response from a chat completion.
Attributes:
- content (str): Generated text content
- model (str): Model that generated the response
- usage (dict, optional): Token usage statistics
- finish_reason (str, optional): Why generation stopped ("stop", "length", "tool_calls")
- tool_calls (list, optional): Tool calls made by model
- parsed (dict, optional): Parsed JSON when response_format used
Properties:
- total_tokens (int, optional): Total tokens used (prompt + completion)
Example:
response = await service.chat(request)
print(response.content) # "The capital is Paris."
print(response.model) # "anthropic:claude-sonnet-4-5-20250929"
print(response.total_tokens) # 45
print(response.finish_reason) # "stop"
print(response.usage) # {"prompt_tokens": 20, "completion_tokens": 25}
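Since usage reports per-call token counts, a minimal sketch of accumulating them across requests (key names follow the example above):
# Track cumulative token usage across calls
totals = {"prompt_tokens": 0, "completion_tokens": 0}
response = await service.chat(request)
if response.usage:
    for key in totals:
        totals[key] += response.usage.get(key, 0)
print(f"Cumulative usage: {totals}")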
Environment Setup
Required environment variables (set API keys for providers you want to use):
# Add to .env file or export
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...
OLLAMA_BASE_URL=http://localhost:11434 # Optional, default shown
LLMRing automatically initializes providers based on available API keys.
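If your application doesn't already load .env files, a minimal sketch using python-dotenv (a separate dependency, not required by llmring):
from dotenv import load_dotenv

load_dotenv()  # makes OPENAI_API_KEY etc. from .env visible before LLMRing() is constructed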
Resource Management
Context Manager (Recommended)
Always use a context manager for automatic cleanup:
from llmring import LLMRing, LLMRequest, Message
# Context manager handles cleanup automatically
async with LLMRing() as service:
    request = LLMRequest(
        model="chatbot",  # Your alias for conversational AI
        messages=[Message(role="user", content="Hello")]
    )
    response = await service.chat(request)
# Resources cleaned up when exiting context
Manual Cleanup
If you can't use a context manager:
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # MUST call close()
Common Patterns
Multi-Turn Conversation
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
    messages = [
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="What is Python?")
    ]

    # First turn
    request = LLMRequest(model="assistant", messages=messages)
    response = await service.chat(request)

    # Add assistant response to history
    messages.append(Message(role="assistant", content=response.content))

    # Second turn
    messages.append(Message(role="user", content="What about JavaScript?"))
    request = LLMRequest(model="assistant", messages=messages)
    response = await service.chat(request)
    print(response.content)
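A minimal sketch wrapping this pattern in a reusable helper (the function name and default alias are illustrative):
async def converse(service, messages, user_text, alias="assistant"):
    """Append a user turn, request a reply, and keep the history in sync."""
    messages.append(Message(role="user", content=user_text))
    response = await service.chat(LLMRequest(model=alias, messages=messages))
    messages.append(Message(role="assistant", content=response.content))
    return response.content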
Using Model Aliases
# Semantic aliases YOU define in your lockfile
request = LLMRequest(
    model="summarizer",  # Alias you configured for this task
    messages=[Message(role="user", content="Hello")]
)
# Use task-based names:
# model="code-reviewer" - For code review tasks
# model="sql-generator" - For generating SQL
# model="extractor" - For extracting structured data
# model="analyzer" - For analysis tasks
Using Direct Model References
# Direct provider:model format (escape hatch)
request = LLMRequest(
    model="anthropic:claude-sonnet-4-5-20250929",
    messages=[Message(role="user", content="Hello")]
)

# Or specific versions
request = LLMRequest(
    model="openai:gpt-4o",
    messages=[Message(role="user", content="Hello")]
)
Temperature Control
# Creative writing (higher temperature)
request = LLMRequest(
    model="creative-writer",  # Your alias for creative tasks
    messages=[Message(role="user", content="Write a poem")],
    temperature=1.2  # More random/creative
)

# Factual responses (lower temperature)
request = LLMRequest(
    model="factual-responder",  # Your alias for factual tasks
    messages=[Message(role="user", content="What is 2+2?")],
    temperature=0.2  # More deterministic
)
Token Limits
# Limit response length
request = LLMRequest(
    model="summarizer",  # Your summarization alias
    messages=[Message(role="user", content="Summarize this...")],
    max_tokens=100  # Cap at 100 tokens
)
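When the cap is hit, finish_reason is "length" instead of "stop", so you can detect truncated output:
response = await service.chat(request)
if response.finish_reason == "length":
    # Hit the max_tokens cap; the output is likely truncated
    print("Warning: response was cut off")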
Error Handling
from llmring import (
    LLMRing,
    LLMRequest,
    Message,
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError,
    ProviderNotFoundError
)

async with LLMRing() as service:
    try:
        request = LLMRequest(
            model="chatbot",  # Your conversational alias
            messages=[Message(role="user", content="Hello")]
        )
        response = await service.chat(request)
    except ProviderAuthenticationError:
        print("Invalid API key - check environment variables")
    except ModelNotFoundError as e:
        print(f"Model not available: {e}")
    except ProviderRateLimitError as e:
        print(f"Rate limited - retry after {e.retry_after}s")
    except ProviderTimeoutError:
        print("Request timed out")
    except ProviderNotFoundError:
        print("Provider not configured - check API keys")
Common Mistakes
Wrong: Forgetting Context Manager
# DON'T DO THIS - resources not cleaned up
service = LLMRing()
response = await service.chat(request)
# Forgot to call close()!
Right: Use Context Manager
# DO THIS - automatic cleanup
async with LLMRing() as service:
    response = await service.chat(request)
Wrong: Invalid Message Role
# DON'T DO THIS - invalid role
message = Message(role="admin", content="Hello")
Right: Use Valid Roles
# DO THIS - valid roles only
message = Message(role="user", content="Hello")
# Valid: "system", "user", "assistant", "tool"
Wrong: Missing Model
# DON'T DO THIS - no model specified and no lockfile
request = LLMRequest(
    messages=[Message(role="user", content="Hello")]
)
Right: Use Semantic Alias from Lockfile
# DO THIS - use your semantic alias
request = LLMRequest(
    model="chatbot",  # or "anthropic:claude-sonnet-4-5-20250929" for direct reference
    messages=[Message(role="user", content="Hello")]
)
Profiles: Environment-Specific Configuration
Use different models for different environments:
# Set profile via environment variable
# export LLMRING_PROFILE=dev
# Or in code
async with LLMRing() as service:
    # Uses 'dev' profile bindings (cheaper models)
    response = await service.chat(request, profile="dev")

    # Uses 'prod' profile bindings (higher quality)
    response = await service.chat(request, profile="prod")
See llmring-lockfile skill for full profile documentation.
Related Skills
- llmring-streaming - Stream responses for real-time output
- llmring-tools - Function calling and tool use
- llmring-structured - JSON schema for structured output
- llmring-lockfile - Configure aliases and profiles
- llmring-providers - Multi-provider patterns and raw SDK access
Provider Support
| Provider | Initialization | Example |
|---|---|---|
| OpenAI | Set OPENAI_API_KEY | model="openai:gpt-4o" |
| Anthropic | Set ANTHROPIC_API_KEY | model="anthropic:claude-sonnet-4-5-20250929" |
| Google | Set GOOGLE_GEMINI_API_KEY | model="google:gemini-2.5-pro" |
| Ollama | Local Ollama server (no API key needed) | model="ollama:llama3" |
All providers work with the same unified API - no code changes needed to switch providers.