| name | token-cost-tracking |
| description | Track token usage and costs across agents for budget management |
| triggers | track tokens, token usage, cost tracking, LLM costs, budget monitoring |
| priority | 1 |
Token and Cost Tracking
Track token usage and costs across agents for budget management and optimization.
Core Principle
Every organization needs to answer:
- How much are we spending on LLMs?
- Where is the spend going (features, agents, users)?
- Are we within budget?
- Which costs are trending up, and which down?
Essential Attributes
# Per-call token tracking
span.set_attribute("llm.tokens.input", 1500)
span.set_attribute("llm.tokens.output", 350)
span.set_attribute("llm.tokens.total", 1850)
# Per-call cost
span.set_attribute("llm.cost_usd", 0.025)
# Attribution
span.set_attribute("cost.feature", "document_analysis")
span.set_attribute("cost.agent", "researcher")
span.set_attribute("cost.user_id", "user_abc") # Hashed
span.set_attribute("cost.org_id", "org_123")
Model Pricing Table
Keep pricing updated (prices as of late 2024):
MODEL_PRICING = {
# Anthropic (per 1M tokens)
"claude-3-opus": {"input": 15.00, "output": 75.00},
"claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
"claude-3-5-haiku": {"input": 0.80, "output": 4.00},
"claude-3-haiku": {"input": 0.25, "output": 1.25},
# OpenAI (per 1M tokens)
"gpt-4-turbo": {"input": 10.00, "output": 30.00},
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
"gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
# Embeddings (per 1M tokens)
"text-embedding-3-large": {"input": 0.13, "output": 0},
"text-embedding-3-small": {"input": 0.02, "output": 0},
"text-embedding-ada-002": {"input": 0.10, "output": 0},
}
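Hardcoding this table in source is itself an anti-pattern (see below): prices change and the table goes stale. One option is to load it from config at startup; a sketch assuming a pricing.json file deployed alongside the service:
import json
from pathlib import Path

def load_pricing(path: str = "pricing.json") -> dict:
    """Load per-1M-token pricing from config so price updates don't need a deploy."""
    config = Path(path)
    if config.exists():
        return json.loads(config.read_text())
    return MODEL_PRICING  # fall back to the built-in table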
Cost Calculation
def calculate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    cache_discount: float = 0.9,  # 90% discount for cached tokens
) -> float:
    """Calculate cost in USD for a single LLM call."""
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        return 0.0  # Unknown model: treated as free here; log it upstream
    # Cached tokens are billed at the discounted rate, the rest at full price
    effective_input = max(input_tokens - cached_tokens, 0)
    cached_cost = (cached_tokens / 1_000_000) * pricing["input"] * (1 - cache_discount)
    input_cost = (effective_input / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]
    return round(input_cost + cached_cost + output_cost, 6)
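For example, a claude-3-5-sonnet call with 1,500 input tokens (1,000 of them served from cache) and 350 output tokens:
cost = calculate_cost("claude-3-5-sonnet", 1500, 350, cached_tokens=1000)
# input:   500 / 1M * $3.00         = $0.001500
# cached: 1000 / 1M * $3.00 * 0.10  = $0.000300
# output:  350 / 1M * $15.00        = $0.005250
print(cost)  # 0.00705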
Aggregation Levels
Track at multiple granularities:
Per-Call
span.set_attribute("llm.cost_usd", 0.025)
Per-Agent Run
span.set_attribute("agent.total_tokens", 15000)
span.set_attribute("agent.total_cost_usd", 0.45)
span.set_attribute("agent.llm_calls", 5)
Per-Session
span.set_attribute("session.total_tokens", 45000)
span.set_attribute("session.total_cost_usd", 1.35)
span.set_attribute("session.agent_runs", 3)
Per-User (Daily/Monthly)
Track in your observability platform, not in spans.
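The per-call numbers roll up into the coarser levels. A minimal accumulator sketch for the agent-run level (the class name is illustrative):
class RunCostTracker:
    """Sums tokens and cost across all LLM calls in one agent run."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost_usd = 0.0
        self.llm_calls = 0

    def record(self, input_tokens: int, output_tokens: int, cost_usd: float):
        self.total_tokens += input_tokens + output_tokens
        self.total_cost_usd += cost_usd
        self.llm_calls += 1

    def flush(self, span):
        """Write aggregated totals onto the agent-run span."""
        span.set_attribute("agent.total_tokens", self.total_tokens)
        span.set_attribute("agent.total_cost_usd", round(self.total_cost_usd, 6))
        span.set_attribute("agent.llm_calls", self.llm_calls)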
Budget Alerting
Set up alerts for:
BUDGET_THRESHOLDS = {
    "per_call": 1.00,         # Alert if single call > $1
    "per_session": 10.00,     # Alert if session > $10
    "per_user_daily": 50.00,  # Alert if user > $50/day
    "org_daily": 1000.00,     # Alert if org > $1000/day
}

def check_budget(cost: float, level: str, entity_id: str):
    """Fire an alert when a cost exceeds the threshold for its level."""
    threshold = BUDGET_THRESHOLDS.get(level)
    if threshold and cost > threshold:
        log_budget_alert(level, entity_id, cost, threshold)
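Usage at the span-level granularities (log_budget_alert and the id variables are placeholders for your alerting layer):
check_budget(call_cost, "per_call", entity_id=trace_id)
check_budget(session_cost, "per_session", entity_id=session_id)
# Daily per-user and per-org checks run against aggregated data
# in your observability platform, not in request code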
Framework Integration
Langfuse
from langfuse import Langfuse
langfuse = Langfuse()
# Automatic token/cost tracking
trace = langfuse.trace(name="agent_run")
generation = trace.generation(
name="llm_call",
model="claude-3-5-sonnet",
usage={
"input": 1500,
"output": 350,
"unit": "TOKENS"
}
)
# Langfuse calculates cost automatically
LangSmith
from langsmith import Client
client = Client()
# Token tracking automatic via callbacks
# Cost calculation in LangSmith dashboard
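A minimal sketch with the @traceable decorator (the wrapped function is illustrative; token usage is attached automatically when LangChain models run inside the traced function):
from langsmith import traceable

@traceable(run_type="chain", name="agent_step")
def agent_step(question: str) -> str:
    # LLM calls made here via LangChain appear as child runs,
    # with token counts captured by the built-in callbacks
    ...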
OpenTelemetry
from opentelemetry import trace
tracer = trace.get_tracer("agent")
with tracer.start_as_current_span("llm_call") as span:
span.set_attribute("llm.tokens.input", 1500)
span.set_attribute("llm.tokens.output", 350)
span.set_attribute("llm.cost_usd", calculate_cost(...))
Caching Impact
Track cache hits for accurate costs:
span.set_attribute("llm.cache.hit", True)
span.set_attribute("llm.cache.tokens_saved", 1200)
span.set_attribute("llm.cache.cost_saved_usd", 0.018)
Optimization Signals
Track metrics that indicate optimization opportunities:
# Prompt efficiency
span.set_attribute("prompt.compression_ratio", 0.7)
span.set_attribute("prompt.could_use_haiku", True)
# Model selection
span.set_attribute("model.recommendation", "could_downgrade")
span.set_attribute("model.quality_requirement", "low")
Anti-Patterns
- Not tracking tokens (cost blindness)
- Missing cost attribution (can't optimize)
- Hardcoded pricing (becomes stale)
- No budget alerts (surprise bills)
- Tracking at the wrong granularity (too coarse to attribute, too noisy to act on)
Related Skills
- llm-call-tracing - LLM instrumentation
- session-conversation-tracking - Session aggregation