Claude Code Plugins

Community-maintained marketplace

Feedback

LLM cost tracking with Langfuse for cached responses. Use when monitoring cache effectiveness, tracking cost savings, or attributing costs to agents in multi-agent systems.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name cache-cost-tracking
description LLM cost tracking with Langfuse for cached responses. Use when monitoring cache effectiveness, tracking cost savings, or attributing costs to agents in multi-agent systems.
context fork
agent metrics-architect

Cache Cost Tracking

Monitor LLM costs and cache effectiveness.

When to Use

  • Cost attribution by agent
  • Cache hit rate monitoring
  • ROI analysis for caching
  • Multi-agent cost rollup

Langfuse Automatic Tracking

from langfuse.decorators import observe, langfuse_context

@observe(as_type="generation")
async def call_llm_with_cache(
    prompt: str,
    agent_type: str,
    analysis_id: UUID
) -> str:
    """LLM call with automatic cost tracking."""

    # Link to parent trace
    langfuse_context.update_current_trace(
        name=f"{agent_type}_generation",
        session_id=str(analysis_id)
    )

    # Check caches
    if cache_key in lru_cache:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L1", "cache_hit": True}
        )
        return lru_cache[cache_key]

    similar = await semantic_cache.get(prompt, agent_type)
    if similar:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L2", "cache_hit": True}
        )
        return similar

    # LLM call - Langfuse tracks tokens/cost automatically
    response = await llm.generate(prompt)

    langfuse_context.update_current_observation(
        metadata={
            "cache_layer": "L4",
            "cache_hit": False,
            "prompt_cache_hit": response.usage.cache_read_input_tokens > 0
        }
    )

    return response.content

Hierarchical Cost Rollup

class AnalysisWorkflow:
    @observe(as_type="trace")
    async def run_analysis(self, url: str, analysis_id: UUID):
        """Parent trace aggregates child costs.

        Trace Hierarchy:
        run_analysis (trace)
        ├── security_agent (generation)
        ├── tech_agent (generation)
        └── synthesis (generation)
        """
        langfuse_context.update_current_trace(
            name="content_analysis",
            session_id=str(analysis_id),
            tags=["multi-agent"]
        )

        for agent in self.agents:
            await self.run_agent(agent, content, analysis_id)

    @observe(as_type="generation")
    async def run_agent(self, agent, content, analysis_id):
        """Child generation - costs roll up to parent."""
        langfuse_context.update_current_observation(
            name=f"{agent.name}_generation",
            metadata={"agent_type": agent.name}
        )
        return await agent.analyze(content)

Cost Queries

from langfuse import Langfuse

async def get_analysis_costs(analysis_id: UUID) -> dict:
    langfuse = Langfuse()

    traces = langfuse.get_traces(session_id=str(analysis_id), limit=1)

    if traces.data:
        trace = traces.data[0]
        return {
            "total_cost": trace.total_cost,
            "input_tokens": trace.usage.input_tokens,
            "output_tokens": trace.usage.output_tokens,
            "cache_read_tokens": trace.usage.cache_read_input_tokens,
        }

async def get_costs_by_agent() -> list[dict]:
    generations = langfuse.get_generations(
        from_timestamp=datetime.now() - timedelta(days=7),
        limit=1000
    )

    costs = {}
    for gen in generations.data:
        agent = gen.metadata.get("agent_type", "unknown")
        if agent not in costs:
            costs[agent] = {"total": 0, "calls": 0, "cache_hits": 0}

        costs[agent]["total"] += gen.calculated_total_cost or 0
        costs[agent]["calls"] += 1
        if gen.metadata.get("cache_hit"):
            costs[agent]["cache_hits"] += 1

    return list(costs.values())

Cache Effectiveness

cache_hits = 0
cache_misses = 0
cost_saved = 0.0

for gen in generations:
    if gen.metadata.get("cache_hit"):
        cache_hits += 1
        cost_saved += estimate_full_cost(gen)
    else:
        cache_misses += 1

hit_rate = cache_hits / (cache_hits + cache_misses)
print(f"Cache Hit Rate: {hit_rate:.1%}")
print(f"Cost Saved: ${cost_saved:.2f}")

Key Decisions

Decision Recommendation
Trace grouping session_id = analysis_id
Cost attribution metadata.agent_type
Query window 7-30 days
Dashboard Langfuse web UI

Common Mistakes

  • Not linking child to parent trace
  • Missing metadata for attribution
  • Not tracking cache hits separately
  • Ignoring prompt cache savings

Related Skills

  • semantic-caching - Redis caching
  • prompt-caching - Provider caching
  • langfuse-observability - Full observability

Capability Details

prompt-caching

Keywords: prompt cache, cache prompt, prefix caching, cache breakpoints Solves:

  • Reduce token costs with cached prompts
  • Configure cache breakpoints
  • Implement provider-native caching

response-caching

Keywords: response cache, semantic cache, cache response, LLM cache Solves:

  • Cache LLM responses for repeated queries
  • Implement semantic similarity caching
  • Reduce API calls with cached responses

cost-calculation

Keywords: cost, token cost, calculate cost, pricing, usage cost Solves:

  • Calculate token costs by model
  • Track input/output token pricing
  • Estimate cost before execution

usage-tracking

Keywords: usage, track usage, token usage, API usage, metrics Solves:

  • Track LLM API usage over time
  • Monitor token consumption
  • Generate usage reports

cache-invalidation

Keywords: invalidate, cache invalidation, TTL, expire, refresh Solves:

  • Implement cache invalidation strategies
  • Configure TTL for cached responses
  • Handle stale cache entries