Claude Code Plugins

Community-maintained marketplace

Manage Claude Code API costs - token strategies, model selection, monitoring. Use when concerned about API spend, optimizing token usage, choosing models for tasks, or setting up cost monitoring. Covers /cost command, batching strategies, and budget management.

SKILL.md

---
name: cost-optimization
description: Manage Claude Code API costs - token strategies, model selection, monitoring. Use when concerned about API spend, optimizing token usage, choosing models for tasks, or setting up cost monitoring. Covers /cost command, batching strategies, and budget management.
version: 1.0.0
author: Claude Code SDK
tags: cost, optimization, tokens, budget
---

Cost Optimization

Reduce Claude Code API costs while maintaining quality through smart token management, model selection, and monitoring.

Quick Reference

| Strategy | Impact | Effort |
|---|---|---|
| Use Haiku for simple tasks | High | Low |
| Batch related operations | Medium | Low |
| Use /compact strategically | Medium | Low |
| Reduce context size | High | Medium |
| Efficient prompting | Medium | Medium |

Understanding Costs

How API Costs Work

Claude Code costs are based on tokens:

  • Input tokens: Everything Claude reads (prompts, files, context)
  • Output tokens: Everything Claude generates (responses, code)
  • Cached tokens: Reduced rate for repeated context
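The three token categories above combine into a session cost roughly as sketched below. This is a minimal illustration, not the actual billing logic: the `Rates` type, the `session_cost` function, and every price in `EXAMPLE_RATES` are placeholders chosen for the example, so substitute current published prices before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Dollars per million tokens. All values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float
    cached_per_mtok: float  # reduced rate for repeated (cached) context


def session_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int, rates: Rates) -> float:
    """Sum each token category at its own rate and convert to dollars."""
    total = (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cached_tokens * rates.cached_per_mtok
    )
    return total / 1_000_000


# Hypothetical rates and token counts, for illustration only.
EXAMPLE_RATES = Rates(input_per_mtok=3.0, output_per_mtok=15.0, cached_per_mtok=0.3)
cost = session_cost(input_tokens=200_000, output_tokens=20_000,
                    cached_tokens=500_000, rates=EXAMPLE_RATES)
print(f"Estimated session cost: ${cost:.2f}")
```

Note how the cached tokens, although the largest count in this example, contribute the least to the total because of their reduced rate.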

Model Pricing (Relative)

| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| Haiku | $ | $ | Simple tasks, exploration |
| Sonnet | $$ | $$ | General development (default) |
| Opus | $$$$$ | $$$$$ | Complex reasoning, architecture |

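Rule of thumb: Opus costs roughly an order of magnitude more than Haiku for the same tokens (~15x by this guide's estimate). The exact ratio depends on the model versions, so check current pricing.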

What Consumes Tokens

| Activity | Token Impact | Optimization |
|---|---|---|
| Reading files | High | Read selectively, use grep |
| Long conversations | Cumulative | Use /compact regularly |
| Tool outputs | Variable | Request summaries |
| Code generation | Medium | Be specific in requests |
| Error messages | Low | N/A |

The /cost Command

Basic Usage

> /cost

Shows:

  • Session token usage (input/output)
  • Estimated cost for current session
  • Context window usage percentage

When to Check

  • Before starting large tasks
  • After reading multiple files
  • When responses slow down
  • Every 15-20 exchanges
  • Before deciding to /compact vs /clear

Interpreting Results

| Metric | Good | Concern | Action |
|---|---|---|---|
| Context usage | <50% | >70% | Consider /compact |
| Session cost | Varies | Unexpected spike | Review recent operations |
| Output ratio | Balanced | Output >> Input | Responses too verbose |
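The table above can be read as a small set of checks. The sketch below encodes them under stated assumptions: the function name `interpret_session`, the `SPIKE_FACTOR` of 2, and the reading of "Output >> Input" as "output exceeds input" are all choices made for this example, not thresholds defined by Claude Code.

```python
CONTEXT_CONCERN_PCT = 70   # from the table: context usage above 70% is a concern
SPIKE_FACTOR = 2.0         # assumption: "spike" means more than 2x your typical cost


def interpret_session(context_pct: float, input_tokens: int, output_tokens: int,
                      cost: float, typical_cost: float) -> list[str]:
    """Return the suggested actions for one /cost reading."""
    actions = []
    if context_pct > CONTEXT_CONCERN_PCT:
        actions.append("Consider /compact")
    if typical_cost > 0 and cost > SPIKE_FACTOR * typical_cost:
        actions.append("Review recent operations")
    # Assumption: treat output larger than input as "Output >> Input".
    if output_tokens > input_tokens:
        actions.append("Responses too verbose: ask for concise output")
    return actions


# Hypothetical reading: 82% context, cost three times the usual.
print(interpret_session(82, 150_000, 12_000, cost=3.0, typical_cost=1.0))
```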

Token Reduction Strategies

1. Selective File Reading

Expensive:

> Read the entire src/ directory to understand the codebase

Efficient:

> @src/api/users.ts @src/types/user.ts - I need to modify the user API

2. Use Grep Before Read

Expensive:

> Find all files that use the AuthService class
[Claude reads many files to find them]

Efficient:

> grep for "AuthService" in src/, then I'll look at the most relevant ones
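The grep-first idea can be sketched outside Claude Code as well: scan files for the term, rank them by match count, and only then decide which ones are worth reading in full. The helper name `files_mentioning` and the `*.ts` default are assumptions for this illustration.

```python
from pathlib import Path


def files_mentioning(term: str, root: str, pattern: str = "*.ts") -> list[tuple[str, int]]:
    """List (path, match_count) for files under root that contain term.

    Results are sorted with the most matches first, so the most relevant
    files can be opened first and the rest skipped.
    """
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        count = path.read_text(encoding="utf-8", errors="ignore").count(term)
        if count:
            hits.append((str(path), count))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Example call (assumes a src/ directory exists):
#   for path, count in files_mentioning("AuthService", "src")[:5]:
#       print(count, path)
```

Only the short list of paths needs to enter the conversation; the full contents of non-matching files never do.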

3. Targeted @ Mentions

| Pattern | Token Cost | Use Case |
|---|---|---|
| @src/ | Very High | Avoid unless necessary |
| @src/api/ | High | When exploring a module |
| @src/api/users.ts | Low | Specific file work |
| @src/api/users.ts:50-100 | Very Low | Specific section |
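To get a feel for why the rows above differ so much, the sketch below estimates token counts for a directory, a file, or a line range. It assumes a rough heuristic of about 4 characters per token, which is only an approximation and not the real tokenizer; `estimate_path_tokens` is a hypothetical helper written for this example.

```python
from pathlib import Path
from typing import Optional

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not the actual tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounding up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_path_tokens(path: str, line_range: Optional[tuple[int, int]] = None) -> int:
    """Estimate tokens for a directory (recursively), a file, or 1-based inclusive lines."""
    p = Path(path)
    if p.is_dir():
        return sum(estimate_path_tokens(str(f)) for f in p.rglob("*") if f.is_file())
    text = p.read_text(encoding="utf-8", errors="ignore")
    if line_range is not None:
        start, end = line_range
        text = "".join(text.splitlines(keepends=True)[start - 1:end])
    return estimate_tokens(text)


# Example calls (assume these paths exist):
#   estimate_path_tokens("src")                          # whole tree
#   estimate_path_tokens("src/api/users.ts")             # one file
#   estimate_path_tokens("src/api/users.ts", (50, 100))  # one section
```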

4. Limit Output Verbosity

> Analyze this file and give me a brief summary of the key functions

vs

> Explain every line of this file

5. Batch Related Operations

Expensive (multiple turns):

> Read file A
> Now modify line 10
> Now read file B
> Modify line 20

Efficient (single turn):

> In file A, update the getUserById function to handle null.
> In file B, add the new UserNotFound error type.
> Run the tests after both changes.
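Batching helps because every turn re-reads the conversation so far as input. The sketch below is a simplified model of that effect under stated assumptions: it ignores prompt caching (which bills repeated context at a reduced rate and narrows the gap), and the token counts are made up for illustration.

```python
def cumulative_input_tokens(base_context: int, turns: list[tuple[int, int]]) -> int:
    """Total input tokens read across a session.

    Each turn is (prompt_tokens, response_tokens). On every turn the full
    history plus the new prompt is read as input; the response then joins
    the history for the next turn.
    """
    history = base_context
    total_input = 0
    for prompt_tokens, response_tokens in turns:
        history += prompt_tokens
        total_input += history      # whole history is input for this turn
        history += response_tokens  # response becomes context for later turns
    return total_input


# Hypothetical session starting from 10,000 tokens of context.
four_small_turns = cumulative_input_tokens(10_000, [(200, 800)] * 4)
one_batched_turn = cumulative_input_tokens(10_000, [(600, 3_200)])
print(four_small_turns, one_batched_turn)  # 46800 10600
```

In this model the four-turn version reads the base context four times, while the batched version reads it once, even though both produce the same amount of output.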

/compact vs /clear

When to /compact

Use /compact when:

  • Context is 70%+ full
  • You want to continue the same task
  • You need to preserve decisions and progress
  • Responses are slowing down

Cost impact: Can reduce ongoing costs substantially (an estimated 50-80%, depending on how much history the summary replaces)

When to /clear

Use /clear when:

  • You are switching to an unrelated task
  • Previous context is irrelevant
  • You are starting a fresh approach
  • You need maximum cost savings

Cost impact: Resets the conversation history to zero, but loses all context from the session

Decision Matrix

| Situation | Command | Reasoning |
|---|---|---|
| Same task, full context | /compact | Preserve progress |
| Different project | /clear | Irrelevant context |
| Stuck on approach | /clear | Fresh perspective |
| After major milestone | /compact | Keep decisions |
| Testing something new | /clear | Clean state |
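The decision matrix can be condensed into a few ordered checks. The function below is one possible reading of it, not a rule built into Claude Code: `choose_command` and its parameters are hypothetical, and the 70% threshold is taken from the guidance earlier in this section.

```python
def choose_command(same_task: bool, context_pct: float,
                   stuck: bool = False, milestone_reached: bool = False) -> str:
    """Suggest /clear, /compact, or simply continuing."""
    # A new task or a dead-end approach gains nothing from old context.
    if not same_task or stuck:
        return "/clear"
    # Same task: keep decisions, but shrink the history when it is large
    # or when a milestone gives a natural summary point.
    if context_pct >= 70 or milestone_reached:
        return "/compact"
    return "continue"


print(choose_command(same_task=True, context_pct=75))               # /compact
print(choose_command(same_task=False, context_pct=20))              # /clear
print(choose_command(same_task=True, context_pct=30, stuck=True))   # /clear
```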

Model Selection

Quick Guide

| Task Type | Recommended Model | Why |
|---|---|---|
| File exploration | Haiku | Fast, cheap, sufficient |
| Simple edits | Haiku | Straightforward |
| General coding | Sonnet | Balanced (default) |
| Bug fixing | Sonnet | Needs reasoning |
| Architecture design | Opus | Deep analysis |
| Security review | Opus | Critical thinking |
| Complex refactoring | Opus | Multi-file reasoning |
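If you script around Claude Code, the quick guide can live in a lookup table. This is a minimal sketch: `MODEL_FOR_TASK` and `recommend_model` are hypothetical names, and falling back to Sonnet for unknown tasks is an assumption based on the table marking it as the default.

```python
# Task types and models copied from the quick guide above.
MODEL_FOR_TASK = {
    "file exploration": "haiku",
    "simple edits": "haiku",
    "general coding": "sonnet",
    "bug fixing": "sonnet",
    "architecture design": "opus",
    "security review": "opus",
    "complex refactoring": "opus",
}

DEFAULT_MODEL = "sonnet"  # assumption: unknown task types use the default model


def recommend_model(task_type: str) -> str:
    """Look up the recommended model, ignoring case and surrounding spaces."""
    return MODEL_FOR_TASK.get(task_type.strip().lower(), DEFAULT_MODEL)


print(recommend_model("Security review"))   # opus
print(recommend_model("Write changelog"))   # sonnet (fallback)
```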

Switching Models

Set the model in skill frontmatter:

---
model: haiku
---

Or request a model in the prompt (to switch the model for the whole session, use the /model command instead):

> Using Haiku, list all TypeScript files in src/

Cost Comparison Example

Task: Review 10 files for security issues

| Approach | Estimated Cost |
|---|---|
| Opus reviews all | $$$$$ |
| Haiku scans, Opus reviews flagged | $$ |
| Sonnet reviews all | $$$ |

Best strategy: Use Haiku for initial scan, escalate to Opus for detailed review of potential issues.
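The comparison can be worked through with relative cost units. The multipliers below are assumptions for illustration only (Haiku as 1, Opus as 15 following this guide's rule of thumb, Sonnet as 4 picked arbitrarily), and the sketch also assumes every review consumes the same number of tokens regardless of model.

```python
# Relative cost per file reviewed. Placeholder multipliers, not real prices.
RELATIVE_COST = {"haiku": 1.0, "sonnet": 4.0, "opus": 15.0}


def review_all(model: str, files: int) -> float:
    """Cost of having one model review every file."""
    return RELATIVE_COST[model] * files


def triage(files: int, flagged: int,
           scan_model: str = "haiku", review_model: str = "opus") -> float:
    """Cost of a cheap scan over all files plus a detailed review of flagged ones."""
    if not 0 <= flagged <= files:
        raise ValueError("flagged must be between 0 and files")
    return RELATIVE_COST[scan_model] * files + RELATIVE_COST[review_model] * flagged


print(review_all("opus", 10))    # 150.0
print(review_all("sonnet", 10))  # 40.0
print(triage(10, flagged=1))     # 25.0
print(triage(10, flagged=2))     # 40.0
```

Under these assumed multipliers, triage always beats Opus-on-everything unless every file is flagged, but it beats Sonnet-on-everything only when at most one of the ten files is flagged. The saving depends heavily on how selective the initial scan is.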

Efficient Prompting

Reduce Token Count

| Verbose | Concise | Savings |
|---|---|---|
| "Could you please" | [Just ask] | 3-4 tokens |
| "I want you to" | [State task] | 4-5 tokens |
| Long explanations | Bullet points | 20-50% |
| Repeated context | @ mentions | Significant |

Be Specific

Token-heavy:

> I have this function that gets users from the database and I want
> to add some caching because it's being called too often and making
> the app slow. Can you help me figure out a good caching strategy?

Efficient:

> Add Redis caching to getUserById in @src/api/users.ts.
> TTL: 5 minutes. Invalidate on user update.

Use Checklists

> Implement user search:
> - [ ] Add search endpoint
> - [ ] Add debounced input
> - [ ] Handle empty results
> Run tests when done.

Checklists are clearer than long paragraph descriptions.

Batching Strategies

Batch Similar Operations

Instead of multiple turns:

> Add logging to function A
[response]
> Add logging to function B
[response]
> Add logging to function C

Single turn:

> Add consistent logging to functions A, B, and C in @src/utils.ts
> Use format: logger.info("[FunctionName] action", { params })

Batch Read-Modify Cycles

> Review @src/api/*.ts for missing error handling.
> Add try-catch with proper logging to any functions that need it.
> Summarize changes made.

When NOT to Batch

  • Complex, interdependent changes
  • When you need to verify each step
  • Exploratory work
  • Learning a new codebase

Budget Management

Setting Expectations

| Session Type | Typical Cost Range |
|---|---|
| Quick fix | $ |
| Feature implementation | $$-$$$ |
| Large refactor | $$$-$$$$ |
| Architecture session (Opus) | $$$$$ |

Cost Controls

  1. Monitor actively: Check /cost regularly
  2. Set mental limits: "I'll compact at $X"
  3. Use appropriate models: Haiku for exploration
  4. Plan sessions: Know scope before starting
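The "mental limits" in the list above can be written down explicitly. The sketch below is a hypothetical helper, not a Claude Code feature: `BudgetGuard` and its dollar thresholds are made up for the example, and you would feed it the figure you read from /cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetGuard:
    """Two self-imposed limits for a session, in dollars."""
    compact_at: float  # "I'll compact at $X"
    stop_at: float     # hard stop: wrap up or re-plan the session

    def __post_init__(self) -> None:
        if not 0 <= self.compact_at <= self.stop_at:
            raise ValueError("expected 0 <= compact_at <= stop_at")

    def check(self, session_cost: float) -> str:
        """Return 'ok', 'compact', or 'stop' for the current session cost."""
        if session_cost >= self.stop_at:
            return "stop"
        if session_cost >= self.compact_at:
            return "compact"
        return "ok"


guard = BudgetGuard(compact_at=2.0, stop_at=5.0)  # placeholder limits
for reading in (0.75, 2.40, 5.10):
    print(f"${reading:.2f}: {guard.check(reading)}")
```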

Daily/Weekly Tracking

> /cost
[Note the total]

Track across sessions to understand your patterns.
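One lightweight way to track across sessions is to note each /cost total with its date and roll the entries up by ISO week. The sketch below assumes you record the totals by hand; `weekly_totals` is a hypothetical helper and the entries are invented sample data.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(entries: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Sum session costs per (ISO year, ISO week)."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for day, cost in entries:
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += cost
    return dict(totals)


# Invented sample data: dates and session totals noted from /cost.
ledger = [
    (date(2025, 1, 6), 1.50),
    (date(2025, 1, 7), 2.50),
    (date(2025, 1, 13), 1.00),
]
for (year, week), total in sorted(weekly_totals(ledger).items()):
    print(f"{year}-W{week:02d}: ${total:.2f}")
```

A few weeks of entries are usually enough to see which kinds of sessions dominate your spend.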

Subagent Cost Efficiency

Why Subagents Help

Subagents have isolated context:

  • The main context stays lean
  • Exploratory work doesn't pollute the main conversation
  • Subagents can use cheaper models

Cost-Efficient Agent Pattern

---
name: explorer
model: haiku
tools: Read, Glob, Grep
---
Explore and summarize. Return only key findings.

Delegation Examples

| Task | Agent Model | Return |
|---|---|---|
| Find all API routes | Haiku | Route list |
| Analyze dependencies | Haiku | Summary |
| Review for patterns | Sonnet | Findings |
| Deep security review | Opus | Detailed report |

Common Wasteful Patterns

| Pattern | Why Wasteful | Better Approach |
|---|---|---|
| Reading entire directories | Massive token cost | Grep first, read specific files |
| Verbose explanations | Unnecessary output | Request concise answers |
| Repeating context | Already in history | Use @ mentions |
| Not using /compact | Growing costs | Compact at 70% |
| Opus for everything | Expensive overkill | Match model to task |
| Long debugging sessions | Cumulative cost | Clear and restart |

Reference Files

| File | Contents |
|---|---|
| TOKEN-STRATEGIES.md | Detailed token reduction techniques |
| MODEL-SELECTION.md | Model comparison and selection guide |
| MONITORING.md | Cost tracking and budget management |

Quick Decisions

| Situation | Action |
|---|---|
| Context at 70% | /compact |
| Simple file exploration | Use Haiku |
| Need deep analysis | Use Opus (worth the cost) |
| Unexpected high cost | Check recent operations |
| Switching tasks | /clear to save costs |
| Debugging loop | Clear and try fresh approach |