| name | cost-optimization |
| description | Manage Claude Code API costs - token strategies, model selection, monitoring. Use when concerned about API spend, optimizing token usage, choosing models for tasks, or setting up cost monitoring. Covers /cost command, batching strategies, and budget management. |
| version | 1.0.0 |
| author | Claude Code SDK |
| tags | cost, optimization, tokens, budget |
Cost Optimization
Reduce Claude Code API costs while maintaining quality through smart token management, model selection, and monitoring.
Quick Reference
| Strategy | Impact | Effort |
|---|---|---|
| Use Haiku for simple tasks | High | Low |
| Batch related operations | Medium | Low |
| Use /compact strategically | Medium | Low |
| Reduce context size | High | Medium |
| Efficient prompting | Medium | Medium |
Understanding Costs
How API Costs Work
Claude Code costs are based on tokens:
- Input tokens: Everything Claude reads (prompts, files, context)
- Output tokens: Everything Claude generates (responses, code)
- Cached tokens: Reduced rate for repeated context
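To make the three token categories concrete, here is a minimal sketch of how they might combine into a per-request estimate. The `Rates` class, the `estimate_cost` helper, and the dollar figures are all hypothetical placeholders, not real prices, and the sketch treats cached tokens as a subset of input tokens for simplicity. Substitute the numbers from the current pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per one million tokens, in dollars (hypothetical values)."""
    input: float
    output: float
    cached_input: float


def estimate_cost(rates: Rates, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Return the estimated dollar cost of one request.

    cached_tokens is the part of the input served from cache; it is
    billed at the reduced cached rate instead of the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh = input_tokens - cached_tokens
    total = (fresh * rates.input
             + cached_tokens * rates.cached_input
             + output_tokens * rates.output)
    return total / 1_000_000


# Placeholder rates for illustration only; not actual pricing.
example = Rates(input=3.0, output=15.0, cached_input=0.3)
print(f"${estimate_cost(example, 100_000, 5_000, cached_tokens=80_000):.4f}")
```

Even with made-up rates, the shape of the formula shows why large uncached reads and long outputs dominate a session's cost.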
Model Pricing (Relative)
| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| Haiku | $ | $ | Simple tasks, exploration |
| Sonnet | $$ | $$ | General development (default) |
| Opus | $$$$$ | $$$$$ | Complex reasoning, architecture |
Rule of thumb: Opus is roughly 15x more expensive than Haiku for the same tokens, though the exact ratio varies by model version. Check current pricing before relying on it.
What Consumes Tokens
| Activity | Token Impact | Optimization |
|---|---|---|
| Reading files | High | Read selectively, use grep |
| Long conversations | Cumulative | Use /compact regularly |
| Tool outputs | Variable | Request summaries |
| Code generation | Medium | Be specific in requests |
| Error messages | Low | N/A |
The /cost Command
Basic Usage
> /cost
Shows:
- Session token usage (input/output)
- Estimated cost for current session
- Context window usage percentage
When to Check
- Before starting large tasks
- After reading multiple files
- When responses slow down
- Every 15-20 exchanges
- Before deciding to /compact vs /clear
Interpreting Results
| Metric | Good | Concern | Action |
|---|---|---|---|
| Context usage | <50% | >70% | Consider /compact |
| Session cost | Varies | Unexpected spike | Review recent operations |
| Output ratio | Balanced | Output >> Input | Responses too verbose |
Token Reduction Strategies
1. Selective File Reading
Expensive:
> Read the entire src/ directory to understand the codebase
Efficient:
> @src/api/users.ts @src/types/user.ts - I need to modify the user API
2. Use Grep Before Read
Expensive:
> Find all files that use the AuthService class
[Claude reads many files to find them]
Efficient:
> grep for "AuthService" in src/, then I'll look at the most relevant ones
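The same search-then-read idea can be sketched outside a session. The helper below is a hypothetical illustration (the name `files_mentioning` and the default suffixes are assumptions): it ranks files by how often they mention a string, so only the top few need to be opened in full.

```python
from pathlib import Path


def files_mentioning(root: str, needle: str,
                     suffixes: tuple = (".ts", ".tsx")) -> list:
    """Return (path, hit_count) pairs for files under root that contain
    needle, most hits first, so only the top few need a full read."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        count = text.count(needle)
        if count:
            hits.append((str(path), count))
    # Sort by descending hit count, then by path for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

For example, `files_mentioning("src", "AuthService")[:3]` would give the three files most worth mentioning with @ in the next prompt.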
3. Targeted @ Mentions
| Pattern | Token Cost | Use Case |
|---|---|---|
| @src/ | Very High | Avoid unless necessary |
| @src/api/ | High | When exploring a module |
| @src/api/users.ts | Low | Specific file work |
| @src/api/users.ts:50-100 | Very Low | Specific section |
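To get a feel for the gap between a whole file and a line range, a rough estimate is enough. The sketch below assumes about 4 characters per token, which is a common rule of thumb for English text and code, not the real tokenizer; both helper names are hypothetical.

```python
def rough_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token.

    This is a heuristic, not a tokenizer; use it only to compare sizes.
    """
    return len(text) // 4


def rough_tokens_for_lines(text: str, start: int, end: int) -> int:
    """Estimate tokens for lines start..end (1-indexed, inclusive)."""
    lines = text.splitlines()
    return rough_tokens("\n".join(lines[start - 1:end]))


# A synthetic 200-line file, 39 characters per line.
sample = "\n".join(["x" * 39] * 200)
print(rough_tokens(sample), rough_tokens_for_lines(sample, 50, 100))
```

On this synthetic file the 50-100 range comes to about a quarter of the whole-file estimate, which is the kind of saving the last table row refers to.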
4. Limit Output Verbosity
> Analyze this file and give me a brief summary of the key functions
vs
> Explain every line of this file
5. Batch Related Operations
Expensive (multiple turns):
> Read file A
> Now modify line 10
> Now read file B
> Modify line 20
Efficient (single turn):
> In file A, update the getUserById function to handle null.
> In file B, add the new UserNotFound error type.
> Run the tests after both changes.
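Why fewer turns tend to cost less can be shown with simple arithmetic. The sketch below assumes every turn re-sends the base context plus all earlier prompts and responses as input, and it ignores caching, which would lower the real numbers. The function name and token counts are hypothetical.

```python
def cumulative_input_tokens(base_context: int, turns: list) -> int:
    """Total input tokens across a conversation.

    turns is a list of (prompt_tokens, response_tokens) pairs. Each turn
    reads the full history so far plus its own prompt; afterwards the
    prompt and the response both become part of the history.
    """
    total = 0
    history = base_context
    for prompt, response in turns:
        total += history + prompt
        history += prompt + response
    return total


# Same work, split into four small turns or sent as one batched turn.
four_turns = cumulative_input_tokens(10_000, [(20, 500)] * 4)
one_turn = cumulative_input_tokens(10_000, [(80, 2_000)])
print(four_turns, one_turn)
```

Under these assumptions the four-turn version reads the 10,000-token base context four times, while the batched version reads it once.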
/compact vs /clear
When to /compact
Use /compact when:
- Context is 70%+ full
- You want to continue the same task
- You need to preserve decisions and progress
- Responses are slowing down
Cost impact: Can reduce ongoing costs by roughly 50-80%, depending on how much history is summarized away
When to /clear
Use /clear when:
- Switching to an unrelated task
- Previous context is irrelevant
- Starting a fresh approach
- Maximum cost savings needed
Cost impact: Resets accumulated context to zero, so the next turn starts at minimum cost, but all prior context is lost
Decision Matrix
| Situation | Command | Reasoning |
|---|---|---|
| Same task, full context | /compact | Preserve progress |
| Different project | /clear | Irrelevant context |
| Stuck on approach | /clear | Fresh perspective |
| After major milestone | /compact | Keep decisions |
| Testing something new | /clear | Clean state |
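The matrix above can be read as a small decision function. This is a sketch of one possible reading, not an official rule: the function name and parameters are hypothetical, and the 70% threshold comes from the "When to /compact" list.

```python
from typing import Optional


def choose_command(same_task: bool, context_pct: float,
                   stuck: bool = False,
                   milestone_reached: bool = False) -> Optional[str]:
    """Return "/compact", "/clear", or None (keep going) per the matrix."""
    if not same_task or stuck:
        return "/clear"      # context is irrelevant, or a fresh start is needed
    if milestone_reached or context_pct >= 70:
        return "/compact"    # keep decisions and progress, shed the rest
    return None              # same task with room to spare: no action needed
```

For example, `choose_command(same_task=True, context_pct=82)` suggests `/compact`, while `choose_command(same_task=False, context_pct=15)` suggests `/clear`.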
Model Selection
Quick Guide
| Task Type | Recommended Model | Why |
|---|---|---|
| File exploration | Haiku | Fast, cheap, sufficient |
| Simple edits | Haiku | Straightforward |
| General coding | Sonnet | Balanced (default) |
| Bug fixing | Sonnet | Needs reasoning |
| Architecture design | Opus | Deep analysis |
| Security review | Opus | Critical thinking |
| Complex refactoring | Opus | Multi-file reasoning |
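If the guide is used from a script or a team convention file, it can be encoded as a lookup. The mapping below copies the table; the `recommend_model` helper and its fallback to Sonnet for unlisted task types are assumptions made for illustration.

```python
# Task types and models taken from the Quick Guide table.
RECOMMENDED_MODEL = {
    "file exploration": "haiku",
    "simple edits": "haiku",
    "general coding": "sonnet",
    "bug fixing": "sonnet",
    "architecture design": "opus",
    "security review": "opus",
    "complex refactoring": "opus",
}


def recommend_model(task_type: str) -> str:
    """Look up the suggested model; fall back to Sonnet (assumed default)."""
    return RECOMMENDED_MODEL.get(task_type.strip().lower(), "sonnet")
```

For example, `recommend_model("Security review")` returns `"opus"`.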
Switching Models
Set the model in skill frontmatter:
---
model: haiku
---
Or request a model in the prompt:
> Using Haiku, list all TypeScript files in src/
Cost Comparison Example
Task: Review 10 files for security issues
| Approach | Estimated Cost |
|---|---|
| Opus reviews all | $$$$$ |
| Haiku scans, Opus reviews flagged | $$ |
| Sonnet reviews all | $$$ |
Best strategy: Use Haiku for initial scan, escalate to Opus for detailed review of potential issues.
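The comparison can be worked through with relative units. In the sketch below, the per-file costs (Haiku 1, Sonnet 3, Opus 15) are hypothetical placeholders chosen only to show the shape of the trade-off, and both function names are assumptions.

```python
def review_all_cost(n_files: int, cost_per_file: float) -> float:
    """Cost of reviewing every file with a single model."""
    return n_files * cost_per_file


def staged_review_cost(n_files: int, n_flagged: int,
                       scan_cost: float, deep_cost: float) -> float:
    """Cost of a cheap scan of every file plus a deep review of flagged ones."""
    if not 0 <= n_flagged <= n_files:
        raise ValueError("n_flagged must be between 0 and n_files")
    return n_files * scan_cost + n_flagged * deep_cost


# Relative per-file costs (placeholders, not real prices).
HAIKU, SONNET, OPUS = 1.0, 3.0, 15.0

for flagged in (1, 2, 5):
    print(f"Haiku scan + Opus on {flagged} flagged:",
          staged_review_cost(10, flagged, HAIKU, OPUS))
print("Sonnet reviews all:", review_all_cost(10, SONNET))
print("Opus reviews all:", review_all_cost(10, OPUS))
```

With these placeholder numbers the staged approach is always far cheaper than sending all 10 files to Opus, but it only beats an all-Sonnet review when very few files are flagged. The saving depends on how selective the initial scan is.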
Efficient Prompting
Reduce Token Count
| Verbose | Concise | Savings |
|---|---|---|
| "Could you please" | [Just ask] | 3-4 tokens |
| "I want you to" | [State task] | 4-5 tokens |
| Long explanations | Bullet points | 20-50% |
| Repeated context | @ mentions | Significant |
Be Specific
Token-heavy:
> I have this function that gets users from the database and I want
> to add some caching because it's being called too often and making
> the app slow. Can you help me figure out a good caching strategy?
Efficient:
> Add Redis caching to getUserById in @src/api/users.ts.
> TTL: 5 minutes. Invalidate on user update.
Use Checklists
> Implement user search:
> - [ ] Add search endpoint
> - [ ] Add debounced input
> - [ ] Handle empty results
> Run tests when done.
A checklist is clearer than a long paragraph description.
Batching Strategies
Batch Similar Operations
Instead of multiple turns:
> Add logging to function A
[response]
> Add logging to function B
[response]
> Add logging to function C
Single turn:
> Add consistent logging to functions A, B, and C in @src/utils.ts
> Use format: logger.info("[FunctionName] action", { params })
Batch Read-Modify Cycles
> Review @src/api/*.ts for missing error handling.
> Add try-catch with proper logging to any functions that need it.
> Summarize changes made.
When NOT to Batch
- Complex, interdependent changes
- When you need to verify each step
- Exploratory work
- Learning a new codebase
Budget Management
Setting Expectations
| Session Type | Typical Cost Range |
|---|---|
| Quick fix | $ |
| Feature implementation | $$-$$$ |
| Large refactor | $$$-$$$$ |
| Architecture session (Opus) | $$$$$ |
Cost Controls
- Monitor actively: Check /cost regularly
- Set mental limits: "I'll compact at $X"
- Use appropriate models: Haiku for exploration
- Plan sessions: Know scope before starting
Daily/Weekly Tracking
> /cost
[Note the total]
Track across sessions to understand your patterns.
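One lightweight way to track totals is a local CSV log. The sketch below is an assumption about workflow, not a Claude Code feature: the cost is typed in by hand after running /cost, and the file layout and function names are hypothetical.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional


def log_session_cost(log_path: str, cost_usd: float, note: str = "",
                     day: Optional[date] = None) -> None:
    """Append one (date, cost, note) row, writing a header for a new file."""
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["date", "cost_usd", "note"])
        writer.writerow([(day or date.today()).isoformat(),
                         f"{cost_usd:.2f}", note])


def total_cost(log_path: str) -> float:
    """Sum the cost_usd column of the log."""
    with Path(log_path).open(newline="", encoding="utf-8") as handle:
        return sum(float(row["cost_usd"]) for row in csv.DictReader(handle))
```

Calling `log_session_cost("costs.csv", 1.25, "auth refactor")` at the end of each session and `total_cost("costs.csv")` at the end of the week gives a running picture of spend.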
Subagent Cost Efficiency
Why Subagents Help
Subagents have isolated context:
- Main context stays lean
- Exploratory work doesn't pollute the main conversation
- Subagents can use cheaper models
Cost-Efficient Agent Pattern
---
name: explorer
model: haiku
tools: Read, Glob, Grep
---
Explore and summarize. Return only key findings.
Delegation Examples
| Task | Agent Model | Return |
|---|---|---|
| Find all API routes | Haiku | Route list |
| Analyze dependencies | Haiku | Summary |
| Review for patterns | Sonnet | Findings |
| Deep security review | Opus | Detailed report |
Common Wasteful Patterns
| Pattern | Why Wasteful | Better Approach |
|---|---|---|
| Reading entire directories | Massive token cost | Grep first, then read specific files |
| Verbose explanations | Unnecessary output | Request concise answers |
| Repeating context | Already in history | Use @ mentions |
| Not using /compact | Growing costs | Compact at 70% |
| Opus for everything | Expensive overkill | Match model to task |
| Long debugging sessions | Cumulative cost | Clear and restart |
Reference Files
| File | Contents |
|---|---|
| TOKEN-STRATEGIES.md | Detailed token reduction techniques |
| MODEL-SELECTION.md | Model comparison and selection guide |
| MONITORING.md | Cost tracking and budget management |
Quick Decisions
| Situation | Action |
|---|---|
| Context at 70% | /compact |
| Simple file exploration | Use Haiku |
| Need deep analysis | Use Opus (worth the cost) |
| Unexpected high cost | Check recent operations |
| Switching tasks | /clear to save costs |
| Debugging loop | Clear and try fresh approach |