| name | pop-routine-measure |
| description | Display routine measurement dashboard with metrics, costs, trends, and visualization |
| invocation_pattern | /popkit:routine (morning|nightly) --measure|show routine measurements|routine performance|routine metrics dashboard |
| tier | 1 |
| version | 1.1.0 |
Routine Measurement Dashboard
Tracks, visualizes, and reports context window usage, execution duration, tool call breakdown, and cost estimates during routine execution.
When to Use
Primary Use Cases:
- Auto-Measurement: Invoked AUTOMATICALLY when user includes
--measureflag in/popkit:routinecommands - Dashboard Display: Invoked when user requests viewing existing measurements
- Trend Analysis: Invoked when comparing measurements across multiple runs
# Auto-measurement during routine execution
/popkit:routine morning --measure
/popkit:routine morning run p-1 --measure
/popkit:routine nightly --measure
# Viewing existing measurements
show routine measurements
show measurements for morning routine
routine performance dashboard
How It Works
- Detect Flag: Parse command for
--measureflag - Start Tracking: Enable measurement via environment variable
- Initialize Tracker: Start
RoutineMeasurementTracker - Execute Routine: Run the routine normally (pk, p-1, etc.)
- Stop Tracking: Collect measurement data
- Format Report: Display detailed breakdown
- Save Data: Store JSON for analysis
Implementation Pattern
import os
import sys
sys.path.insert(0, "packages/plugin/hooks/utils")
from routine_measurement import (
RoutineMeasurementTracker,
enable_measurement,
disable_measurement,
format_measurement_report,
save_measurement
)
# 1. Enable measurement mode
enable_measurement()
# 2. Start tracker
tracker = RoutineMeasurementTracker()
tracker.start(routine_id="p-1", routine_name="PopKit Full Validation")
# 3. Execute routine
# Use Skill tool to invoke the actual routine
# Example: Skill(skill="pop-morning-routine", args="--routine p-1")
# 4. Stop tracker and get measurement
measurement = tracker.stop()
# 5. Disable measurement mode
disable_measurement()
# 6. Display report
if measurement:
report = format_measurement_report(measurement)
print(report)
# Save measurement data
saved_path = save_measurement(measurement)
print(f"\nMeasurement data saved to: {saved_path}")
Tool Call Tracking
The post-tool-use.py hook automatically tracks tool calls when POPKIT_ROUTINE_MEASURE=true:
- Tracked Tools: All tools (Bash, Read, Grep, Write, Edit, Skill, etc.)
- Token Estimation: ~4 chars per token (rough approximation)
- Input/Output Split: 20% input, 80% output (heuristic)
- Duration: Captured from hook execution time
Output Format
======================================================================
Routine Measurement Report
======================================================================
Routine: PopKit Full Validation (p-1)
Duration: 12.34s
Tool Calls: 15
Context Usage:
Input Tokens: 1,234 (~1k)
Output Tokens: 6,789 (~6k)
Total Tokens: 8,023 (~8k)
Characters: 32,092
Cost Estimate (Claude Sonnet 4.5):
Input: $0.0037
Output: $0.1018
Total: $0.1055
Tool Breakdown:
----------------------------------------------------------------------
Tool Calls Tokens Duration Chars
----------------------------------------------------------------------
Bash 8 3,456 2.34s 13,824
Read 4 2,123 1.12s 8,492
Grep 2 1,234 0.56s 4,936
Skill 1 1,210 8.32s 4,840
======================================================================
Measurement Data Storage
Measurements are saved to .claude/popkit/measurements/ as JSON:
{
"routine_id": "p-1",
"routine_name": "PopKit Full Validation",
"start_time": 1734567890.123,
"end_time": 1734567902.456,
"duration": 12.333,
"total_tool_calls": 15,
"total_tokens": 8023,
"input_tokens": 1234,
"output_tokens": 6789,
"total_chars": 32092,
"tool_breakdown": {
"Bash": {
"count": 8,
"input_tokens": 691,
"output_tokens": 2765,
"duration": 2.34,
"chars": 13824
}
},
"cost_estimate": {
"input": 0.0037,
"output": 0.1018,
"total": 0.1055
}
}
Usage Examples
Measure Morning Routine (Default)
User: /popkit:routine morning --measure
Claude: I'll measure the context usage for your morning routine.
[Enables measurement and runs p-1 routine]
[Morning routine output displays normally]
======================================================================
Routine Measurement Report
======================================================================
Routine: PopKit Full Validation (p-1)
Duration: 12.34s
Tool Calls: 15
...
Measurement data saved to: .claude/popkit/measurements/p-1_20251219_143022.json
Measure Specific Routine
User: /popkit:routine morning run pk --measure
Claude: I'll measure the universal PopKit routine.
[Measurement report shows metrics for pk routine]
Compare Routines (Manual)
# Run each routine with --measure
/popkit:routine morning run pk --measure
/popkit:routine morning run p-1 --measure
# Compare JSON files
cat .claude/popkit/measurements/pk_*.json
cat .claude/popkit/measurements/p-1_*.json
Integration
Command Integration
The commands/routine.md documents the --measure flag. When Claude sees this flag:
- Invoke this skill before executing the routine
- Wrap execution with measurement tracking
- Display results after routine completion
Hook Integration
The post-tool-use.py hook checks for POPKIT_ROUTINE_MEASURE=true:
if ROUTINE_MEASUREMENT_AVAILABLE and check_measure_flag():
tracker = RoutineMeasurementTracker()
if tracker.is_active():
tracker.track_tool_call(tool_name, content, execution_time)
Storage Location
.claude/popkit/measurements/
├── pk_20251219_080000.json # Universal routine
├── p-1_20251219_143022.json # Custom routine
└── rc-1_20251219_180000.json # Project routine
Metrics Collected
| Metric | Description | Source |
|---|---|---|
| Duration | Total execution time in seconds | Tracker start/stop |
| Tool Calls | Number of tools invoked | Hook tracking |
| Input Tokens | Estimated input tokens (~20% of total) | Content length / 4 |
| Output Tokens | Estimated output tokens (~80% of total) | Content length / 4 |
| Total Tokens | Input + Output | Sum |
| Characters | Raw character count | Content length |
| Cost | Estimated API cost (Sonnet 4.5 pricing) | Token count * price |
Token Estimation
Uses rough heuristic: ~4 characters per token
This is an approximation. Actual tokenization varies by:
- Language (code vs natural language)
- Repetition and patterns
- Special characters
For more accurate counts, use Claude API's token counting endpoint (future enhancement).
Cost Calculation
Based on Claude Sonnet 4.5 pricing (as of Dec 2025):
- Input: $3.00 per million tokens
- Output: $15.00 per million tokens
Costs are estimates only - actual costs depend on caching, context reuse, and other factors.
Dashboard Visualization (NEW in v1.1.0)
When invoked to view existing measurements, this skill provides an interactive dashboard:
Implementation - View Dashboard
#!/usr/bin/env python3
"""
Routine Measurement Dashboard
Displays metrics, trends, and visualizations for routine measurements
"""
import json
import sys
from pathlib import Path
from datetime import datetime
# Add shared utilities
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / 'shared-py'))
from popkit_shared.utils.routine_measurement import estimate_tokens
def load_measurements(routine_name=None):
"""Load measurement files from disk."""
measurements_dir = Path.cwd() / ".claude" / "popkit" / "measurements"
if not measurements_dir.exists():
return []
measurements = []
for file_path in sorted(measurements_dir.glob("*.json"), reverse=True):
try:
with open(file_path) as f:
data = json.load(f)
# Filter by routine name if provided
if routine_name:
# Match exact routine_name or routine_id
if data.get("routine_name") != routine_name and data.get("routine_id") != routine_name:
continue
measurements.append({
"file": file_path.name,
"data": data,
"timestamp": datetime.fromtimestamp(data.get("start_time", 0))
})
except Exception as e:
print(f"Warning: Failed to load {file_path}: {e}", file=sys.stderr)
return measurements
def format_dashboard(measurement_data, previous_data=None):
"""Format a measurement dashboard."""
data = measurement_data["data"]
lines = []
lines.append("=" * 80)
lines.append("ROUTINE MEASUREMENT DASHBOARD".center(80))
lines.append("=" * 80)
# Header
routine_name = data.get('routine_name', 'Unknown')
routine_id = data.get('routine_id', 'N/A')
lines.append(f"Routine: {routine_name} ({routine_id})")
lines.append(f"Timestamp: {measurement_data['timestamp'].strftime('%Y-%m-%d %H:%M:%S')}")
lines.append(f"File: {measurement_data['file']}")
lines.append("")
# Summary Metrics
lines.append("SUMMARY METRICS")
lines.append("-" * 80)
duration = data.get("duration", 0)
lines.append(f" Duration: {duration:.2f}s ({duration/60:.1f} minutes)")
lines.append(f" Tool Calls: {data.get('total_tool_calls', 0)}")
lines.append(f" Total Tokens: {data.get('total_tokens', 0):,}")
lines.append(f" Characters: {data.get('total_chars', 0):,}")
lines.append("")
# Token Breakdown
lines.append("TOKEN USAGE")
lines.append("-" * 80)
input_tokens = data.get("input_tokens", 0)
output_tokens = data.get("output_tokens", 0)
total_tokens = data.get("total_tokens", 0)
in_pct = (input_tokens/total_tokens*100) if total_tokens else 0
out_pct = (output_tokens/total_tokens*100) if total_tokens else 0
lines.append(f" Input Tokens: {input_tokens:>10,} ({input_tokens/1000:.1f}k) [{in_pct:.1f}%]")
lines.append(f" Output Tokens: {output_tokens:>10,} ({output_tokens/1000:.1f}k) [{out_pct:.1f}%]")
lines.append(f" Total Tokens: {total_tokens:>10,} ({total_tokens/1000:.1f}k)")
lines.append("")
# Cost Estimate
cost = data.get("cost_estimate", {})
lines.append("COST ESTIMATE (Claude Sonnet 4.5)")
lines.append("-" * 80)
lines.append(f" Input Cost: ${cost.get('input', 0):.4f} (@$3.00/million tokens)")
lines.append(f" Output Cost: ${cost.get('output', 0):.4f} (@$15.00/million tokens)")
lines.append(f" Total Cost: ${cost.get('total', 0):.4f}")
lines.append("")
# Tool Breakdown
lines.append("TOOL BREAKDOWN")
lines.append("-" * 80)
lines.append(f"{'Tool':<20} {'Calls':<8} {'Tokens':<12} {'Duration':<12} {'Chars':<12}")
lines.append("-" * 80)
tool_breakdown = data.get("tool_breakdown", {})
for tool, stats in tool_breakdown.items():
total_tool_tokens = stats.get("input_tokens", 0) + stats.get("output_tokens", 0)
lines.append(
f"{tool:<20} "
f"{stats.get('count', 0):<8} "
f"{total_tool_tokens:>10,} "
f"{stats.get('duration', 0):>10.2f}s "
f"{stats.get('chars', 0):>10,}"
)
# Comparison with previous run
if previous_data:
lines.append("")
lines.append("COMPARISON WITH PREVIOUS RUN")
lines.append("-" * 80)
prev = previous_data["data"]
# Duration change
prev_duration = prev.get("duration", 0)
duration_change = duration - prev_duration
duration_pct = (duration_change / prev_duration * 100) if prev_duration else 0
duration_indicator = "↑" if duration_change > 0 else "↓" if duration_change < 0 else "→"
lines.append(f" Duration: {duration_indicator} {abs(duration_change):.2f}s ({duration_pct:+.1f}%)")
# Token change
prev_tokens = prev.get("total_tokens", 0)
token_change = total_tokens - prev_tokens
token_pct = (token_change / prev_tokens * 100) if prev_tokens else 0
token_indicator = "↑" if token_change > 0 else "↓" if token_change < 0 else "→"
lines.append(f" Tokens: {token_indicator} {abs(token_change):,} ({token_pct:+.1f}%)")
# Cost change
prev_cost = prev.get("cost_estimate", {}).get("total", 0)
cost_change = cost.get("total", 0) - prev_cost
cost_pct = (cost_change / prev_cost * 100) if prev_cost else 0
cost_indicator = "↑" if cost_change > 0 else "↓" if cost_change < 0 else "→"
lines.append(f" Cost: {cost_indicator} ${abs(cost_change):.4f} ({cost_pct:+.1f}%)")
# Tool call change
prev_tool_calls = prev.get("total_tool_calls", 0)
tool_call_change = data.get("total_tool_calls", 0) - prev_tool_calls
tool_call_indicator = "↑" if tool_call_change > 0 else "↓" if tool_call_change < 0 else "→"
lines.append(f" Tool Calls: {tool_call_indicator} {abs(tool_call_change)} tools")
lines.append("=" * 80)
return "\n".join(lines)
def show_all_measurements(routine_name=None):
"""Show summary of all measurements."""
measurements = load_measurements(routine_name)
if not measurements:
if routine_name:
print(f"No measurements found for routine: {routine_name}")
else:
print("No measurements found.")
print("\nRun a routine with --measure flag to create measurements:")
print(" /popkit:routine morning --measure")
return
lines = []
lines.append("=" * 80)
if routine_name:
lines.append(f"ALL MEASUREMENTS: {routine_name}".center(80))
else:
lines.append("ALL ROUTINE MEASUREMENTS".center(80))
lines.append("=" * 80)
lines.append(f"Total Measurements: {len(measurements)}")
lines.append("")
lines.append(f"{'Date':<20} {'Routine':<15} {'Duration':<12} {'Tokens':<12} {'Cost':<10}")
lines.append("-" * 80)
for m in measurements:
data = m["data"]
routine_display = data.get('routine_id', 'unknown')[:14]
lines.append(
f"{m['timestamp'].strftime('%Y-%m-%d %H:%M:%S'):<20} "
f"{routine_display:<15} "
f"{data.get('duration', 0):>10.2f}s "
f"{data.get('total_tokens', 0):>10,} "
f"${data.get('cost_estimate', {}).get('total', 0):>8.4f}"
)
# Calculate trends if multiple measurements
if len(measurements) >= 2:
lines.append("")
lines.append("AGGREGATE STATISTICS")
lines.append("-" * 80)
# Averages
avg_duration = sum(m["data"].get("duration", 0) for m in measurements) / len(measurements)
avg_tokens = sum(m["data"].get("total_tokens", 0) for m in measurements) / len(measurements)
avg_cost = sum(m["data"].get("cost_estimate", {}).get("total", 0) for m in measurements) / len(measurements)
lines.append(f" Average Duration: {avg_duration:.2f}s")
lines.append(f" Average Tokens: {avg_tokens:,.0f}")
lines.append(f" Average Cost: ${avg_cost:.4f}")
# Totals
total_duration = sum(m["data"].get("duration", 0) for m in measurements)
total_tokens = sum(m["data"].get("total_tokens", 0) for m in measurements)
total_cost = sum(m["data"].get("cost_estimate", {}).get("total", 0) for m in measurements)
lines.append("")
lines.append(f" Total Duration: {total_duration:.2f}s ({total_duration/60:.1f} minutes)")
lines.append(f" Total Tokens: {total_tokens:,}")
lines.append(f" Total Cost: ${total_cost:.4f}")
# Trend (first vs last)
first = measurements[-1]["data"]
last = measurements[0]["data"]
duration_trend = last.get("duration", 0) - first.get("duration", 0)
token_trend = last.get("total_tokens", 0) - first.get("total_tokens", 0)
lines.append("")
lines.append("TREND (First → Latest)")
lines.append("-" * 80)
lines.append(f" Duration Change: {'+' if duration_trend > 0 else ''}{duration_trend:.2f}s")
lines.append(f" Token Change: {'+' if token_trend > 0 else ''}{token_trend:,}")
lines.append("=" * 80)
print("\n".join(lines))
def main():
"""Main entry point for dashboard."""
import argparse
parser = argparse.ArgumentParser(description="Routine measurement dashboard")
parser.add_argument("--routine", help="Filter by routine name/ID")
parser.add_argument("--all", action="store_true", help="Show all measurements summary")
parser.add_argument("--no-compare", action="store_true", help="Don't compare with previous run")
args = parser.parse_args()
# Load measurements
measurements = load_measurements(args.routine)
if not measurements:
if args.routine:
print(f"No measurements found for routine: {args.routine}")
else:
print("No measurements found.")
print("\nRun a routine with --measure flag to create measurements:")
print(" /popkit:routine morning --measure")
return
# Show all measurements summary
if args.all:
show_all_measurements(args.routine)
return
# Show latest measurement dashboard
latest = measurements[0]
previous = measurements[1] if len(measurements) > 1 and not args.no_compare else None
dashboard = format_dashboard(latest, previous)
print(dashboard)
# Hint
if len(measurements) > 1:
print("")
print(f"Tip: Use --all to see summary of all {len(measurements)} measurements")
if __name__ == "__main__":
main()
Usage - Dashboard Commands
# View latest measurement for any routine
python -c "$(cat <<'EOF'
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd() / 'packages/shared-py'))
# ... (dashboard code above) ...
main()
EOF
)"
# View measurements for specific routine
python <dashboard_script> --routine morning
# View all measurements summary
python <dashboard_script> --all
# View without comparison
python <dashboard_script> --routine morning --no-compare
Future Enhancements
Phase 2: Comparison Mode
/popkit:routine morning --measure --compare pk,p-1
Phase 3: Trend Analysis
/popkit:routine morning --measure --trend 7d
Phase 4: Optimization Suggestions
Tool breakdown shows Bash taking 60% of tokens.
Suggestion: Cache git status results to reduce redundant calls.
Related Skills
| Skill | Purpose |
|---|---|
pop-morning-routine |
Execute morning routine |
pop-nightly-routine |
Execute nightly routine |
pop-routine-generator |
Create custom routines |
pop-assessment-performance |
Analyze performance metrics |
Related Commands
| Command | Purpose |
|---|---|
/popkit:routine |
Execute routines |
/popkit:assess performance |
Performance assessment |
/popkit:stats |
Session statistics |
Architecture Files
| File | Purpose |
|---|---|
hooks/utils/routine_measurement.py |
Measurement tracking classes |
hooks/post-tool-use.py |
Tool call capture hook |
commands/routine.md |
Command specification |
.claude/popkit/measurements/ |
Measurement data storage |
Testing
Test measurement functionality:
# Enable measurement manually
export POPKIT_ROUTINE_MEASURE=true
# Run a routine
/popkit:routine morning
# Verify measurement file created
ls -la .claude/popkit/measurements/
# Inspect JSON
cat .claude/popkit/measurements/*.json | jq '.'
Version: 1.0.0 Author: PopKit Development Team Last Updated: 2025-12-19