name	otel-monitoring-setup
version	0.2.0
description	Automated OpenTelemetry setup for Claude Code with local PoC mode (full Docker stack with Grafana dashboards) and enterprise mode (connect to centralized infrastructure). Configures telemetry collection, verifies data flow, handles dashboard imports with datasource UID detection, and supports team rollout scenarios. Use for any OpenTelemetry setup task - local development, enterprise deployment, or team aggregation.
author	Connor

OpenTelemetry Monitoring Setup Skill

Automated workflow for setting up OpenTelemetry telemetry collection for Claude Code usage monitoring, cost tracking, and productivity analytics.

Supported Modes:

Mode 1: Local PoC Setup - Full Docker stack (OTEL Collector, Prometheus, Loki, Grafana)
Mode 2: Enterprise Setup - Connect to existing centralized infrastructure

Changelog: Track updates and fixes in this skill Modes Reference: See modes/ directory for detailed workflows

Response Style

Be systematic - Follow mode workflow exactly
Verify prerequisites - Check Docker, permissions, existing setup
Fix issues proactively - Handle OTEL Collector config issues, dashboard UID mismatches
Test thoroughly - Verify data flow, Prometheus queries, Grafana connectivity
Provide clear output - Show URLs, credentials, next steps

Quick Decision Matrix

User Request	Mode	Action
"Set up telemetry locally"	Mode 1	Full PoC stack setup
"I want to try OpenTelemetry"	Mode 1	Full PoC stack setup
"Connect to company OTEL endpoint"	Mode 2	Enterprise config only
"Set up for team rollout"	Mode 2	Enterprise + docs
"Dashboard not working"	Troubleshoot	Fix datasource UID

Mode 1: Local PoC Setup (Full Stack)

Goal: Create complete local telemetry stack for individual developer

When to use:

Developer wants to try telemetry locally
PoC/evaluation phase
No centralized infrastructure exists yet
Want to see full Grafana dashboards immediately

Prerequisites:

Docker Desktop installed and running
Claude Code installed
Write access to ~/.claude/settings.json
Minimum 2GB free disk space (for Docker images and data volumes)

High-Level Process:

Verify Docker is running
Create telemetry directory structure
Generate configuration files (docker-compose, OTEL Collector, Prometheus)
Start Docker containers
Update Claude Code settings.json
Import Grafana dashboards (with UID detection)
Verify data flow
Provide quickstart guide

Full Workflow: See modes/mode1-poc-setup.md

Estimated Time: 5-7 minutes

Output:

Running Docker containers (OTEL Collector, Prometheus, Loki, Grafana)
Grafana dashboards at http://localhost:3000 (admin/admin)
Claude Code sending telemetry data
Verification tests passed

Mode 2: Enterprise Setup (Connect to Existing)

Goal: Configure Claude Code to send telemetry to centralized company infrastructure

When to use:

Company has centralized OTEL Collector endpoint
Team rollout scenario
Want aggregated team metrics
Privacy/compliance requires centralized control

Prerequisites:

OTEL Collector endpoint URL (e.g., https://otel.company.com:4317)
Authentication credentials (API key or mTLS certificates)
Optional: Team/department identifiers

High-Level Process:

Collect endpoint information from user
Update Claude Code settings.json with OTEL endpoint
Add authentication headers if required
Add custom resource attributes (team, environment)
Test connectivity
Provide team rollout documentation

Full Workflow: See modes/mode2-enterprise.md

Estimated Time: 2-3 minutes

Output:

Claude Code configured to send to central endpoint
Connectivity verified
Team rollout documentation provided

Mode Detection Logic

Automatic detection based on user request:

IF user says "locally" OR "on my machine" OR "PoC" OR "try it out"
  → Mode 1 (Local PoC Setup)

ELSE IF user mentions "company endpoint" OR "centralized" OR "team" OR "enterprise"
  → Mode 2 (Enterprise Setup)

ELSE IF unsure
  → Ask: "Do you want to set up locally (Mode 1) or connect to existing infrastructure (Mode 2)?"

Priority:

Explicit user specification (highest)
Keyword detection
Ask user if ambiguous

Core Responsibilities

1. Docker Environment Verification

Check Docker installation and running status
Verify Docker Compose availability
Ensure sufficient disk space
Handle Docker Desktop not running error

2. Configuration File Generation

Create docker-compose.yml with correct services
Generate OTEL Collector config with debug exporter (not deprecated logging)
Set up Prometheus scrape configuration
Configure Grafana datasources
Create start/stop scripts

3. Claude Code Settings Management

Read existing ~/.claude/settings.json
Merge telemetry environment variables
Handle different configuration scenarios (local vs enterprise)
Preserve existing settings

4. Container Management

Start Docker containers in correct order
Wait for services to become healthy
Handle port conflicts
Monitor container status

5. Grafana Dashboard Import

Detect Grafana Prometheus datasource UID
Fix dashboard JSON with correct UID
Handle metric name variations (double prefix issue)
Import dashboards via API or provide instructions

6. Data Flow Verification

Test OTEL Collector connectivity
Query Prometheus for claude_code metrics
Verify Grafana datasource configuration
Run sample queries

7. Troubleshooting & Error Handling

Fix deprecated OTEL Collector exporters
Handle datasource UID mismatches
Resolve metric naming issues
Guide through common errors

Known Issues & Fixes

Issue 1: OTEL Collector Deprecated Exporter

Problem: OTEL Collector fails with "logging exporter has been deprecated"

Fix: Use debug exporter instead:

exporters:
  debug:
    verbosity: normal

Issue 2: Dashboard Datasource Not Found

Problem: Grafana dashboard shows "datasource prometheus not found"

Fix:

Detect actual Prometheus datasource UID
Replace all "uid": "prometheus" with detected UID
Re-import dashboard

Issue 3: Metric Names Double Prefix

Problem: Queries fail because metrics have format claude_code_claude_code_*

Fix: Update all dashboard queries to use correct naming:

claude_code_claude_code_lines_of_code_count_total
claude_code_claude_code_cost_usage_USD_total
claude_code_claude_code_token_usage_tokens_total
claude_code_claude_code_active_time_seconds_total
claude_code_claude_code_commit_count_total

Issue 4: No Data in Prometheus

Solutions:

Verify OTEL Collector is running and healthy
Check Claude Code was restarted after settings.json update
Confirm OTEL_EXPORTER_OTLP_ENDPOINT is correct
Check OTEL Collector logs for incoming data

Safety Protocols

Pre-Setup Checks

✅ Backup ~/.claude/settings.json before modification
✅ Check for existing telemetry setup (avoid conflicts)
✅ Verify Docker is running before starting containers
✅ Check ports 3000, 4317, 4318, 8889, 9090 are available

During Setup

✅ Create files with proper permissions
✅ Use docker compose (not deprecated docker-compose)
✅ Wait for container health checks
✅ Validate configuration files before use

Post-Setup

✅ Test data flow before declaring success
✅ Provide cleanup instructions
✅ Document URLs and credentials
✅ Offer to create quickstart guide

Example Interactions

Example 1: Local PoC Setup Request

User: "I want to set up telemetry locally to see my Claude Code usage"

Assistant: "I'll set up a complete local telemetry stack for you using Mode 1. This will create:

OpenTelemetry Collector (receives data from Claude Code)
Prometheus (stores metrics)
Loki (stores logs)
Grafana (visualization dashboards)

Let me verify prerequisites first..."

[Checks Docker status, disk space]

Assistant: "✅ Docker is running. Creating telemetry stack in ~/.claude/telemetry/..."

[Creates directory structure, generates config files, starts containers]

Assistant: "✅ All containers running! Now updating your Claude Code settings..."

[Updates settings.json with telemetry config]

Assistant: "✅ Settings updated. Now let me import Grafana dashboards..."

[Detects datasource UID, fixes dashboard JSON, imports]

Assistant: "✅ Dashboard imported! Let me verify data is flowing..."

[Runs test queries]

Assistant: "✅ Setup complete!

Access Grafana: http://localhost:3000 (admin/admin)

What's being collected:

Session counts and active time
Token usage (input/output/cached)
API costs by model
Lines of code modified
Commits and PRs created

Next: Use Claude Code normally - data will appear in ~60 seconds. Check your dashboard!

Management:

Start: ~/.claude/telemetry/start-telemetry.sh
Stop: ~/.claude/telemetry/stop-telemetry.sh"

Example 2: Enterprise Setup Request

User: "I need to connect to our company's OTEL endpoint at otel.company.com:4317"

Assistant: "I'll configure Claude Code to send telemetry to your centralized infrastructure using Mode 2. I need a few details:

Endpoint: otel.company.com:4317 ✓
Protocol: HTTPS or HTTP?
Authentication: Do you have an API key or certificate?
Team identifier: What team/department should metrics be tagged with?"

User: "HTTPS, we have an API key, tag it with team=platform"

Assistant: "Perfect! Let me configure your settings..."

[Updates settings.json with enterprise config]

Assistant: "✅ Configuration updated:

"env": {
  "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
  "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
  "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer [your-key]",
  "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production"
}

Important: You need to restart Claude Code for this to take effect.

To verify: After restart, check with your platform team that they see metrics tagged with team=platform."

Example 3: Dashboard Not Working

User: "My Grafana dashboard shows 'datasource prometheus not found'"

Assistant: "This is a common issue - the dashboard has a hardcoded datasource UID that doesn't match yours. Let me fix it:

First, I'll detect your actual Prometheus datasource UID..."

[Queries Grafana API or asks user to check]

Assistant: "Found it: Your datasource UID is PBFA97CFB590B2093

Now I'll update the dashboard JSON with the correct UID..."

[Reads dashboard JSON, replaces all UID references, saves new version]

Assistant: "✅ Fixed dashboard saved to: ~/.claude/telemetry/dashboards/claude-code-overview-corrected.json

To import:

Go to Grafana: http://localhost:3000
Dashboards → New → Import
Upload: claude-code-overview-corrected.json
Click Import

This should work now!"

Example 4: No Metrics Showing

User: "Dashboard imported but shows 'No data'"

Assistant: "Let me troubleshoot data flow:

Check 1: Are containers running?"

[Runs docker ps check]

Assistant: "✅ All containers running

Check 2: Is OTEL Collector receiving data?"

[Checks OTEL Collector logs]

Assistant: "✅ OTEL Collector is receiving telemetry

Check 3: Let me query Prometheus directly..."

[Runs curl to Prometheus API]

Assistant: "Found the issue! Your metrics use a double prefix: claude_code_claude_code_* instead of claude_code_*

The dashboard queries need to be updated. Let me fix that..."

[Creates corrected dashboard with proper metric names]

Assistant: "✅ Fixed! Import this new dashboard: claude-code-overview-fixed-v2.json

It uses the correct metric names:

claude_code_claude_code_lines_of_code_count_total
claude_code_claude_code_cost_usage_USD_total
etc.

Data should appear immediately after import!"

Integration Points

With skill-creator

This skill can be invoked during development to set up telemetry for skill testing
Useful for monitoring skill execution costs

With Other Skills

Can monitor execution time and cost of other skills
Helps identify expensive operations
Provides productivity metrics for skill effectiveness

With MCP Servers

Can integrate with monitoring MCP servers if available
Complements existing observability tools

Success Metrics

Mode 1 (Local PoC) Success Criteria:

✅ All 4 containers running (OTEL, Prometheus, Loki, Grafana)
✅ Grafana accessible at localhost:3000
✅ Dashboard imported successfully
✅ At least one Claude Code metric visible in Prometheus
✅ Dashboard shows data (even if just current session)
✅ User can access quickstart documentation

Mode 2 (Enterprise) Success Criteria:

✅ settings.json updated with correct endpoint
✅ Authentication configured
✅ Resource attributes set (team, environment)
✅ Connectivity test passed (if possible)
✅ Team rollout documentation provided

Important Reminders

Always backup settings.json before modifications - Data loss prevention
Always use debug exporter not logging exporter - OTEL Collector compatibility
Always detect datasource UID before importing dashboards - Prevents "datasource not found" errors
Always verify metric names - Handle double prefix issue (claude_code_claude_code_*)
Always restart Claude Code after settings changes - Telemetry only loads at startup
Always test data flow before declaring success - Ensure metrics are actually flowing
Never skip Docker verification - Prevents confusing errors downstream
Never guess metric names - Query Prometheus API to verify actual naming
Never leave broken dashboards - Always provide working dashboard version
Always provide management scripts - Users need start/stop capabilities

Resources

Mode 1 Workflow: modes/mode1-poc-setup.md - Detailed local setup process
Mode 2 Workflow: modes/mode2-enterprise.md - Enterprise configuration steps
Templates: templates/ directory - All configuration file templates
Dashboards: dashboards/ directory - Grafana dashboard templates
Metrics Reference: data/metrics-reference.md - Official Claude Code metrics documentation
Troubleshooting: data/troubleshooting.md - Common issues and solutions
Enterprise Guide: data/enterprise-config-guide.md - Team rollout and aggregation
PromQL Queries: data/prometheus-queries.md - Useful monitoring queries

Ready to set up OpenTelemetry for Claude Code! 🚀

otel-monitoring-setup

Install Skill

SKILL.md