Claude Code Plugins

Community-maintained marketplace

Feedback

golang-observability-opentelemetry

@mattnigh/skills_collection
0
0

Instrumenting Go applications with OpenTelemetry for distributed tracing, Prometheus for metrics, and structured logging with slog

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name golang-observability-opentelemetry
description Instrumenting Go applications with OpenTelemetry for distributed tracing, Prometheus for metrics, and structured logging with slog
version 1.0.0
category toolchain
author Claude MPM Team
license MIT
progressive_disclosure [object Object]
context_limit 700
tags observability, golang, opentelemetry, prometheus, tracing, metrics, logging, slog
requires_tools

Go Observability with OpenTelemetry

Overview

Modern Go applications require comprehensive observability through the three pillars: traces, metrics, and logs. OpenTelemetry provides vendor-neutral instrumentation for distributed tracing, Prometheus offers powerful metrics collection, and Go's slog package (1.21+) delivers structured logging with minimal overhead.

Key Features:

  • 🔍 OpenTelemetry: Distributed tracing with context propagation
  • 📊 Prometheus: Metrics collection with /metrics endpoint
  • 📝 Structured Logging: slog with JSON formatting and correlation IDs
  • 🎯 Auto-Instrumentation: HTTP/gRPC middleware patterns
  • 💚 Health Checks: Kubernetes-ready readiness/liveness probes
  • 🔄 Graceful Shutdown: Clean exporter shutdown and signal handling

When to Use This Skill

Activate this skill when:

  • Instrumenting microservices for production observability
  • Setting up distributed tracing across service boundaries
  • Creating operational dashboards with Prometheus/Grafana
  • Debugging production performance issues or bottlenecks
  • Implementing SLOs and monitoring SLIs
  • Adding observability to existing Go applications
  • Correlating logs, traces, and metrics for debugging

Core Observability Principles

The Three Pillars

  1. Traces: Understand request flow across distributed systems
  2. Metrics: Measure system behavior and performance over time
  3. Logs: Record discrete events for debugging and audit

Correlation Strategy

All three pillars must share common identifiers:

  • Trace ID: Links all operations in a request
  • Span ID: Identifies specific operation within trace
  • Request ID: Correlates logs with traces and metrics

OpenTelemetry Integration

Installation

go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/exporters/jaeger
go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp

Basic Setup

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func initTracer(serviceName string) (*sdktrace.TracerProvider, error) {
    // Create Jaeger exporter
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://localhost:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    // Create resource with service name
    res, err := resource.Merge(
        resource.Default(),
        resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
            semconv.ServiceVersion("1.0.0"),
        ),
    )
    if err != nil {
        return nil, err
    }

    // Create tracer provider
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.AlwaysSample()), // Use probability sampler in production
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

func main() {
    tp, err := initTracer("order-service")
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Printf("Error shutting down tracer: %v", err)
        }
    }()

    // Application code...
}

Creating Spans

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

func ProcessOrder(ctx context.Context, order Order) error {
    tracer := otel.Tracer("order-service")
    ctx, span := tracer.Start(ctx, "ProcessOrder")
    defer span.End()

    // Add attributes
    span.SetAttributes(
        attribute.String("order.id", order.ID),
        attribute.Int("order.items", len(order.Items)),
        attribute.Float64("order.total", order.Total),
    )

    // Validate order (creates child span)
    if err := validateOrder(ctx, order); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "validation failed")
        return err
    }

    // Fulfill order
    if err := fulfillOrder(ctx, order); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "fulfillment failed")
        return err
    }

    span.SetStatus(codes.Ok, "order processed successfully")
    return nil
}

func validateOrder(ctx context.Context, order Order) error {
    _, span := otel.Tracer("order-service").Start(ctx, "validateOrder")
    defer span.End()

    // Validation logic...
    return nil
}

HTTP Middleware Instrumentation

import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
    // Wrap handler with automatic tracing
    handler := http.HandlerFunc(orderHandler)
    wrappedHandler := otelhttp.NewHandler(handler, "order-handler")

    http.Handle("/orders", wrappedHandler)
    http.ListenAndServe(":8080", nil)
}

// Manual instrumentation for more control
func orderHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    tracer := otel.Tracer("order-service")

    ctx, span := tracer.Start(ctx, "orderHandler")
    defer span.End()

    // Extract order ID from request
    orderID := r.URL.Query().Get("id")
    span.SetAttributes(attribute.String("order.id", orderID))

    // Process order with propagated context
    order, err := fetchOrder(ctx, orderID)
    if err != nil {
        span.RecordError(err)
        http.Error(w, "Order not found", http.StatusNotFound)
        return
    }

    // ... handle response
}

Prometheus Metrics

Installation

go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promhttp

Metric Types and Patterns

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Counter: Monotonically increasing value
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    // Gauge: Value that can go up or down
    activeConnections = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )

    // Histogram: Observations bucketed by value
    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: prometheus.DefBuckets, // [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
        },
        []string{"method", "path"},
    )

    // Summary: Similar to histogram but calculates quantiles
    dbQueryDuration = promauto.NewSummaryVec(
        prometheus.SummaryOpts{
            Name:       "db_query_duration_seconds",
            Help:       "Database query duration",
            Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
        },
        []string{"query_type"},
    )
)

Metrics Middleware

import (
    "net/http"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Metrics middleware that instruments all HTTP handlers
func MetricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()

        // Track active connections
        activeConnections.Inc()
        defer activeConnections.Dec()

        // Wrap response writer to capture status code
        rw := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}

        // Call next handler
        next.ServeHTTP(rw, r)

        // Record metrics
        duration := time.Since(start).Seconds()
        httpRequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
        httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rw.statusCode)).Inc()
    })
}

type responseWriter struct {
    http.ResponseWriter
    statusCode int
}

func (rw *responseWriter) WriteHeader(code int) {
    rw.statusCode = code
    rw.ResponseWriter.WriteHeader(code)
}

// Expose metrics endpoint
func main() {
    http.Handle("/metrics", promhttp.Handler())

    handler := MetricsMiddleware(http.HandlerFunc(orderHandler))
    http.Handle("/orders", handler)

    http.ListenAndServe(":8080", nil)
}

Custom Metrics Example

func ProcessPayment(ctx context.Context, payment Payment) error {
    timer := prometheus.NewTimer(dbQueryDuration.WithLabelValues("payment_insert"))
    defer timer.ObserveDuration()

    // Process payment
    if err := db.InsertPayment(payment); err != nil {
        httpRequestsTotal.WithLabelValues("POST", "/payments", "500").Inc()
        return err
    }

    httpRequestsTotal.WithLabelValues("POST", "/payments", "200").Inc()
    return nil
}

Structured Logging with slog

Basic Setup (Go 1.21+)

package main

import (
    "context"
    "log/slog"
    "os"
)

func initLogger() *slog.Logger {
    // JSON logger for production
    handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
        Level: slog.LevelInfo,
        AddSource: true, // Include file:line information
    })

    logger := slog.New(handler)
    slog.SetDefault(logger) // Set as default logger
    return logger
}

func main() {
    logger := initLogger()

    logger.Info("service starting",
        "service", "order-service",
        "version", "1.0.0",
        "port", 8080,
    )
}

Context-Aware Logging

import (
    "context"
    "log/slog"

    "go.opentelemetry.io/otel/trace"
)

// Add trace context to logger
func LoggerWithTrace(ctx context.Context) *slog.Logger {
    span := trace.SpanFromContext(ctx)
    spanCtx := span.SpanContext()

    return slog.With(
        "trace_id", spanCtx.TraceID().String(),
        "span_id", spanCtx.SpanID().String(),
    )
}

func HandleRequest(ctx context.Context, req Request) error {
    logger := LoggerWithTrace(ctx)

    logger.Info("processing request",
        "request_id", req.ID,
        "method", req.Method,
        "path", req.Path,
    )

    if err := processRequest(ctx, req); err != nil {
        logger.Error("request failed",
            "error", err,
            "duration_ms", time.Since(req.StartTime).Milliseconds(),
        )
        return err
    }

    logger.Info("request completed successfully",
        "duration_ms", time.Since(req.StartTime).Milliseconds(),
    )
    return nil
}

Log Levels and Structured Fields

func ProcessOrder(ctx context.Context, order Order) error {
    logger := LoggerWithTrace(ctx).With(
        "order_id", order.ID,
        "user_id", order.UserID,
    )

    logger.Debug("validating order", "items", len(order.Items))

    if len(order.Items) == 0 {
        logger.Warn("empty order received")
        return ErrEmptyOrder
    }

    logger.Info("order validation passed")

    if err := fulfillOrder(ctx, order); err != nil {
        logger.Error("fulfillment failed",
            "error", err,
            slog.Group("order_details",
                "total", order.Total,
                "items", len(order.Items),
            ),
        )
        return err
    }

    logger.Info("order processed successfully",
        "total", order.Total,
    )
    return nil
}

Health Checks and Graceful Shutdown

Health Check Endpoints

import (
    "context"
    "database/sql"
    "encoding/json"
    "net/http"
    "time"
)

type HealthChecker struct {
    db *sql.DB
    // Add other dependencies
}

type HealthStatus struct {
    Status      string            `json:"status"`
    Version     string            `json:"version"`
    Checks      map[string]string `json:"checks"`
    Timestamp   time.Time         `json:"timestamp"`
}

// Liveness probe - is the app running?
func (hc *HealthChecker) LivenessHandler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{
        "status": "alive",
    })
}

// Readiness probe - is the app ready to serve traffic?
func (hc *HealthChecker) ReadinessHandler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    status := HealthStatus{
        Status:    "ready",
        Version:   "1.0.0",
        Checks:    make(map[string]string),
        Timestamp: time.Now(),
    }

    // Check database
    if err := hc.db.PingContext(ctx); err != nil {
        status.Status = "not_ready"
        status.Checks["database"] = "unhealthy: " + err.Error()
        w.WriteHeader(http.StatusServiceUnavailable)
    } else {
        status.Checks["database"] = "healthy"
    }

    // Add more dependency checks (Redis, external APIs, etc.)

    w.Header().Set("Content-Type", "application/json")
    if status.Status == "ready" {
        w.WriteHeader(http.StatusOK)
    }
    json.NewEncoder(w).Encode(status)
}

Graceful Shutdown

import (
    "context"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Initialize tracer
    tp, err := initTracer("order-service")
    if err != nil {
        log.Fatal(err)
    }

    // Setup HTTP server
    server := &http.Server{
        Addr:    ":8080",
        Handler: setupRoutes(),
    }

    // Channel for shutdown signals
    shutdown := make(chan os.Signal, 1)
    signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)

    // Start server in goroutine
    go func() {
        slog.Info("server starting", "port", 8080)
        if err := server.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    // Wait for shutdown signal
    <-shutdown
    slog.Info("shutdown signal received")

    // Create shutdown context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Shutdown HTTP server
    slog.Info("shutting down HTTP server")
    if err := server.Shutdown(ctx); err != nil {
        slog.Error("HTTP server shutdown error", "error", err)
    }

    // Shutdown tracer provider (flush spans)
    slog.Info("shutting down tracer")
    if err := tp.Shutdown(ctx); err != nil {
        slog.Error("tracer shutdown error", "error", err)
    }

    slog.Info("shutdown complete")
}

Complete Instrumentation Example

package main

import (
    "context"
    "database/sql"
    "log/slog"
    "net/http"
    "os"
    "time"

    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

type Server struct {
    db     *sql.DB
    logger *slog.Logger
}

func (s *Server) orderHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()

    // Get tracer and create span
    tracer := otel.Tracer("order-service")
    ctx, span := tracer.Start(ctx, "orderHandler")
    defer span.End()

    // Create context-aware logger with trace ID
    logger := s.logger.With(
        "trace_id", span.SpanContext().TraceID().String(),
        "request_id", r.Header.Get("X-Request-ID"),
    )

    orderID := r.URL.Query().Get("id")
    span.SetAttributes(attribute.String("order.id", orderID))

    logger.Info("fetching order", "order_id", orderID)

    // Fetch order from database
    order, err := s.fetchOrder(ctx, orderID)
    if err != nil {
        span.RecordError(err)
        logger.Error("failed to fetch order", "error", err)
        http.Error(w, "Order not found", http.StatusNotFound)
        return
    }

    logger.Info("order fetched successfully",
        "order_id", orderID,
        "items", len(order.Items),
    )

    // Return order as JSON
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(order)
}

func (s *Server) fetchOrder(ctx context.Context, orderID string) (*Order, error) {
    _, span := otel.Tracer("order-service").Start(ctx, "fetchOrder")
    defer span.End()

    // Time database query
    start := time.Now()

    var order Order
    err := s.db.QueryRowContext(ctx, "SELECT * FROM orders WHERE id = ?", orderID).Scan(&order)

    duration := time.Since(start).Seconds()
    dbQueryDuration.WithLabelValues("select_order").Observe(duration)

    return &order, err
}

func setupRoutes(s *Server, hc *HealthChecker) http.Handler {
    mux := http.NewServeMux()

    // Health endpoints (no tracing needed)
    mux.HandleFunc("/health", hc.LivenessHandler)
    mux.HandleFunc("/ready", hc.ReadinessHandler)
    mux.Handle("/metrics", promhttp.Handler())

    // Business endpoints (with tracing)
    orderHandler := http.HandlerFunc(s.orderHandler)
    mux.Handle("/orders", otelhttp.NewHandler(orderHandler, "orders"))

    // Wrap everything with metrics middleware
    return MetricsMiddleware(mux)
}

Decision Trees

When to Use OpenTelemetry

Use OpenTelemetry When:

  • Building distributed systems with multiple services
  • Need to trace requests across service boundaries
  • Debugging performance issues in microservices
  • Want vendor-neutral observability (switch backends easily)
  • Require correlation between traces, metrics, and logs

Don't Use OpenTelemetry When:

  • Building simple monolithic applications
  • Performance overhead is critical (consider sampling)
  • Team lacks observability infrastructure (Jaeger, Zipkin)

When to Use Prometheus

Use Prometheus When:

  • Need time-series metrics for monitoring and alerting
  • Building operational dashboards (Grafana)
  • Measuring SLIs for SLO compliance
  • Tracking business metrics (requests/sec, conversion rates)
  • Kubernetes/containerized environments

Don't Use Prometheus When:

  • Need high-cardinality metrics (Prometheus has limits)
  • Require long-term metric storage (use Thanos/Cortex)
  • Need push-based metrics (Prometheus is pull-based)

When to Use slog

Use slog When:

  • Go 1.21+ projects (standard library, zero dependencies)
  • Need structured logging with JSON output
  • Want high-performance logging with minimal allocations
  • Integrating with log aggregation systems (Loki, ELK)

Don't Use slog When:

  • Go < 1.21 (use zap or zerolog instead)
  • Need complex log routing or filtering (use zap)
  • Require very specific features (audit trails, etc.)

Sampling Strategy Decision

Always Sample When:

  • Development/staging environments
  • Total traffic < 100 requests/sec
  • Debugging specific issues

Probabilistic Sampling When:

  • Production with moderate traffic (100-10K req/sec)
  • Sample rate: 1-10% typically

Tail-Based Sampling When:

  • High traffic production (>10K req/sec)
  • Only sample errors and slow requests
  • Requires tail-sampling processor (OpenTelemetry Collector)

Anti-Patterns to Avoid

❌ Not Propagating Context

WRONG: Breaking trace context

func processOrder(order Order) error {
    // Creates new context, loses trace!
    ctx := context.Background()
    return validateOrder(ctx, order)
}

CORRECT: Propagate context through call chain

func processOrder(ctx context.Context, order Order) error {
    // Propagates trace context
    return validateOrder(ctx, order)
}

❌ Cardinality Explosion

WRONG: Unbounded label values

// user_id can have millions of values!
httpRequests.WithLabelValues(r.Method, r.URL.Path, userID).Inc()

CORRECT: Use bounded labels

// Only method and path (bounded values)
httpRequests.WithLabelValues(r.Method, r.URL.Path).Inc()
// Track user-specific metrics separately if needed

❌ Logging Sensitive Data

WRONG: Exposing PII and secrets

logger.Info("user login",
    "email", user.Email,        // PII!
    "password", user.Password,  // CRITICAL!
    "token", authToken,         // SECRET!
)

CORRECT: Redact sensitive information

logger.Info("user login",
    "user_id", user.ID,  // Safe identifier
    "method", "password",
)

❌ Not Closing Spans

WRONG: Span leaks memory

func processOrder(ctx context.Context) error {
    ctx, span := tracer.Start(ctx, "processOrder")
    // Missing defer span.End()!

    if err := validate(); err != nil {
        return err  // Span never closed!
    }

    return nil
}

CORRECT: Always defer span.End()

func processOrder(ctx context.Context) error {
    ctx, span := tracer.Start(ctx, "processOrder")
    defer span.End()  // Always runs

    if err := validate(); err != nil {
        span.RecordError(err)
        return err
    }

    return nil
}

❌ Synchronous Metric Export

WRONG: Blocking requests with metric export

// Synchronous export blocks HTTP handler
exporter := jaeger.New(jaeger.WithCollectorEndpoint(...))
tp := sdktrace.NewTracerProvider(
    sdktrace.WithSyncer(exporter),  // BAD: Synchronous!
)

CORRECT: Use batching for async export

// Batching exports asynchronously
tp := sdktrace.NewTracerProvider(
    sdktrace.WithBatcher(exporter),  // GOOD: Async batching
)

❌ Missing Graceful Shutdown

WRONG: Losing traces on shutdown

func main() {
    tp, _ := initTracer("service")
    // Missing shutdown - spans lost!
    http.ListenAndServe(":8080", nil)
}

CORRECT: Shutdown exporters properly

func main() {
    tp, _ := initTracer("service")
    defer tp.Shutdown(context.Background())

    // Handle signals and graceful shutdown
    server.ListenAndServe()
}

Best Practices

  1. Context Propagation: Always pass context.Context through call chains
  2. Bounded Labels: Keep metric label cardinality under 1000 combinations
  3. Sampling: Use probabilistic sampling in high-traffic production
  4. Correlation IDs: Include trace_id in logs for correlation
  5. Health Checks: Implement both /health (liveness) and /ready (readiness)
  6. Graceful Shutdown: Flush traces and metrics before exit
  7. Error Recording: Use span.RecordError() for automatic error tracking
  8. Metric Naming: Follow Prometheus naming conventions (_total, _seconds)
  9. Log Levels: Use appropriate levels (Debug, Info, Warn, Error)
  10. Auto-Instrumentation: Use middleware for HTTP/gRPC when possible

Metric Naming Conventions

Follow Prometheus best practices:

Counter Metrics (always increasing):

  • http_requests_total (not http_requests)
  • payment_transactions_total
  • errors_total

Gauge Metrics (can go up or down):

  • active_connections
  • queue_size
  • memory_usage_bytes

Histogram/Summary Metrics (observations):

  • http_request_duration_seconds (not _milliseconds)
  • db_query_duration_seconds
  • response_size_bytes

Label Naming:

  • Use method, not http_method
  • Use status, not status_code or http_status
  • Use snake_case, not camelCase

Resources

Official Documentation:

Recent Guides (2025):

Related Skills:

  • golang-web-frameworks: HTTP server patterns and middleware
  • golang-testing-strategies: Testing instrumented code
  • verification-before-completion: Validating observability setup

Quick Reference

Initialize OpenTelemetry

tp, _ := initTracer("service-name")
defer tp.Shutdown(context.Background())

Create Spans

ctx, span := otel.Tracer("name").Start(ctx, "operation")
defer span.End()
span.SetAttributes(attribute.String("key", "value"))

Define Metrics

counter := promauto.NewCounterVec(opts, []string{"label"})
histogram := promauto.NewHistogramVec(opts, []string{"label"})

Structured Logging

logger := slog.With("trace_id", traceID)
logger.Info("message", "key", value)

Health Checks

http.HandleFunc("/health", livenessHandler)
http.HandleFunc("/ready", readinessHandler)

Token Estimate: ~5,000 tokens (entry point + full content) Version: 1.0.0 Last Updated: 2025-12-03