name	observability
description	Logging, metrics, and distributed tracing. OpenTelemetry, Prometheus, Grafana, and production debugging.
sasmp_version	2.0.0
bonded_agent	08-backend-observability
bond_type	PRIMARY_BOND
atomic_operations	LOGGING_SETUP, METRICS_COLLECTION, TRACING_IMPLEMENTATION, ALERTING_CONFIGURATION
parameter_validation	[object Object]
retry_logic	[object Object]
logging_hooks	[object Object]
exit_codes	[object Object]

Observability Skill

Name: observability
Author: pluginagentmarketplace

Bonded to: backend-observability-agent (Primary)

Quick Start

# Invoke observability skill
"Set up structured logging for my application"
"Configure Prometheus metrics"
"Implement distributed tracing with OpenTelemetry"

Three Pillars

Pillar	Purpose	Tools
Logs	What happened	ELK, Loki
Metrics	System state	Prometheus, Grafana
Traces	Request flow	Jaeger, OpenTelemetry

Examples

Structured Logging

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer()
    ]
)

logger = structlog.get_logger()

def process_request(request):
    log = logger.bind(
        correlation_id=request.headers.get("X-Correlation-ID"),
        user_id=request.user.id
    )
    log.info("request_started", method=request.method)

Prometheus Metrics

from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

@REQUEST_LATENCY.time()
async def handle_request(request):
    response = await process(request)
    REQUEST_COUNT.labels(method=request.method, status=response.status_code).inc()
    return response

OpenTelemetry Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span("process_order")
def process_order(order_id: str):
    span = trace.get_current_span()
    span.set_attribute("order.id", order_id)

    with tracer.start_as_current_span("validate"):
        validate(order_id)

    with tracer.start_as_current_span("charge"):
        charge(order_id)

Key Metrics (RED Method)

Metric	Description	Alert Threshold
Rate	Requests/sec	Drop > 50%
Errors	Error rate %	> 1% for 5 min
Duration	P99 latency	> 500ms for 5 min

Troubleshooting

Issue	Cause	Solution
Missing logs	Log level too high	Adjust log level
High cardinality	Too many labels	Reduce label values
Broken traces	Context not propagated	Forward headers

observability

Install Skill

SKILL.md

Observability Skill

Quick Start

Three Pillars

Examples

Structured Logging

Prometheus Metrics

OpenTelemetry Tracing

Key Metrics (RED Method)

Troubleshooting

Resources