| name | observability |
| type | knowledge |
| description | Logging, debugging, profiling, and performance monitoring for development. Use when adding logging, debugging issues, profiling performance, or instrumenting code for observability. |
| keywords | logging, debug, debugger, pdb, profiling, performance, trace, stack trace, logger, log level, metrics, monitoring |
| auto_activate | true |
Observability Skill
Developer-focused observability: logging, debugging, profiling, and performance monitoring.
When This Skill Activates
- Adding logging to code
- Debugging errors or issues
- Profiling performance bottlenecks
- Analyzing memory usage
- Adding instrumentation
- Setting up structured logging
- Keywords: "logging", "debug", "profiling", "performance", "trace"
Structured Logging
Why Structured Logging?
Traditional (unstructured):
print(f"User {user_id} logged in from {ip_address}")
# Hard to parse, search, filter
Structured (JSON):
logger.info("user_login", extra={
"user_id": user_id,
"ip_address": ip_address,
"timestamp": datetime.utcnow().isoformat()
})
# Easy to parse, search, filter, aggregate
Benefits:
- ✅ Easy to search/filter
- ✅ Machine-readable
- ✅ Aggregate metrics
- ✅ Correlate events
- ✅ Feed to log aggregators (ELK, Splunk, Datadog)
Python Logging Setup
Basic configuration:
import logging
import sys
# Configure root logger
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.StreamHandler(sys.stdout), # Console
logging.FileHandler('app.log') # File
]
)
logger = logging.getLogger(__name__)
# Use it
logger.info("Application started")
logger.warning("Database connection slow")
logger.error("Failed to process request", exc_info=True)
Log Levels
| Level | Value | Use When | Example |
|---|---|---|---|
| DEBUG | 10 | Development, detailed tracing | Variable values, function calls |
| INFO | 20 | Normal operations | User logged in, request processed |
| WARNING | 30 | Unexpected but handled | Deprecated API used, slow query |
| ERROR | 40 | Error occurred | Failed to save file, API error |
| CRITICAL | 50 | System failure | Database down, disk full |
Usage:
logger.debug(f"Processing item: {item}") # Development only
logger.info(f"User {user_id} logged in") # Normal event
logger.warning(f"Query took {duration}s (slow!)") # Unexpected
logger.error(f"Failed to save: {error}") # Error
logger.critical(f"Database unreachable!") # System failure
Structured JSON Logging
Using python-json-logger:
from pythonjsonlogger import jsonlogger
import logging
# Setup JSON formatter
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
'%(asctime)s %(name)s %(levelname)s %(message)s'
)
logHandler.setFormatter(formatter)
logger = logging.getLogger()
logger.addHandler(logHandler)
logger.setLevel(logging.INFO)
# Log with extra fields
logger.info("user_login", extra={
"user_id": 123,
"ip_address": "192.168.1.1",
"user_agent": "Mozilla/5.0..."
})
Output:
{
"asctime": "2025-10-24 12:00:00,000",
"name": "root",
"levelname": "INFO",
"message": "user_login",
"user_id": 123,
"ip_address": "192.168.1.1",
"user_agent": "Mozilla/5.0..."
}
Context Logging
Add context (such as a request ID) to every log record emitted while handling a request:
import logging
from contextvars import ContextVar
from typing import Optional
# Context variable holding the current request ID (unset outside a request)
request_id_var: ContextVar[Optional[str]] = ContextVar('request_id', default=None)
class ContextFilter(logging.Filter):
def filter(self, record):
record.request_id = request_id_var.get()
return True
# Attach the filter so every record carries request_id
logger = logging.getLogger(__name__)
logger.addFilter(ContextFilter())
# Use in request handler
def handle_request(request):
request_id = generate_request_id()
request_id_var.set(request_id)
logger.info("Processing request") # Includes request_id automatically
process_data()
logger.info("Request complete") # Includes request_id automatically
Best Practices for Logging
✅ DO:
# 1. Use appropriate log levels
logger.debug("Variable value: {value}") # Debug details
logger.info("User action completed") # Normal operations
logger.error("Failed to process", exc_info=True) # Errors
# 2. Include context
logger.info("Processing payment", extra={
"user_id": user_id,
"amount": amount,
"currency": currency
})
# 3. Log exceptions with stack traces
try:
process_data()
except Exception as e:
logger.error("Processing failed", exc_info=True) # Includes stack trace
# 4. Use lazy formatting
logger.debug("Processing %s items", len(items)) # Only formats if DEBUG enabled
❌ DON'T:
# 1. Don't log sensitive data
logger.info(f"User password: {password}") # NEVER!
logger.info(f"Credit card: {card_number}") # NEVER!
# 2. Don't log in loops (too verbose)
for item in items:
logger.info(f"Processing {item}") # 1M items = 1M log lines!
# 3. Don't use print() in production
print("Debug info") # Use logger.debug() instead
# 4. Don't catch and ignore
try:
process()
except:
pass # Silent failure! At least log it!
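For the loop case above, one fix is to aggregate and emit a single summary line; a minimal sketch (process() stands in for whatever per-item work applies):
processed = 0
for item in items:
    process(item)
    processed += 1
logger.info("Processed %d items", processed)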
Debugging
Print Debugging
Strategic print statements:
def process_data(data):
print(f"[DEBUG] Input: {data}") # See input
result = transform(data)
print(f"[DEBUG] After transform: {result}") # See intermediate
validated = validate(result)
print(f"[DEBUG] After validate: {validated}") # See output
return validated
Use temporarily; remove before committing:
# ✅ GOOD: Temporary debug
print(f"[DEBUG] value={value}") # Remove before commit
# ❌ BAD: Permanent debug prints
print("Processing...") # Clutters output forever
Interactive Debugger (pdb)
Built-in Python debugger:
import pdb
def buggy_function(x, y):
result = x + y
pdb.set_trace() # Debugger starts here
return result * 2
# When debugger starts:
# (Pdb) p x # Print variable x
# (Pdb) p y # Print variable y
# (Pdb) p result # Print result
# (Pdb) n # Next line
# (Pdb) s # Step into function
# (Pdb) c # Continue execution
# (Pdb) q # Quit debugger
Breakpoint (Python 3.7+):
def buggy_function(x, y):
result = x + y
    breakpoint() # Preferred: no import needed, and the PYTHONBREAKPOINT env var can redirect or disable it
return result * 2
IPython Debugger (ipdb)
Enhanced debugger with syntax highlighting:
pip install ipdb
import ipdb
def buggy_function(x, y):
result = x + y
ipdb.set_trace() # Colorized, tab completion
return result * 2
Commands:
(ipdb) p x # Print x
(ipdb) pp x # Pretty-print x
(ipdb) l # List code around current line
(ipdb) w # Where (stack trace)
(ipdb) u # Up stack frame
(ipdb) d # Down stack frame
(ipdb) h # Help
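Breakpoints can also be set from inside the session, optionally with a condition (the module and function names below are placeholders):
(ipdb) b mymodule.py:42              # Break at file:line
(ipdb) b process_item, item is None  # Conditional breakpoint
(ipdb) cl                            # Clear all breakpoints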
Post-Mortem Debugging
Debug after exception:
import pdb
import sys
def buggy_function():
x = 1
y = 0
return x / y # ZeroDivisionError
try:
buggy_function()
except Exception:
    pdb.post_mortem(sys.exc_info()[2])
# Debugger starts at exception point
Automatic on uncaught exception:
import sys
import pdb
def exception_handler(exc_type, exc_value, exc_traceback):
if exc_type is KeyboardInterrupt:
sys.__excepthook__(exc_type, exc_value, exc_traceback)
return
print("Uncaught exception, starting debugger:")
pdb.post_mortem(exc_traceback)
sys.excepthook = exception_handler
# Now any uncaught exception starts debugger
Profiling
CPU Profiling (cProfile)
Find slow functions:
import cProfile
import pstats
def slow_function():
# Your code here
pass
# Profile
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()
# Print stats
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10) # Top 10 slowest functions
Output:
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.500 0.050 0.800 0.080 mymodule.py:10(slow_function)
100 0.300 0.003 0.300 0.003 mymodule.py:20(helper)
Sort options:
- cumulative - Total time (includes subfunctions)
- time - Internal time (excludes subfunctions)
- calls - Number of calls
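The same Stats object can be re-sorted or saved for later inspection (sort_stats returns the object, so calls chain); a small sketch with an arbitrary output filename:
stats.sort_stats('tottime').print_stats(10)   # Hottest functions by internal time
stats.sort_stats('calls').print_stats(10)     # Most frequently called
stats.dump_stats('profile.out')               # Save raw stats for later analysis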
Profile Decorator
Profile specific functions:
import cProfile
import pstats
from functools import wraps
def profile(func):
@wraps(func)
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
return result
return wrapper
@profile
def slow_function():
# Your code
pass
Line Profiler (line_profiler)
Profile line-by-line:
pip install line_profiler
from line_profiler import LineProfiler
def slow_function():
total = 0
for i in range(1000000):
total += i
return total
profiler = LineProfiler()
profiler.add_function(slow_function)
profiler.enable()
slow_function()
profiler.disable()
profiler.print_stats()
Output:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def slow_function():
3 1 2.0 2.0 0.0 total = 0
4 1000001 150000.0 0.1 30.0 for i in range(1000000):
5 1000000 350000.0 0.4 70.0 total += i
6 1 1.0 1.0 0.0 return total
Memory Profiling (memory_profiler)
Track per-line memory usage (useful for spotting leaks and heavy allocations):
pip install memory_profiler
from memory_profiler import profile
@profile
def memory_hog():
big_list = [0] * (10 ** 7) # 10M items
return sum(big_list)
memory_hog()
Output:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
2 40.0 MiB 40.0 MiB 1 @profile
3 def memory_hog():
4 116.4 MiB 76.4 MiB 1 big_list = [0] * (10 ** 7)
5 116.4 MiB 0.0 MiB 1 return sum(big_list)
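If adding a dependency isn't an option, the standard library's tracemalloc offers a dependency-free way to track allocations; a minimal sketch:
import tracemalloc
tracemalloc.start()
big_list = [0] * (10 ** 6)
current, peak = tracemalloc.get_traced_memory()   # Bytes currently allocated and peak so far
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
tracemalloc.stop()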
Sampling Profiler (py-spy)
Profile running processes (no code changes needed):
pip install py-spy
# Profile running Python process
py-spy top --pid 12345
# Generate flamegraph
py-spy record -o profile.svg --pid 12345
Advantages:
- ✅ No code modification
- ✅ Profile production code
- ✅ Low overhead
- ✅ Visualizations (flamegraphs)
Stack Traces
Print Stack Trace
import traceback
try:
risky_operation()
except Exception as e:
traceback.print_exc() # Print stack trace to stderr
Capture to string:
import traceback
try:
risky_operation()
except Exception as e:
trace = traceback.format_exc()
logger.error(f"Error occurred: {trace}")
Rich Tracebacks (rich)
Beautiful, informative tracebacks:
pip install rich
from rich.traceback import install
install(show_locals=True)
def buggy_function():
x = 42
y = None
return x + y # TypeError with beautiful traceback
buggy_function()
Shows:
- ✅ Syntax highlighted code
- ✅ Local variables at each frame
- ✅ Clear error message
Performance Monitoring
Timing Decorator
Measure function execution time:
import time
from functools import wraps
def timer(func):
@wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter()
result = func(*args, **kwargs)
end = time.perf_counter()
duration = end - start
print(f"{func.__name__} took {duration:.4f}s")
return result
return wrapper
@timer
def slow_function():
time.sleep(1)
return "done"
slow_function()
# Output: slow_function took 1.0001s
Context Manager Timer
import time
from contextlib import contextmanager
@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report even if the timed block raises
        end = time.perf_counter()
        print(f"{name} took {end - start:.4f}s")
# Usage
with timer("Database query"):
result = db.query(...)
with timer("File processing"):
process_file(path)
Performance Assertions
Fail if too slow:
import time
def performance_test():
start = time.perf_counter()
# Your code
process_data()
duration = time.perf_counter() - start
assert duration < 0.1, f"Too slow: {duration:.4f}s"
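For micro-benchmarks, the standard library's timeit module repeats the call many times, which smooths out one-off timing noise; a minimal sketch:
import timeit
# Time 1,000 runs of sorting a reversed 1,000-element list
total = timeit.timeit(
    stmt="sorted(data)",
    setup="data = list(range(1000))[::-1]",
    number=1000,
)
print(f"{total / 1000:.6f}s per call")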
Simple Metrics
Counter
class Metrics:
def __init__(self):
self.counters = {}
def increment(self, name, value=1):
self.counters[name] = self.counters.get(name, 0) + value
def get(self, name):
return self.counters.get(name, 0)
def reset(self):
self.counters = {}
# Usage
metrics = Metrics()
def process_item(item):
metrics.increment("items_processed")
if item.is_valid():
metrics.increment("items_valid")
else:
metrics.increment("items_invalid")
# Later
print(f"Processed: {metrics.get('items_processed')}")
print(f"Valid: {metrics.get('items_valid')}")
print(f"Invalid: {metrics.get('items_invalid')}")
Histogram
Track value distributions:
from collections import defaultdict
class Histogram:
def __init__(self):
self.buckets = defaultdict(int)
def record(self, value):
# Bucket by value range
if value < 1:
bucket = "< 1s"
elif value < 5:
bucket = "1-5s"
elif value < 10:
bucket = "5-10s"
else:
bucket = "> 10s"
self.buckets[bucket] += 1
    def report(self):
        # Print buckets smallest to largest (plain sorting would order them alphabetically)
        for bucket in ("< 1s", "1-5s", "5-10s", "> 10s"):
            print(f"{bucket}: {self.buckets[bucket]}")
# Usage
histogram = Histogram()
for duration in query_durations:
histogram.record(duration)
histogram.report()
# Output:
# < 1s: 850
# 1-5s: 120
# 5-10s: 25
# > 10s: 5
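For latencies, percentiles often say more than fixed buckets; the standard library's statistics.quantiles can compute them from the recorded durations; a small sketch with made-up numbers:
import statistics
durations = [0.12, 0.25, 3.4, 0.08, 1.1, 0.3, 0.9]
cuts = statistics.quantiles(durations, n=100)           # 99 cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.2f}s p95={p95:.2f}s p99={p99:.2f}s")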
Debugging Best Practices
1. Binary Search Debugging
Narrow down the problem:
# Problem: Something breaks between start and end
def debug_binary_search():
# Step 1: Check middle
middle_result = process_half()
print(f"Middle result: {middle_result}")
# Step 2: Narrow to first or second half
if middle_result == expected:
# Problem in second half
debug_second_half()
else:
# Problem in first half
debug_first_half()
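A concrete variant of the same idea is bisecting over the input data to find the record that triggers the failure (assumes a single bad record; batch_fails() is a hypothetical check that runs the failing pipeline on a subset):
def find_failing_item(items):
    """Halve the batch until a single failing input remains."""
    while len(items) > 1:
        mid = len(items) // 2
        first_half = items[:mid]
        if batch_fails(first_half):   # Hypothetical: True if the pipeline breaks on this subset
            items = first_half
        else:
            items = items[mid:]
    return items[0]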
2. Rubber Duck Debugging
Explain code out loud (or to a rubber duck):
- Describe what code should do
- Explain what code actually does
- Often you'll spot the bug while explaining!
3. Add Assertions
Catch bugs early:
def divide(a, b):
assert b != 0, "Divisor cannot be zero"
assert isinstance(a, (int, float)), "a must be numeric"
assert isinstance(b, (int, float)), "b must be numeric"
result = a / b
assert isinstance(result, (int, float)), "Result should be numeric"
return result
4. Simplify and Isolate
Reduce problem to minimal case:
# ❌ BAD: Complex function with bug
def complex_function(data, config, options, flags):
# 100 lines of code
pass
# ✅ GOOD: Extract minimal failing case
def minimal_failing_case():
data = [1, 2, 3]
result = transform(data) # Fails here
return result
Logging Anti-Patterns
❌ DON'T:
# 1. Log sensitive data
logger.info(f"Password: {password}")
# 2. Log in tight loops
for item in million_items:
logger.info(f"Processing {item}")
# 3. Catch and ignore silently
try:
critical_operation()
except:
pass # Silent failure!
# 4. Over-log (log everything)
logger.debug("Starting function")
logger.debug("Variable x = 1")
logger.debug("Variable y = 2")
logger.debug("Calling helper")
logger.debug("Helper returned")
# 5. Use wrong log level
logger.error("User logged in") # Should be INFO
logger.info("Database crashed") # Should be CRITICAL
Quick Reference
Logging
logger.debug("Detailed trace")
logger.info("Normal event")
logger.warning("Unexpected")
logger.error("Error occurred", exc_info=True)
logger.critical("System failure")
Debugging
breakpoint() # Start debugger
pdb.set_trace() # Start debugger (old way)
traceback.print_exc() # Print stack trace
Profiling
# CPU
python -m cProfile -s cumulative script.py
# Memory
@profile # from memory_profiler
def func(): pass
# Live process
py-spy top --pid 12345
Timing
@timer
def func(): pass
with timer("operation"):
do_work()
Key Takeaways
- Structured logging - Use JSON for machine-readable logs
- Appropriate log levels - DEBUG/INFO/WARNING/ERROR/CRITICAL
- Don't log secrets - Never log passwords, API keys, etc.
- Use debugger - breakpoint() or ipdb for interactive debugging
- Profile first - Don't optimize without profiling
- cProfile for CPU - Find slow functions
- memory_profiler for memory - Find memory leaks
- Stack traces on errors - Always log exc_info=True
- Time critical paths - Use @timer decorator
- Simple metrics - Track counters and histograms
Version: 1.0.0
Type: Knowledge skill (no scripts)
See Also: python-standards (best practices), testing-guide (test debugging), security-patterns (safe logging)