name	tokio-troubleshooting
description	Debugging and troubleshooting Tokio applications using tokio-console, detecting deadlocks, memory leaks, and performance issues. Use when diagnosing async runtime problems.

Tokio Troubleshooting

This skill provides techniques for debugging and troubleshooting async applications built with Tokio.

Using tokio-console for Runtime Inspection

Monitor async runtime in real-time:

// In Cargo.toml
[dependencies]
console-subscriber = "0.2"

// In main.rs
fn main() {
    console_subscriber::init();

    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            run_application().await
        });
}

Run console in separate terminal:

tokio-console

Key metrics to monitor:

Task spawn rate and total tasks
Poll duration per task
Idle vs. busy time
Waker operations
Resource utilization

Identifying issues:

Long poll durations: CPU-intensive work in async context
Many wakers: Potential contention or inefficient polling
Growing task count: Task leak or unbounded spawning
High idle time: Not enough work or blocking operations

Debugging Deadlocks and Hangs

Detect and resolve deadlock situations:

Common Deadlock Pattern

// BAD: Potential deadlock
async fn deadlock_example() {
    let mutex1 = Arc::new(Mutex::new(()));
    let mutex2 = Arc::new(Mutex::new(()));

    let m1 = mutex1.clone();
    let m2 = mutex2.clone();
    tokio::spawn(async move {
        let _g1 = m1.lock().await;
        tokio::time::sleep(Duration::from_millis(10)).await;
        let _g2 = m2.lock().await; // May deadlock
    });

    let _g2 = mutex2.lock().await;
    tokio::time::sleep(Duration::from_millis(10)).await;
    let _g1 = mutex1.lock().await; // May deadlock
}

// GOOD: Consistent lock ordering
async fn no_deadlock_example() {
    let mutex1 = Arc::new(Mutex::new(()));
    let mutex2 = Arc::new(Mutex::new(()));

    // Always acquire locks in same order
    let _g1 = mutex1.lock().await;
    let _g2 = mutex2.lock().await;
}

// BETTER: Avoid nested locks
async fn best_example() {
    // Use message passing instead
    let (tx, mut rx) = mpsc::channel(10);

    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            process_message(msg).await;
        }
    });

    tx.send(message).await.unwrap();
}

Detecting Hangs with Timeouts

use tokio::time::{timeout, Duration};

async fn detect_hang() {
    match timeout(Duration::from_secs(5), potentially_hanging_operation()).await {
        Ok(result) => println!("Completed: {:?}", result),
        Err(_) => {
            eprintln!("Operation timed out - potential hang detected");
            // Log stack traces, metrics, etc.
        }
    }
}

Deadlock Detection with try_lock

use tokio::sync::Mutex;

async fn try_with_timeout(mutex: &Mutex<State>) -> Option<State> {
    for _ in 0..10 {
        if let Ok(guard) = mutex.try_lock() {
            return Some(guard.clone());
        }
        tokio::time::sleep(Duration::from_millis(10)).await;
    }
    eprintln!("Failed to acquire lock - possible deadlock");
    None
}

Memory Leak Detection

Identify and fix memory leaks:

Task Leaks

// BAD: Tasks never complete
async fn leaking_tasks() {
    loop {
        tokio::spawn(async {
            loop {
                // Never exits
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        });
    }
}

// GOOD: Tasks have exit condition
async fn proper_tasks(shutdown: broadcast::Receiver<()>) {
    loop {
        let mut shutdown_rx = shutdown.resubscribe();
        tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = shutdown_rx.recv() => break,
                    _ = tokio::time::sleep(Duration::from_secs(1)) => {
                        // Work
                    }
                }
            }
        });
    }
}

Arc Cycles

// BAD: Reference cycle
struct Node {
    next: Option<Arc<Mutex<Node>>>,
    prev: Option<Arc<Mutex<Node>>>, // Creates cycle!
}

// GOOD: Use weak references
use std::sync::Weak;

struct Node {
    next: Option<Arc<Mutex<Node>>>,
    prev: Option<Weak<Mutex<Node>>>, // Weak reference breaks cycle
}

Monitoring Memory Usage

use sysinfo::{System, SystemExt};

pub async fn memory_monitor() {
    let mut system = System::new_all();
    let mut interval = tokio::time::interval(Duration::from_secs(60));

    loop {
        interval.tick().await;
        system.refresh_memory();

        let used = system.used_memory();
        let total = system.total_memory();
        let percent = (used as f64 / total as f64) * 100.0;

        tracing::info!(
            used_mb = used / 1024 / 1024,
            total_mb = total / 1024 / 1024,
            percent = %.2 percent,
            "Memory usage"
        );

        if percent > 80.0 {
            tracing::warn!("High memory usage detected");
        }
    }
}

Performance Profiling with Tracing

Instrument code for performance analysis:

use tracing::{info, instrument, span, Level};

#[instrument]
async fn process_request(id: u64) -> Result<Response, Error> {
    let span = span!(Level::INFO, "database_query");
    let _enter = span.enter();

    let data = fetch_from_database(id).await?;

    drop(_enter);

    let span = span!(Level::INFO, "transformation");
    let _enter = span.enter();

    let result = transform_data(data).await?;

    Ok(Response { result })
}

// Configure subscriber for flame graphs
use tracing_subscriber::layer::SubscriberExt;

fn init_tracing() {
    let fmt_layer = tracing_subscriber::fmt::layer();
    let filter_layer = tracing_subscriber::EnvFilter::from_default_env();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .init();
}

Understanding Panic Messages

Common async panic patterns:

Panics in Spawned Tasks

// Panic is isolated to the task
tokio::spawn(async {
    panic!("This won't crash the program");
});

// To catch panics
let handle = tokio::spawn(async {
    // Work that might panic
});

match handle.await {
    Ok(result) => println!("Success: {:?}", result),
    Err(e) if e.is_panic() => {
        eprintln!("Task panicked: {:?}", e);
        // Handle panic
    }
    Err(e) => eprintln!("Task cancelled: {:?}", e),
}

Send + 'static Errors

// ERROR: future cannot be sent between threads
async fn bad_example() {
    let rc = Rc::new(5); // Rc is !Send
    tokio::spawn(async move {
        println!("{}", rc); // Error!
    });
}

// FIX: Use Arc instead
async fn good_example() {
    let rc = Arc::new(5); // Arc is Send
    tokio::spawn(async move {
        println!("{}", rc); // OK
    });
}

// ERROR: borrowed value does not live long enough
async fn lifetime_error() {
    let data = String::from("hello");
    tokio::spawn(async {
        println!("{}", data); // Error: data might not live long enough
    });
}

// FIX: Move ownership
async fn lifetime_fixed() {
    let data = String::from("hello");
    tokio::spawn(async move {
        println!("{}", data); // OK: data is moved
    });
}

Common Error Patterns and Solutions

Blocking in Async Context

// PROBLEM: Detected with tokio-console (long poll time)
async fn blocking_example() {
    std::thread::sleep(Duration::from_secs(1)); // Blocks thread!
}

// SOLUTION
async fn non_blocking_example() {
    tokio::time::sleep(Duration::from_secs(1)).await; // Yields control
}

// For unavoidable blocking
async fn necessary_blocking() {
    tokio::task::spawn_blocking(|| {
        expensive_cpu_work()
    }).await.unwrap();
}

Channel Closed Errors

// PROBLEM: SendError because receiver dropped
async fn send_error_example() {
    let (tx, rx) = mpsc::channel(10);
    drop(rx); // Receiver dropped

    match tx.send(42).await {
        Ok(_) => println!("Sent"),
        Err(e) => eprintln!("Send failed: {}", e), // Channel closed
    }
}

// SOLUTION: Check if receiver exists
async fn handle_closed_channel() {
    let (tx, rx) = mpsc::channel(10);

    tokio::spawn(async move {
        // Receiver keeps channel open
        while let Some(msg) = rx.recv().await {
            process(msg).await;
        }
    });

    // Or handle the error
    if let Err(e) = tx.send(42).await {
        tracing::warn!("Channel closed: {}", e);
        // Cleanup or alternative action
    }
}

Task Cancellation

// PROBLEM: Task cancelled unexpectedly
let handle = tokio::spawn(async {
    // Long-running work
});

handle.abort(); // Cancels task

// SOLUTION: Handle cancellation gracefully
let handle = tokio::spawn(async {
    let result = tokio::select! {
        result = do_work() => result,
        _ = tokio::signal::ctrl_c() => {
            cleanup().await;
            return Err(Error::Cancelled);
        }
    };
    result
});

Testing Async Code Effectively

Write reliable async tests:

#[tokio::test]
async fn test_with_timeout() {
    tokio::time::timeout(
        Duration::from_secs(5),
        async {
            let result = my_async_function().await;
            assert!(result.is_ok());
        }
    )
    .await
    .expect("Test timed out");
}

#[tokio::test]
async fn test_concurrent_access() {
    let shared = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..10)
        .map(|_| {
            let shared = shared.clone();
            tokio::spawn(async move {
                let mut lock = shared.lock().await;
                *lock += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.await.unwrap();
    }

    assert_eq!(*shared.lock().await, 10);
}

// Test with mocked time
#[tokio::test(start_paused = true)]
async fn test_with_time_control() {
    let start = tokio::time::Instant::now();

    tokio::time::sleep(Duration::from_secs(100)).await;

    // Time is mocked, so this completes instantly
    assert!(start.elapsed() < Duration::from_secs(1));
}

Debugging Checklist

When troubleshooting async issues:

Use tokio-console to monitor runtime behavior
Check for blocking operations with tracing
Verify all locks are released properly
Look for task leaks (growing task count)
Monitor memory usage over time
Add timeouts to detect hangs
Check for channel closure errors
Verify Send + 'static bounds are satisfied
Use try_lock to detect potential deadlocks
Profile with tracing for performance bottlenecks
Test with tokio-test for time-based code
Check for Arc cycles with weak references

Helpful Tools

tokio-console: Real-time async runtime monitoring
tracing: Structured logging and profiling
cargo-flamegraph: Generate flame graphs
valgrind/heaptrack: Memory profiling
perf: CPU profiling on Linux
Instruments: Profiling on macOS

Best Practices

Always use tokio-console in development
Add tracing spans to critical code paths
Use timeouts liberally to detect hangs
Monitor task count for leaks
Profile before optimizing - measure first
Test with real concurrency - don't just test happy paths
Handle cancellation gracefully in all tasks
Use structured logging for debugging
Avoid nested locks - prefer message passing
Document lock ordering when necessary

tokio-troubleshooting

Install Skill

SKILL.md