name	monitoring-debugging
description	Monitor Bob The Skull operation and debug issues. Use when checking system health, diagnosing problems, analyzing logs, monitoring state transitions, or troubleshooting component failures. Covers web monitor, logs, MQTT, state machine, and performance metrics.
allowed-tools	Read, Bash, Grep

Monitoring & Debugging Skill

Comprehensive guide for monitoring Bob's operation and diagnosing issues quickly.

When to Use This Skill

"Bob isn't responding" - Diagnose unresponsive system
"Wake word not detecting" - Debug audio input issues
"Check system health" - Monitor operation
"Why is Bob stuck in [state]?" - State machine debugging
"Analyze logs" - Log analysis and interpretation
"Monitor performance" - Check latency and metrics

Quick Reference

Monitoring Tools

Tool	Purpose	Access
Web Monitor	Real-time dashboard	http://localhost:5001 (PC) http://192.168.1.44:5001 (Pi)
MQTT Monitor	Event bus viewer	`python mqtt_monitor.py`
Log Files	Detailed history	`logs/bob.log`
Config Dashboard	Settings interface	http://localhost:5001/config

Quick Diagnostics

# Is Bob running?
ps aux | grep -i bob

# Current state
curl -s http://localhost:5001/api/state | jq

# Recent errors
tail -50 logs/bob.log | grep -i error

# Recent events
curl -s http://localhost:5001/api/events | jq '.[-10:]'

# Component health
curl -s http://localhost:5001/api/health | jq

Web Monitor Dashboard

Accessing the Dashboard

Local (Development PC):

http://localhost:5001

Remote (Raspberry Pi):

http://192.168.1.44:5001

Dashboard Components

1. Current State Display

Shows current state machine state
Color-coded: Green (active), Yellow (transitioning), Red (error)
Timestamp of last state change

2. Recent Events Feed

Last 50 events in reverse chronological order
Event type, timestamp, details
Filterable by event type

3. Component Status

Wake Word: Active/Inactive
STT: Ready/Processing
LLM: Ready/Processing
TTS: Ready/Speaking
Vision: FPS, detections
Eyes: Connected/Disconnected

4. Performance Metrics

Wake word latency
STT processing time
LLM response time
TTS synthesis time
Vision frame rate
Total conversation latency

5. Configuration Links

Direct links to all config pages
Quick access to settings

Common Dashboard Patterns

Normal Operation:

State cycles: IDLE → WAKE_LISTENING → GREETING → LISTENING → PROCESSING → SPEAKING → WAKE_LISTENING
Regular WakeWordDetectedEvent
Consistent frame rate (5-10 FPS)
Low error count

Problem Indicators:

State stuck for >30 seconds
Repeated timeout events
Error events appearing
Component status showing "Disconnected"
Missing expected events

Log Analysis

Log File Locations

# Main application log
logs/bob.log

# Component-specific logs (if configured)
logs/wake_word.log
logs/stt.log
logs/llm.log
logs/tts.log
logs/vision.log

Essential Log Patterns

Show all errors:

grep -i error logs/bob.log
# or
tail -f logs/bob.log | grep --color=always -i error

Show state transitions:

grep "State transition" logs/bob.log
# Expected: IDLE -> WAKE_LISTENING -> GREETING -> ...

Show event publications:

grep "Publishing event" logs/bob.log

Show component initialization:

grep "Initializing" logs/bob.log
# Verify all components loaded

Filter by time range:

# Last 100 lines
tail -100 logs/bob.log

# Last hour
grep "$(date '+%Y-%m-%d %H')" logs/bob.log

# Specific timestamp
grep "2024-12-09 15:30" logs/bob.log

Event frequency analysis:

# Count events by type
grep "Publishing event" logs/bob.log | awk '{print $5}' | sort | uniq -c | sort -rn

# Output:
#   145 WakeWordDetectedEvent
#    98 SpeechRecognizedEvent
#    87 LLMResponseEvent
#    ...

Log Level Interpretation

DEBUG: Detailed diagnostic information (verbose)
INFO: General operational messages (normal)
WARNING: Potentially problematic situations (investigate)
ERROR: Error events (requires attention)
CRITICAL: Severe errors (immediate action)

Common Debugging Scenarios

Scenario 1: Wake Word Not Detecting

Symptoms:

No response to "Hey Bob" or "Wake up Bob"
No WakeWordDetectedEvent in logs/web monitor

Diagnosis:

# 1. Check if wake word component is running
grep "wake word" logs/bob.log | tail -5

# 2. Check audio input device
python list_audio_devices.py
# Compare to AUDIO_INPUT_DEVICE_INDEX in .env

# 3. Check microphone is receiving input
# On Pi:
arecord -d 3 test.wav && aplay test.wav
# Should hear your recording

# 4. Check sensitivity setting
grep WAKE_WORD_SENSITIVITY .env
# Default: 0.5 (lower = more sensitive, 0.0-1.0)

# 5. Test with audio injection (if configured)
python test_wake_word_inject.py play --file audio/static/testing/wake_up_bob.mp3

Common Causes:

❌ Wrong audio device index
❌ Microphone muted or disconnected
❌ Sensitivity too high (0.9-1.0)
❌ Wake word model not loaded
❌ Picovoice API key invalid

Solutions:

# Fix audio device
# 1. List devices: python list_audio_devices.py
# 2. Update .env: BOBTHESKULL_AUDIO_INPUT_DEVICE_INDEX=X
# 3. Restart Bob

# Lower sensitivity
# Edit .env: BOBTHESKULL_WAKE_WORD_SENSITIVITY=0.3
# Restart Bob

# Verify API key
grep PICOVOICE_ACCESS_KEY .env
# Check at console.picovoice.ai

Scenario 2: State Machine Stuck

Symptoms:

Bob unresponsive
State hasn't changed in minutes
Timeouts in logs

Diagnosis:

# 1. Check current state
curl -s http://localhost:5001/api/state | jq
# Look at 'current_state' and 'time_in_state'

# 2. Check recent transitions
grep "State transition" logs/bob.log | tail -10

# 3. Check for timeout events
grep -i timeout logs/bob.log | tail -5

# 4. Check what triggered stuck state
grep "Entering state" logs/bob.log | tail -3

Common Stuck States:

PROCESSING (stuck):

LLM not responding
Network timeout
API rate limit

SPEAKING (stuck):

TTS failed to complete
Audio output issue
MPV process hung

LISTENING (stuck):

STT waiting for input that never came
Microphone stopped working
Timeout not configured

Solutions:

# Graceful recovery (restart Bob)
# Ctrl+C in terminal or:
pkill -f BobTheSkull.py
python BobTheSkull.py

# Check timeouts are configured
grep TIMEOUT .env
# Ensure STATE_MACHINE_*_TIMEOUT values are set

# For specific stuck states:
# - PROCESSING: Check LLM logs, API key, network
# - SPEAKING: Check audio output device
# - LISTENING: Check STT configuration

Scenario 3: LLM Not Responding

Symptoms:

Bob hears speech but doesn't respond
Stuck in PROCESSING state
Timeout after 30+ seconds

Diagnosis:

# 1. Check LLM events in logs
grep -E "LLMRequest|LLMResponse|LLMError" logs/bob.log | tail -10

# 2. Verify API key
grep OPENAI_API_KEY .env
# Should start with sk-

# 3. Test API connectivity
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $(grep OPENAI_API_KEY .env | cut -d= -f2)"
# Should return list of models

# 4. Check for rate limit errors
grep "429" logs/bob.log
# 429 = rate limit exceeded

# 5. Check model configuration
grep LLM_MODEL .env
# Default: gpt-4-turbo

Common Causes:

❌ Invalid or expired API key
❌ Rate limit exceeded
❌ Network connectivity issues
❌ Model not available
❌ Request timeout

Solutions:

# Test different model
# Edit .env: BOBTHESKULL_LLM_MODEL=gpt-3.5-turbo
# (Faster, cheaper, might work if rate limited)

# Check API usage
# Visit platform.openai.com/usage

# Verify network
ping api.openai.com

# Check firewall
# Ensure port 443 (HTTPS) is open

Scenario 4: Vision Not Working

Symptoms:

No face detection events
Vision FPS = 0
Camera errors in logs

Diagnosis:

# 1. Check if vision is enabled
grep VISION_CAN_SEE .env
# Should be: BOBTHESKULL_VISION_CAN_SEE=true

# 2. Check camera is accessible
ls /dev/video*
# Should see: /dev/video0 (or similar)

# 3. Test camera directly
python test_vision_live.py
# Should open window with camera feed

# 4. Check vision logs
grep -i vision logs/bob.log | tail -20

# 5. Check GPU if using acceleration
grep VISION_ENABLE_GPU .env
python test_gpu_status.py

Common Causes:

❌ Camera not connected
❌ Camera in use by another process
❌ Vision disabled in config
❌ GPU issues (if using acceleration)
❌ Missing vision dependencies

Solutions:

# Test camera availability
# Kill other processes using camera:
sudo lsof /dev/video0
# Kill PID if found

# Disable GPU acceleration
# Edit .env: BOBTHESKULL_VISION_ENABLE_GPU=false
# Restart Bob

# Check dependencies
pip list | grep -E "opencv|onnx|dlib"
# Should show installed versions

Scenario 5: Audio Output Not Working

Symptoms:

Bob processes but no speech heard
TTS completes but silent
Audio file generates but doesn't play

Diagnosis:

# 1. Check audio output device
python list_audio_devices.py
# Verify OUTPUT device index

# 2. Test audio output directly
python test_audio_output.py
# Should hear test tones

# 3. Check MPV is installed
which mpv  # Linux/Mac
where mpv  # Windows
# Should show path to mpv binary

# 4. Check TTS logs
grep -E "TTS|Speaking" logs/bob.log | tail -10

# 5. Test TTS directly
python test_tts_live.py

Common Causes:

❌ Wrong output device index
❌ Speaker muted or disconnected
❌ MPV not installed or not in PATH
❌ Audio file playback failed
❌ Volume set to 0

Solutions:

# Fix output device
# 1. List devices: python list_audio_devices.py
# 2. Update .env: BOBTHESKULL_AUDIO_OUTPUT_DEVICE_INDEX=X
# 3. Restart Bob

# Install MPV
# Linux: sudo apt install mpv
# Mac: brew install mpv
# Windows: Download from mpv.io

# Check volume
# Ensure system volume > 0
# Check Bob's volume config

Scenario 6: High Latency / Slow Response

Symptoms:

Delay between speech and response
Vision FPS very low
State transitions taking >10 seconds

Diagnosis:

# 1. Check performance metrics in web monitor
# http://localhost:5001
# Look at component latencies

# 2. Check system resources
top
# Look for high CPU/memory usage

# 3. Check component timings in logs
grep -E "took|duration|latency" logs/bob.log | tail -20

# 4. Check GPU usage (if using vision with GPU)
nvidia-smi  # If NVIDIA GPU
# or
python test_gpu_status.py

# 5. Check network latency
ping api.openai.com
ping api.elevenlabs.io

Common Causes:

❌ Slow network connection
❌ GPU not being used (CPU fallback)
❌ Resource-heavy operations
❌ LLM model too large
❌ Multiple heavy components running

Performance Targets:

Wake word: < 500ms
STT: < 3s
LLM: < 3s
TTS: < 2s
Vision: 5-10 FPS
Total: < 10s end-to-end

Solutions:

# Use faster LLM model
# Edit .env: BOBTHESKULL_LLM_MODEL=gpt-3.5-turbo

# Enable GPU for vision (if available)
# Edit .env: BOBTHESKULL_VISION_ENABLE_GPU=true

# Reduce vision frame rate
# Edit .env: BOBTHESKULL_VISION_MAX_FRAMES_PER_SECOND=5

# Check network
# Test on local network if possible
# Verify good WiFi signal (Pi)

MQTT Event Bus Monitoring

Using mqtt_monitor.py

# Start MQTT monitor
python mqtt_monitor.py

# Output shows real-time events:
# 2024-12-09 15:30:45 | WakeWordDetectedEvent | phrase=wake up bob
# 2024-12-09 15:30:46 | StateTransitionEvent | from=IDLE to=WAKE_LISTENING
# 2024-12-09 15:30:47 | GreetingEvent | greeting=Yes wizard?
# ...

Event Flow Analysis

Normal conversation flow:

1. WakeWordDetectedEvent (phrase=wake up bob)
2. StateTransitionEvent (IDLE -> WAKE_LISTENING)
3. GreetingEvent (greeting=Yes wizard?)
4. StateTransitionEvent (WAKE_LISTENING -> GREETING)
5. SpeechRecognizedEvent (transcript=What time is it?)
6. StateTransitionEvent (GREETING -> PROCESSING)
7. LLMRequestEvent (input=What time is it?)
8. LLMResponseEvent (response=It's 3:30 PM)
9. StateTransitionEvent (PROCESSING -> SPEAKING)
10. TTSEvent (text=It's 3:30 PM)
11. SpeakingCompleteEvent
12. StateTransitionEvent (SPEAKING -> WAKE_LISTENING)

Problem patterns:

Missing events:

WakeWordDetectedEvent
(no StateTransitionEvent) ← Problem: State machine not responding

Repeated events:

WakeWordDetectedEvent
WakeWordDetectedEvent  ← Problem: Audio feedback loop
WakeWordDetectedEvent

Timeout sequence:

StateTransitionEvent (-> LISTENING)
TimeoutEvent (state=LISTENING)  ← Problem: No speech detected
StateTransitionEvent (LISTENING -> ERROR)

State Machine Monitoring

Valid State Transitions

IDLE ──wake_word──> WAKE_LISTENING ──greeting_complete──> GREETING
GREETING ──speech_detected──> LISTENING
LISTENING ──speech_recognized──> PROCESSING
PROCESSING ──llm_response──> SPEAKING
SPEAKING ──speaking_complete──> WAKE_LISTENING
[any] ──error──> ERROR
ERROR ──timeout──> IDLE

State Duration Expectations

State	Normal Duration	Max Timeout
IDLE	Indefinite	None
WAKE_LISTENING	< 1s (greeting)	5s
GREETING	1-2s (play greeting)	10s
LISTENING	< 5s (speech)	30s
PROCESSING	2-5s (LLM)	30s
SPEAKING	2-10s (TTS+playback)	60s
ERROR	< 5s (recovery)	10s

Monitoring State Health

# Check current state and duration
curl -s http://localhost:5001/api/state | jq '{state: .current_state, duration: .time_in_state}'

# If duration > expected max timeout → investigate

# Check recent transitions
curl -s http://localhost:5001/api/events | jq '.[] | select(.type == "StateTransitionEvent") | {from: .from_state, to: .to_state, time: .timestamp}'

# Verify transitions are valid
# Compare to state machine diagram

Remote Pi Monitoring

SSH Access

# Connect to Pi
ssh knarl@192.168.1.44
# Password: peacock7

# Or use plink (Windows)
plink -pw peacock7 knarl@192.168.1.44

Remote Commands

# Check if Bob is running
ssh knarl@192.168.1.44 "ps aux | grep BobTheSkull"

# View recent logs
ssh knarl@192.168.1.44 "tail -50 /home/knarl/BobTheSkull5/logs/bob.log"

# Check errors
ssh knarl@192.168.1.44 "grep -i error /home/knarl/BobTheSkull5/logs/bob.log | tail -10"

# Restart Bob
ssh knarl@192.168.1.44 "pkill -f BobTheSkull && cd /home/knarl/BobTheSkull5 && nohup python BobTheSkull.py > bob.log 2>&1 &"

Web Monitor from PC

http://192.168.1.44:5001

Verify Pi web monitor is accessible:

# From PC
curl -s http://192.168.1.44:5001/api/health

Performance Metrics

Key Metrics to Track

1. Component Latencies

Wake word detection: < 500ms
STT processing: < 3s
LLM response: < 3s
TTS synthesis: < 2s

2. Vision Performance

Frame rate: 5-10 FPS
Detection rate: Varies by scene
GPU utilization: 20-40% (if enabled)

3. Event Bus

Event publish rate: ~5-20 events/second
Event processing latency: < 100ms
Queue depth: < 10 events

4. State Machine

Transition latency: < 100ms
State duration: Within expected ranges
Timeout frequency: < 1% of transitions

Collecting Metrics

Via web monitor:

http://localhost:5001
# Shows real-time metrics in dashboard

Via logs:

# Extract latency measurements
grep "took" logs/bob.log | awk '{print $NF}' | sort -n

# Count events per minute
grep "Publishing event" logs/bob.log | cut -d' ' -f1-2 | uniq -c

# Average FPS
grep "FPS:" logs/bob.log | awk '{sum+=$NF; count++} END {print sum/count}'

Health Check Procedures

Startup Health Check

After starting Bob, verify:

# 1. All components initialized
grep "Initializing" logs/bob.log
# Should see: Wake Word, STT, LLM, TTS, Vision (if enabled), Eyes

# 2. No initialization errors
grep -i "initialization.*error" logs/bob.log
# Should be empty

# 3. State machine started
grep "State machine started" logs/bob.log

# 4. Current state is IDLE
curl -s http://localhost:5001/api/state | jq .current_state
# Should show: "IDLE"

# 5. Web monitor accessible
curl -s http://localhost:5001/api/health
# Should return: {"status": "ok"}

Periodic Health Check

Run every few hours during development:

# Check error count
error_count=$(grep -i error logs/bob.log | wc -l)
echo "Errors: $error_count"
# Goal: < 10 errors per hour

# Check state machine is cycling
tail -100 logs/bob.log | grep "State transition" | wc -l
# Should be > 0 if actively used

# Check component status
curl -s http://localhost:5001/api/health | jq

Common Error Messages

Error: "Device not found"

Full error: PyAudio error: Device X not found

Cause: Audio device index invalid or device disconnected

Fix:

python list_audio_devices.py
# Update .env with correct device index

Error: "API key invalid"

Full error: OpenAI API Error: Invalid API key

Cause: API key expired, revoked, or incorrect

Fix:

# Verify API key format
grep OPENAI_API_KEY .env
# Should start with sk-

# Test key at platform.openai.com
# Generate new key if needed

Error: "Camera not accessible"

Full error: Cannot open camera /dev/video0

Cause: Camera in use, disconnected, or permissions issue

Fix:

# Check camera exists
ls -l /dev/video*

# Check permissions
sudo chmod 666 /dev/video0

# Kill processes using camera
sudo lsof /dev/video0
sudo kill <PID>

Error: "MQTT connection refused"

Full error: MQTT broker connection refused on localhost:1883

Cause: MQTT broker not running

Fix:

# Check if mosquitto is running
systemctl status mosquitto

# Start mosquitto
sudo systemctl start mosquitto

# Or use embedded broker (if configured)

Pro Tips

Keep web monitor open - Always have http://localhost:5001 open in browser during development
Use screen on Pi - Run Bob in screen session to prevent SSH disconnects from killing it
Tail logs in separate terminal - Keep tail -f logs/bob.log running in another terminal
Grep with color - Use grep --color=always to highlight matches

Create monitoring aliases - Add to ~/.bashrc:

alias bob-log='tail -f logs/bob.log'
alias bob-errors='grep -i error logs/bob.log | tail -20'
alias bob-state='curl -s http://localhost:5001/api/state | jq'

Use jq for JSON - Install jq for pretty-printing API responses
Monitor network - Use nethogs or iftop to see network usage
Check timestamps - Always check event timestamps to understand sequence
Compare working vs broken - Keep logs from working state to compare
Test incrementally - Don't change multiple things at once

Integration with Other Skills

Works well with:

pi-deployment - Monitor after deployment to verify success
audio-injection-testing - Monitor events during automated testing
config-pattern - Verify config changes have desired effect

Time Savings

Without skill:

15-20 minutes figuring out where to look
10-15 minutes trial-and-error debugging
Missed correlations between components

With skill:

3-5 minutes following documented scenario
Quick diagnosis with known patterns
Clear troubleshooting checklists

Estimated time savings: 3-4x faster issue resolution

References

Monitoring Tools:

web/monitor_server.py - Web dashboard
mqtt_monitor.py - MQTT event viewer
test_web_monitor.py - Monitor testing

Log Files:

logs/bob.log - Main application log
Check .env for LOG_LEVEL setting

API Endpoints:

GET /api/state - Current state
GET /api/events - Recent events
GET /api/health - Component health

Related Documentation:

requirements/LoggingandMonitoringRequirements.md
CLAUDE.md - Project overview

Install Skill

SKILL.md

Monitoring & Debugging Skill

When to Use This Skill

Quick Reference

Monitoring Tools

Quick Diagnostics

Web Monitor Dashboard

Accessing the Dashboard

Dashboard Components

Common Dashboard Patterns

Log Analysis

Log File Locations

Essential Log Patterns

Log Level Interpretation

Common Debugging Scenarios

Scenario 1: Wake Word Not Detecting

Scenario 2: State Machine Stuck

Scenario 3: LLM Not Responding

Scenario 4: Vision Not Working

Scenario 5: Audio Output Not Working

Scenario 6: High Latency / Slow Response

MQTT Event Bus Monitoring

Using mqtt_monitor.py

Event Flow Analysis

State Machine Monitoring

Valid State Transitions

State Duration Expectations

Monitoring State Health

Remote Pi Monitoring

SSH Access

Remote Commands

Web Monitor from PC

Performance Metrics

Key Metrics to Track

Collecting Metrics

Health Check Procedures

Startup Health Check

Periodic Health Check

Common Error Messages

Error: "Device not found"

Error: "API key invalid"

Error: "Camera not accessible"

Error: "MQTT connection refused"

Pro Tips

Integration with Other Skills

Time Savings

References