| name | aws-logs-query |
| description | Query AWS CloudWatch logs for staging and production environments. Use for debugging errors, investigating incidents, or monitoring application behavior. PRIMARY - aws logs tail "/ecs/codel-staging" --follow --format short \| grep -iE "error\|exception" |
AWS Logs Query Skill
Query AWS CloudWatch logs for Codel staging and production environments using the AWS CLI.
When to Use
Use this skill when you need to:
- Debug production or staging errors
- Investigate Evolution provider issues
- Monitor application behavior in real-time
- Search for specific error patterns or events
- Analyze log patterns over time windows
- Track down exceptions or failures
Prerequisites
- AWS CLI installed (brew install awscli or pip install awscli)
- AWS credentials available (via profile, environment, or ECS task role)
- Access to the /ecs/codel-staging and /ecs/codel-prod log groups
Configuration
Local development: set the AWS_PROFILE environment variable to your profile name:
export AWS_PROFILE=codel
ECS containers: No configuration needed — uses the task role automatically.
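To confirm which credentials the CLI has actually picked up (locally or inside a container), a quick sanity check:
# Show the account and role/user the AWS CLI is currently authenticated as
aws sts get-caller-identity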
Profile Handling
IMPORTANT: When running AWS CLI commands:
- If the AWS_PROFILE env var is set → add --profile $AWS_PROFILE to commands
- If AWS_PROFILE is not set → omit the --profile flag (uses the default credential chain)
# Check if profile is set and build the profile flag
PROFILE_FLAG=""
if [ -n "$AWS_PROFILE" ]; then
PROFILE_FLAG="--profile $AWS_PROFILE"
fi
# Then use in commands:
aws logs tail "/ecs/codel-staging" --follow $PROFILE_FLAG --format short
For simplicity, all examples below omit --profile. Add --profile $AWS_PROFILE if running locally with a named profile.
⚡ Choosing Your Tool
CRITICAL: Use the right tool for the job!
Use CloudWatch Insights for:
- ✅ Historical searches (> 1 hour ago)
- ✅ Multi-day searches (past week, past month)
- ✅ Complex pattern matching (multiple filters, aggregations)
- ✅ Counting/statistics (how many errors yesterday?)
- ✅ ANY search beyond a few hours
Use tail for:
- ✅ Real-time monitoring (watching logs as they happen)
- ✅ Recent logs (last 15-60 minutes)
- ✅ Simple grep patterns (quick error checks)
❌ DON'T Use tail for:
- ❌ Multi-day searches (use CloudWatch Insights)
- ❌ Historical analysis (use CloudWatch Insights)
- ❌ Large time windows like --since 7d (extremely slow, will time out)
CloudWatch Insights (PRIMARY for Historical Searches)
For ANY search beyond the last hour, use CloudWatch Insights instead of tail.
Basic Search Pattern
# Calculate timestamps
END_TIME=$(date -u +%s)
START_TIME=$((END_TIME - 604800)) # 7 days in seconds
# Start query
QUERY_ID=$(aws logs start-query \
--log-group-name "/ecs/codel-prod" \
--start-time "$START_TIME" \
--end-time "$END_TIME" \
--query-string 'fields @timestamp, @message | filter @message like /your_pattern_here/ | sort @timestamp desc | limit 100' \
--output text --query 'queryId')
# Wait for results
sleep 5
# Get results
aws logs get-query-results --query-id "$QUERY_ID" --output json
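The fixed sleep works for most queries, but larger windows can still be running after 5 seconds. A minimal polling sketch that reuses QUERY_ID from above and waits for the query status to leave Scheduled/Running:
# Poll the query status, then fetch the final results
STATUS="Scheduled"
while [ "$STATUS" = "Scheduled" ] || [ "$STATUS" = "Running" ]; do
  sleep 2
  STATUS=$(aws logs get-query-results --query-id "$QUERY_ID" --output text --query 'status')
done
aws logs get-query-results --query-id "$QUERY_ID" --output json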
Real-World Example: Search Past Week for Specific Error
# Search for Linq validation errors in the past 7 days
END_TIME=$(date -u +%s)
START_TIME=$((END_TIME - 604800))
QUERY='fields @timestamp, @message
| filter @message like /bool_type.*is_me.*Input should be a valid boolean/
| sort @timestamp desc
| limit 100'
QUERY_ID=$(aws logs start-query \
--log-group-name "/ecs/codel-prod" \
--start-time "$START_TIME" \
--end-time "$END_TIME" \
--query-string "$QUERY" \
--output text --query 'queryId')
sleep 5
aws logs get-query-results --query-id "$QUERY_ID" --output json
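get-query-results returns each row as a list of field/value pairs. If jq is available (an assumption, not required by this skill), one way to flatten the output into tab-separated timestamp and message columns:
# Flatten Insights results into "timestamp<TAB>message" lines (assumes jq is installed)
aws logs get-query-results --query-id "$QUERY_ID" --output json \
  | jq -r '.results[] | [(.[] | select(.field == "@timestamp") | .value), (.[] | select(.field == "@message") | .value)] | @tsv'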
Common CloudWatch Insights Queries
# Count errors by type in past 24 hours
QUERY='fields @message
| filter @message like /error|Error|ERROR/
| stats count() by @message
| sort count desc
| limit 20'
# Find all Pydantic validation errors
QUERY='fields @timestamp, @message
| filter @message like /ValidationError/
| sort @timestamp desc
| limit 50'
# Search for specific phone number in logs
QUERY='fields @timestamp, @message
| filter @message like /\+16508997366/
| sort @timestamp desc
| limit 100'
# Get errors from specific time range
QUERY='fields @timestamp, @message
| filter @message like /error/
and @timestamp >= "2025-11-07T00:00:00"
and @timestamp <= "2025-11-08T00:00:00"
| sort @timestamp desc'
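All of these plug into the same start-query / wait / get-query-results steps shown earlier. A small wrapper keeps that boilerplate in one place (run_insights_query is a hypothetical helper defined here, not an AWS command):
# Hypothetical helper: run any Insights query against a log group and time range
run_insights_query() {
  local log_group="$1" query="$2" start="$3" end="$4"
  local query_id
  query_id=$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" --end-time "$end" \
    --query-string "$query" \
    --output text --query 'queryId')
  sleep 5
  aws logs get-query-results --query-id "$query_id" --output json
}
# Example: run_insights_query "/ecs/codel-prod" "$QUERY" "$START_TIME" "$END_TIME"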
Time Range Calculations
# Common time windows
END_TIME=$(date -u +%s)
HOUR_AGO=$((END_TIME - 3600))
DAY_AGO=$((END_TIME - 86400))
WEEK_AGO=$((END_TIME - 604800))
MONTH_AGO=$((END_TIME - 2592000))
# Use in queries
--start-time "$WEEK_AGO" --end-time "$END_TIME"
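To anchor a query to a specific calendar date instead of an offset from now, epoch conversion syntax differs between GNU date (Linux) and BSD date (macOS); for example, midnight UTC on 2025-11-07:
# GNU date (Linux)
START_TIME=$(date -u -d "2025-11-07T00:00:00" +%s)
# BSD date (macOS)
START_TIME=$(date -ju -f "%Y-%m-%dT%H:%M:%S" "2025-11-07T00:00:00" +%s)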
Tail for Recent Logs (< 1 hour)
Use tail for real-time monitoring and quick checks of recent logs.
Watch Logs in Real-Time
# Watch staging logs as they happen
aws logs tail "/ecs/codel-staging" --follow --format short
# Watch production logs
aws logs tail "/ecs/codel-prod" --follow --format short
Quick Recent Searches
# Last 15 minutes only
aws logs tail "/ecs/codel-staging" --since 15m --format short
# Find errors in last 30 min
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -iE "error|exception|failed"
# Get full error context (10 lines before/after)
aws logs tail "/ecs/codel-staging" --since 15m --format short | grep -iE -B 10 -A 10 "ValidationError"
Available Log Groups
Main Application Logs
Staging: /ecs/codel-staging
Production: /ecs/codel-prod
- Services: api, worker, voice, admin-api
- Use for: General application debugging, message processing, webhooks
Evolution API Logs (WhatsApp Provider)
Staging: /ecs/codel-staging-evolution
Production: /ecs/codel-prod-evolution
- Use for: Evolution API issues, WhatsApp connectivity, instance management
- When to use: Evolution validation errors, disconnection issues, webhook problems
Langfuse Logs (LLM Observability)
Web:
- Staging: /ecs/codel-staging-langfuse-web
- Production: /ecs/codel-prod-langfuse-web
Worker:
- Staging: /ecs/codel-staging-langfuse-worker
- Production: /ecs/codel-prod-langfuse-worker
ClickHouse:
- Staging: /ecs/codel-staging-langfuse-clickhouse
- Production: /ecs/codel-prod-langfuse-clickhouse
Metabase Logs (Analytics)
Staging: /ecs/codel-staging-metabase
Production: /ecs/codel-prod-metabase
Quick Reference
# Main application (default for most debugging)
STAGING_MAIN="/ecs/codel-staging"
PROD_MAIN="/ecs/codel-prod"
# Evolution API (WhatsApp provider)
STAGING_EVOLUTION="/ecs/codel-staging-evolution"
PROD_EVOLUTION="/ecs/codel-prod-evolution"
# Langfuse
STAGING_LANGFUSE_WEB="/ecs/codel-staging-langfuse-web"
PROD_LANGFUSE_WEB="/ecs/codel-prod-langfuse-web"
# Metabase
STAGING_METABASE="/ecs/codel-staging-metabase"
PROD_METABASE="/ecs/codel-prod-metabase"
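To double-check which of these groups actually exist with the current credentials (and catch naming drift), list them by prefix:
# List all Codel log groups visible in the current account/region
aws logs describe-log-groups \
  --log-group-name-prefix "/ecs/codel-" \
  --query 'logGroups[].logGroupName' --output table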
Common Use Cases
Debug Evolution Errors
Main app logs (where Evolution webhooks are processed):
# Find recent Evolution validation errors in main app
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -i "evolution" | grep -i "error"
# Get full traceback for Evolution issues
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -B 20 -A 5 "EvolutionInstance"
# Watch Evolution webhooks being processed
aws logs tail "/ecs/codel-staging" --follow --format short | grep -i "evolution.*webhook"
Evolution API logs (Evolution service itself):
# Watch Evolution API directly
aws logs tail "/ecs/codel-staging-evolution" --follow --format short
# Find Evolution API errors
aws logs tail "/ecs/codel-prod-evolution" --since 1h --format short | grep -iE "error|exception|fail"
# Check Evolution instance disconnections
aws logs tail "/ecs/codel-prod-evolution" --since 30m --format short | grep -i "disconnect"
# Monitor Evolution health checks
aws logs tail "/ecs/codel-staging-evolution" --since 15m --format short | grep -i "health"
Find Specific Error Types
# Pydantic validation errors
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -i "ValidationError"
# Database errors
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE "sqlalchemy|database|postgres"
# OpenAI/LLM errors
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE "openai|anthropic|llm"
# Worker job failures
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE "job.*failed|exception raised while executing"
Monitor Specific Services
# API logs only (detailed format includes the log stream name, so grep can match its api/api/ prefix)
aws logs tail "/ecs/codel-staging" --follow --format detailed | grep "api/api/"
# Worker logs only
aws logs tail "/ecs/codel-staging" --follow --format detailed | grep "worker/worker/"
# Voice service logs
aws logs tail "/ecs/codel-staging" --follow --format detailed | grep "voice/voice/"
Time Windows
The --since flag accepts various formats:
- 5m - last 5 minutes
- 15m - last 15 minutes
- 1h - last hour
- 6h - last 6 hours
- 1d - last day
- 2h30m - last 2 hours 30 minutes
Log Stream Patterns
Each ECS task creates its own log stream with this pattern:
{service}/{service}/{task-id}
Examples:
api/api/4ffc8e0fd7c54c76a00873409a1a01c4
worker/worker/e9bdd10c0cd3424283c16ffa24fca756
voice/voice/7c8088a63fbc442091d7ebbb04235523
admin-api/admin-api/348f846767d34f90913d145b9533eda8
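Because stream names begin with the service name, recent AWS CLI v2 releases also let you tail a single service directly with --log-stream-name-prefix (worth confirming your CLI version supports the flag) instead of grepping detailed output:
# Tail only the worker service's streams
aws logs tail "/ecs/codel-staging" --follow --format short --log-stream-name-prefix "worker/"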
List Active Log Streams
# See which tasks are currently logging
aws logs describe-log-streams \
--log-group-name "/ecs/codel-staging" \
--order-by LastEventTime \
--descending \
--max-items 10
Debugging Workflows
When Tests Fail
# 1. Check recent errors
aws logs tail "/ecs/codel-staging" --since 15m --format short | grep -iE "error|exception|failed"
# 2. Look for specific test failures
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -i "test"
# 3. Check worker job processing
aws logs tail "/ecs/codel-staging" --since 15m --format short | grep -i "worker"
When Evolution Errors Occur
# 1. Find the error message
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -i "evolution.*error"
# 2. Get full context (20 lines before/after)
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE -B 20 -A 10 "EvolutionInstance|evolution.*validation"
# 3. Check webhook processing
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -i "evolution.*webhook"
When Messages Aren't Sending
# 1. Check message sending errors
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -iE "send.*message|messaging"
# 2. Look for provider errors
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -iE "evolution|linq|sendblue"
# 3. Check for API errors
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -E "(400|401|403|404|500|502|503)"
Common Error Patterns
| Error Pattern | What It Means | How to Search |
|---|---|---|
| ValidationError: 2 validation errors for EvolutionInstance | Evolution API response doesn't match the Pydantic model | grep -i "ValidationError.*Evolution" |
| disconnectionReasonCode.*should be a valid string | Evolution API returned an int instead of a string | grep "disconnectionReasonCode" |
| failed to send, dropping.*traces | Datadog agent not reachable (not critical) | grep "failed to send.*traces" |
| Database session error | SQLAlchemy session issue, usually followed by the real error | grep -B 5 -A 10 "Database session error" |
| Job.*exception raised while executing | RQ worker job failed | grep "exception raised while executing" |
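One way to triage these quickly: dump the recent window once, then count each known pattern (a sketch using staging, the last hour, and a throwaway file in /tmp):
# Dump the last hour once, then count each known error pattern
aws logs tail "/ecs/codel-staging" --since 1h --format short > /tmp/recent-staging.txt
for PATTERN in "ValidationError.*Evolution" "disconnectionReasonCode" "exception raised while executing"; do
  printf '%s: %s\n' "$PATTERN" "$(grep -cE "$PATTERN" /tmp/recent-staging.txt)"
done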
Output Formats
Short Format (Recommended)
--format short
# Output: timestamp message
# Example: 2025-11-14T01:34:06 [error] Database session error
Detailed Format
--format detailed
# Output: timestamp log-stream message
# More verbose, useful for debugging specific tasks
Filtering and Post-Processing
Save Logs for Analysis
# Save last hour of logs to file
aws logs tail "/ecs/codel-staging" --since 1h --format short > staging-logs.txt
# Save only errors
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE "error|exception" > staging-errors.txt
Count Error Occurrences
# Count how many validation errors
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -c "ValidationError"
# Count errors by type
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -iE "error|exception" | sort | uniq -c | sort -rn
Follow Specific Flow
# Track a specific webhook through the system
aws logs tail "/ecs/codel-staging" --follow --format short | grep "webhook_id_here"
# Follow a user's messages
aws logs tail "/ecs/codel-staging" --follow --format short | grep "person_id.*123"
Integration with Claude Code
When using this skill, choose the right tool based on time range:
For Historical Searches (> 1 hour ago)
- Use CloudWatch Insights (not tail!)
- Calculate time range using epoch timestamps
- Build query with proper filters
- Report findings with timestamps and counts
Example workflow for past week search:
# 1. Calculate time range
END_TIME=$(date -u +%s)
START_TIME=$((END_TIME - 604800)) # 7 days
# 2. Build and run query
QUERY='fields @timestamp, @message | filter @message like /ValidationError/ | sort @timestamp desc | limit 100'
QUERY_ID=$(aws logs start-query --log-group-name "/ecs/codel-prod" \
--start-time "$START_TIME" --end-time "$END_TIME" --query-string "$QUERY" \
--output text --query 'queryId')
# 3. Wait and get results
sleep 5
aws logs get-query-results --query-id "$QUERY_ID" --output json
# 4. Report: "Found 47 validation errors in the past 7 days, first occurred on Nov 7..."
For Recent Logs (< 1 hour ago)
- Use tail for quick checks
- Pipe to grep for pattern matching
- Use -B/-A for context
- Report findings concisely
Example workflow for recent errors:
# 1. Check last 15 minutes
aws logs tail "/ecs/codel-staging" --since 15m --format short | grep -i "error"
# 2. Get full context
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -B 20 -A 5 "EvolutionInstance"
# 3. Report: "Found 2 Evolution errors in the last 15 minutes..."
Troubleshooting
"The specified log group does not exist"
- Verify the log group name (should be /ecs/codel-staging or /ecs/codel-prod)
- Check you're using ``
- Confirm AWS credentials are configured
"Invalid --since value"
- Use a format like 15m, 1h, or 30m
- Don't use spaces: 15 m ❌, 15m ✅
No output when tailing
- Try expanding the time window: --since 1h or --since 6h
- Check that log streams exist: aws logs describe-log-streams --log-group-name "/ecs/codel-staging" --max-items 5
- Verify services are running and generating logs
Query times out
- Reduce the time window (try --since 1h instead of --since 1d)
- Use more specific filters
- Consider CloudWatch Insights for complex queries
Quick Reference Card
# Watch logs live
aws logs tail "/ecs/codel-staging" --follow --format short
# Last 15 minutes
aws logs tail "/ecs/codel-staging" --since 15m --format short
# Find errors
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -iE "error|exception"
# Evolution issues
aws logs tail "/ecs/codel-staging" --since 1h --format short | grep -i "evolution" | grep -i "error"
# Get context around errors
aws logs tail "/ecs/codel-staging" --since 30m --format short | grep -B 20 -A 10 "ValidationError"
# Production (same commands, different log group)
aws logs tail "/ecs/codel-prod" --since 15m --format short
Best Practices
✅ Do:
- Start with staging when debugging
- Use --since to limit the time window (faster queries)
- Pipe to grep for pattern matching
- Use -B and -A for context around matches
- Save interesting logs to files for analysis
❌ Don't:
- Query production unless investigating live issues
- Use very large time windows (> 6h) without filters
- Run queries without ``
- Forget to use --format short for readable output