Claude Code Plugins

Community-maintained marketplace

Debug AWS resource issues, check Lambda logs, and monitor deployed services. Use when investigating production issues, checking CloudWatch logs, or debugging deployment failures.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Review the skill's instructions before using it.

SKILL.md

name: aws-monitoring
description: Debug AWS resource issues, check Lambda logs, and monitor deployed services. Use when investigating production issues, checking CloudWatch logs, or debugging deployment failures.
allowed-tools: Bash, Read, Grep

AWS Monitoring Skill

This skill helps you monitor and debug AWS resources for the SG Cars Trends platform.

When to Use This Skill

  • Investigating production errors
  • Checking Lambda function logs
  • Monitoring API performance
  • Debugging deployment failures
  • Analyzing CloudWatch metrics
  • Setting up alarms
  • Troubleshooting resource issues

Monitoring Tools

SST Console

SST provides a built-in console for monitoring:

# Open SST console for specific stage
npx sst console --stage production
npx sst console --stage staging
npx sst console --stage dev

Features:

  • Real-time Lambda logs
  • Function invocations
  • Error tracking
  • Resource overview
  • Environment variables

CloudWatch Logs

Access Lambda logs via CloudWatch:

# View logs using SST
npx sst logs --stage production

# View specific function logs
npx sst logs --stage production --function api

# Tail logs in real-time
npx sst logs --stage production --function api --tail

# Filter logs
npx sst logs --stage production --function api --filter "ERROR"

# Show logs from specific time
npx sst logs --stage production --function api --since 1h
npx sst logs --stage production --function api --since "2024-01-15 10:00"

AWS CLI

Use AWS CLI for advanced log queries:

# List log groups
aws logs describe-log-groups \
  --log-group-name-prefix "/aws/lambda/sgcarstrends"

# Get recent log streams
aws logs describe-log-streams \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --order-by LastEventTime \
  --descending \
  --max-items 5

# Tail logs
aws logs tail "/aws/lambda/sgcarstrends-api-production" --follow

# Filter logs
aws logs filter-log-events \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --filter-pattern "ERROR" \
  --start-time $(date -u -d '1 hour ago' +%s)000

# Get logs for specific request
aws logs filter-log-events \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --filter-pattern "request-id-here"

CloudWatch Metrics

Lambda Metrics

# Get Lambda invocations
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

# Get errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

# Get duration
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average Maximum

API Gateway Metrics

# Get API requests
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Count \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

# Get 4XX errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 4XXError \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

# Get latency
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Latency \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average Maximum

# Percentiles such as p99 cannot be mixed with --statistics;
# rerun the same command with --extended-statistics p99 in their place

CloudWatch Alarms

Creating Alarms

// infra/alarms.ts
import { StackContext, use } from "sst/constructs";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as cloudwatchActions from "aws-cdk-lib/aws-cloudwatch-actions";
import * as sns from "aws-cdk-lib/aws-sns";
import * as subscriptions from "aws-cdk-lib/aws-sns-subscriptions";
import { API } from "./api";

export function Alarms({ stack, app }: StackContext) {
  const { api } = use(API);

  // Only create alarms for production
  if (app.stage !== "production") {
    return;
  }

  // SNS topic for alarms
  const alarmTopic = new sns.Topic(stack, "AlarmTopic");

  // Add email subscription
  alarmTopic.addSubscription(
    new subscriptions.EmailSubscription("alerts@sgcarstrends.com")
  );

  // High error rate alarm
  new cloudwatch.Alarm(stack, "ApiHighErrorRate", {
    metric: api.metricErrors(),
    threshold: 10,
    evaluationPeriods: 2,
    datapointsToAlarm: 2,
    alarmDescription: "API has high error rate",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new cloudwatchActions.SnsAction(alarmTopic));

  // High duration alarm
  new cloudwatch.Alarm(stack, "ApiHighDuration", {
    metric: api.metricDuration(),
    threshold: 5000, // 5 seconds
    evaluationPeriods: 2,
    datapointsToAlarm: 2,
    alarmDescription: "API response time is high",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new cloudwatchActions.SnsAction(alarmTopic));

  // Throttle alarm
  new cloudwatch.Alarm(stack, "ApiThrottled", {
    metric: api.metricThrottles(),
    threshold: 1,
    evaluationPeriods: 1,
    alarmDescription: "API is being throttled",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new cloudwatchActions.SnsAction(alarmTopic));
}

Add to SST config:

// infra/sst.config.ts
import { SSTConfig } from "sst";
import { DNS } from "./dns"; // paths for DNS and Web are assumed
import { API } from "./api";
import { Web } from "./web";
import { Alarms } from "./alarms";

export default {
  config() {
    return { name: "sgcarstrends" };
  },
  stacks(app) {
    app
      .stack(DNS)
      .stack(API)
      .stack(Web)
      .stack(Alarms); // Add alarms stack
  },
} satisfies SSTConfig;

Managing Alarms via CLI

# List alarms
aws cloudwatch describe-alarms

# Get alarm state
aws cloudwatch describe-alarms \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Disable alarm
aws cloudwatch disable-alarm-actions \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Enable alarm
aws cloudwatch enable-alarm-actions \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Delete alarm
aws cloudwatch delete-alarms \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

CloudWatch Insights

Querying Logs

# Start query
aws logs start-query \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20'

# Get query results
aws logs get-query-results --query-id <query-id>
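
Queries run asynchronously: start-query returns an ID, and get-query-results is polled until the status is Complete. A small end-to-end sketch:

# Capture the query ID, then poll until results are ready
QUERY_ID=$(aws logs start-query \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | limit 20' \
  --query queryId --output text)

# Status is Scheduled or Running until the query finishes
until [ "$(aws logs get-query-results --query-id "$QUERY_ID" --query status --output text)" = "Complete" ]; do
  sleep 2
done

aws logs get-query-results --query-id "$QUERY_ID"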

Common Queries

Find errors:

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

API performance:

fields @timestamp, @duration
| stats avg(@duration), max(@duration), min(@duration)

Count errors by type:

fields @message
| filter @message like /ERROR/
| parse @message /(?<errorType>\w+Error)/
| stats count() by errorType

Slow requests:

fields @timestamp, @duration, @requestId
| filter @duration > 1000
| sort @duration desc
| limit 20

Request rate:

fields @timestamp
| stats count() by bin(5m)
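
Memory headroom (an added example; @memorySize and @maxMemoryUsed are standard fields from Lambda REPORT lines, reported in bytes):

filter @type = "REPORT"
| stats max(@memorySize / 1000 / 1000) as provisionedMB,
        max(@maxMemoryUsed / 1000 / 1000) as maxUsedMB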

X-Ray Tracing

Enable X-Ray

// infra/api.ts
import { StackContext, Function } from "sst/constructs";

export function API({ stack }: StackContext) {
  const api = new Function(stack, "api", {
    handler: "apps/api/src/index.handler",
    tracing: "active", // Enable X-Ray tracing
  });

  return { api };
}

Instrument Code

// apps/api/src/index.ts
import { captureAWSv3Client } from "aws-xray-sdk-core";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";

// Wrap AWS SDK v3 clients so their calls appear as X-Ray subsegments
const client = captureAWSv3Client(new DynamoDBClient({}));

View Traces

# Get service graph
aws xray get-service-graph \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s)

# Get trace summaries
aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s)

# Get trace details
aws xray batch-get-traces --trace-ids <trace-id>
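
get-trace-summaries also accepts a filter expression to narrow results, for example to failed requests only (error and fault are standard X-Ray filter keywords):

# Only traces with 4XX (error) or 5XX (fault) responses
aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --filter-expression 'error = true OR fault = true'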

Resource Monitoring

Lambda Functions

# List functions
aws lambda list-functions --query 'Functions[?starts_with(FunctionName, `sgcarstrends`)].FunctionName'

# Get function config
aws lambda get-function-configuration \
  --function-name sgcarstrends-api-production

# Get function code location
aws lambda get-function \
  --function-name sgcarstrends-api-production

# Invoke function
aws lambda invoke \
  --function-name sgcarstrends-api-production \
  --cli-binary-format raw-in-base64-out \
  --payload '{"path": "/health"}' \
  response.json

cat response.json

CloudFront Distributions

# List distributions
aws cloudfront list-distributions \
  --query 'DistributionList.Items[*].[Id,DomainName,Status]' \
  --output table

# Get distribution config
aws cloudfront get-distribution-config --id <distribution-id>

# Create invalidation (cache clear)
aws cloudfront create-invalidation \
  --distribution-id <distribution-id> \
  --paths "/*"

# List invalidations
aws cloudfront list-invalidations --distribution-id <distribution-id>

S3 Buckets

# List buckets
aws s3 ls

# Get bucket size
aws s3 ls s3://bucket-name --recursive --summarize | grep "Total Size"

# Monitor bucket metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=bucket-name Name=StorageType,Value=StandardStorage \
  --start-time $(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average

Cost Monitoring

Cost Explorer

# Get cost and usage
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '1 month ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get cost by tag
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '1 month ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Environment

Budget Alerts

Create budget in AWS Console or via CLI:

# Create budget
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json
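
The two JSON files referenced above are not shown here; a minimal sketch with placeholder values (budget amount, threshold, and email address are illustrative):

# budget.json: a US$50/month cost budget
cat > budget.json <<'EOF'
{
  "BudgetName": "sgcarstrends-monthly",
  "BudgetLimit": { "Amount": "50", "Unit": "USD" },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
EOF

# notifications.json: email alert at 80% of actual spend
cat > notifications.json <<'EOF'
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      { "SubscriptionType": "EMAIL", "Address": "alerts@sgcarstrends.com" }
    ]
  }
]
EOF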

Debugging Production Issues

1. Check Recent Deployments

# Get stack events
aws cloudformation describe-stack-events \
  --stack-name sgcarstrends-api-production \
  --max-items 50

# Get deployment status
npx sst stacks info API --stage production

2. Check Logs for Errors

# Get recent errors
npx sst logs --stage production --function api --filter "ERROR" --since 1h

# Or use AWS CLI
aws logs tail "/aws/lambda/sgcarstrends-api-production" \
  --follow \
  --filter-pattern "ERROR"

3. Check Metrics

# Check invocations and errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

4. Test Endpoint

# Test API directly
curl -I https://api.sgcarstrends.com/health

# Test with verbose output
curl -v https://api.sgcarstrends.com/health

5. Check Resource Limits

# Check Lambda quotas
aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384  # Concurrent executions

# Check API Gateway quotas
aws service-quotas list-service-quotas \
  --service-code apigateway

Common Issues

High Latency

Investigation:

  1. Check Lambda duration metrics
  2. Review CloudWatch Insights for slow queries
  3. Check database connection pool
  4. Review API response times

Solutions:

  • Increase Lambda memory (see the CLI sketch after this list)
  • Optimize database queries
  • Add caching
  • Use connection pooling
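
To experiment with memory from the CLI (persist the change in the SST config afterwards so deploys don't revert it; 1024 MB is an illustrative value):

# Raise memory, which also scales CPU proportionally
aws lambda update-function-configuration \
  --function-name sgcarstrends-api-production \
  --memory-size 1024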

High Error Rate

Investigation:

  1. Check error logs
  2. Review error types
  3. Check external service status
  4. Verify environment variables (see the check below)

Solutions:

  • Fix application bugs
  • Add error handling
  • Retry failed requests
  • Check API rate limits
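
For investigation step 4, the function's configured environment variables can be dumped directly:

# Inspect environment variables on the deployed function
aws lambda get-function-configuration \
  --function-name sgcarstrends-api-production \
  --query 'Environment.Variables'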

Cold Starts

Investigation:

  1. Check init duration
  2. Review package size
  3. Check provisioned concurrency

Solutions:

  • Enable provisioned concurrency (see the sketch after this list)
  • Reduce bundle size
  • Use ARM architecture
  • Optimize imports
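
Provisioned concurrency applies to a published version or alias, never $LATEST; a sketch assuming a "live" alias exists:

# Keep two warm execution environments on the "live" alias (hypothetical alias)
aws lambda put-provisioned-concurrency-config \
  --function-name sgcarstrends-api-production \
  --qualifier live \
  --provisioned-concurrent-executions 2

# Check provisioning status
aws lambda get-provisioned-concurrency-config \
  --function-name sgcarstrends-api-production \
  --qualifier live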

Monitoring Scripts

Health Check Script

#!/bin/bash
# scripts/health-check.sh

STAGE=${1:-production}

# Production uses the bare domains; other stages get a stage prefix
if [ "$STAGE" = "production" ]; then
  API_URL="https://api.sgcarstrends.com"
  WEB_URL="https://sgcarstrends.com"
else
  API_URL="https://api.$STAGE.sgcarstrends.com"
  WEB_URL="https://$STAGE.sgcarstrends.com"
fi

echo "Checking health of $STAGE environment..."

# Check API
API_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$API_URL/health")

if [ "$API_STATUS" -eq 200 ]; then
  echo "✓ API is healthy"
else
  echo "✗ API is down (status: $API_STATUS)"
  exit 1
fi

# Check Web
WEB_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$WEB_URL")

if [ "$WEB_STATUS" -eq 200 ]; then
  echo "✓ Web is healthy"
else
  echo "✗ Web is down (status: $WEB_STATUS)"
  exit 1
fi

echo "All services are healthy!"

Run:

chmod +x scripts/health-check.sh
./scripts/health-check.sh production

Log Analysis Script

#!/bin/bash
# scripts/analyze-logs.sh

STAGE=${1:-production}
LOG_GROUP="/aws/lambda/sgcarstrends-api-$STAGE"

echo "Analyzing logs for $STAGE..."

# Count errors in the last hour (length() counts events, not output lines)
ERROR_COUNT=$(aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --filter-pattern "ERROR" \
  --start-time $(date -u -d '1 hour ago' +%s)000 \
  --query 'length(events)' \
  --output text)

echo "Errors in last hour: $ERROR_COUNT"

# Get top errors
echo -e "\nTop error types:"
aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --filter-pattern "ERROR" \
  --start-time $(date -u -d '1 hour ago' +%s)000 \
  --query 'events[*].message' \
  --output text | \
  grep -oE '\w+Error' | \
  sort | uniq -c | sort -rn | head -5

Best Practices

  1. Log Levels: Use appropriate log levels (DEBUG, INFO, WARN, ERROR)
  2. Structured Logging: Use JSON format for easier parsing
  3. Correlation IDs: Track requests across services
  4. Alarms: Set up alarms for critical metrics
  5. Dashboards: Create CloudWatch dashboards for key metrics
  6. Cost Monitoring: Track AWS costs regularly
  7. Regular Reviews: Review logs and metrics weekly
  8. Retention: Set appropriate log retention (7-30 days; see the command below)
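
Retention is set per log group; a one-line sketch (14 days as an example; CloudWatch only accepts fixed values such as 7, 14, and 30):

# Cap log retention to control storage costs
aws logs put-retention-policy \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --retention-in-days 14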