Claude Code Plugins

Community-maintained marketplace


Implement monitoring, alerting, and observability with CloudWatch

Install Skill

1. Download the skill.
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: aws-cloudwatch
description: Implement monitoring, alerting, and observability with CloudWatch
sasmp_version: 1.3.0
bonded_agent: 08-aws-devops
bond_type: SECONDARY_BOND

AWS CloudWatch Skill

Set up comprehensive monitoring and alerting for AWS resources.

Quick Reference

| Attribute | Value |
|---|---|
| AWS Service | CloudWatch |
| Complexity | Medium |
| Est. Time | 15-30 min |
| Prerequisites | Resources to monitor |

Parameters

Required

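| Parameter | Type | Description | Validation |
|---|---|---|---|
| namespace | string | Metric namespace | AWS/* for AWS services, or a custom namespace (custom namespaces cannot start with AWS/) |
| metric_name | string | Metric name | Must exist in the namespace |
| resource_id | string | Resource identifier, used as the dimension value (e.g. an EC2 instance ID) | Valid ARN or ID |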

Optional

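| Parameter | Type | Default | Description |
|---|---|---|---|
| period | int | 300 | Evaluation period in seconds (10, 30, or a multiple of 60) |
| statistic | string | Average | SampleCount, Average, Sum, Minimum, or Maximum; percentiles such as p99 are extended statistics |
| threshold | float | varies | Alert threshold |
| evaluation_periods | int | 3 | Number of consecutive periods that must breach before the alarm fires |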
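The statistic and period parameters do not map one-to-one onto the API: boto3's `put_metric_alarm` takes standard statistics in `Statistic` but percentiles such as `p99` in `ExtendedStatistic`. The sketch below shows one way to route them and to reject invalid periods. The helper `statistic_kwargs` and the `Min`/`Max` aliases are hypothetical conveniences, and the percentile pattern is deliberately simplified.

```python
import re

# Statistics accepted by the Statistic parameter of put_metric_alarm.
STANDARD_STATISTICS = {"SampleCount", "Average", "Sum", "Minimum", "Maximum"}

# Hypothetical aliases so short forms such as "Max" still work.
ALIASES = {"Min": "Minimum", "Max": "Maximum"}

# Simplified percentile pattern: p0 ... p99.99 (CloudWatch also accepts other forms).
PERCENTILE = re.compile(r"^p\d{1,2}(\.\d{1,2})?$")


def statistic_kwargs(statistic: str = "Average", period: int = 300) -> dict:
    """Map the skill's statistic/period parameters to put_metric_alarm kwargs."""
    # Alarm periods must be 10, 30, or a multiple of 60 seconds.
    if period not in (10, 30) and (period <= 0 or period % 60 != 0):
        raise ValueError(f"period must be 10, 30, or a multiple of 60, got {period}")

    statistic = ALIASES.get(statistic, statistic)
    if statistic in STANDARD_STATISTICS:
        return {"Statistic": statistic, "Period": period}
    if PERCENTILE.match(statistic):
        # Percentiles are extended statistics and use a separate parameter.
        return {"ExtendedStatistic": statistic, "Period": period}
    raise ValueError(f"unsupported statistic: {statistic}")


print(statistic_kwargs("p99"))      # {'ExtendedStatistic': 'p99', 'Period': 300}
print(statistic_kwargs("Max", 60))  # {'Statistic': 'Maximum', 'Period': 60}
```

The returned dict can be merged into the other `put_metric_alarm` arguments with `**`.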

Essential Alarms

EC2 Alarms

- name: HighCPU
  metric: CPUUtilization
  threshold: 80
  period: 300
  evaluation_periods: 3

- name: StatusCheckFailed
  metric: StatusCheckFailed
  threshold: 1
  comparison: GreaterThanOrEqual  # StatusCheckFailed is 1 when a check fails
  period: 60
  evaluation_periods: 2
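With `evaluation_periods: 3` and a 300-second period, HighCPU fires only when three consecutive 5-minute datapoints exceed 80 (15 minutes in total). The simplified sketch below models that rule for the default case where DatapointsToAlarm equals EvaluationPeriods. It ignores missing data and the other details of CloudWatch's real evaluation, and `alarm_state` is a hypothetical name.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> str:
    """Simplified GreaterThanThreshold check: ALARM only if the most recent
    `evaluation_periods` datapoints all exceed the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"


# HighCPU: threshold 80, 3 periods of 300 s.
print(alarm_state([40, 85, 90, 82], threshold=80, evaluation_periods=3))  # ALARM
print(alarm_state([85, 90, 70], threshold=80, evaluation_periods=3))      # OK
```

A single dip below the threshold resets the count, which is why short spikes do not page anyone.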

ECS Alarms

- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: HighMemory
  metric: MemoryUtilization
  threshold: 85

- name: RunningTaskCount
  metric: RunningTaskCount  # Container Insights metric (ECS/ContainerInsights namespace)
  threshold: 1
  comparison: LessThan
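The ECS entries omit `period` and `evaluation_periods`, so they presumably fall back to the defaults in the Optional table. A sketch of that merge follows. It assumes an omitted `comparison` means GreaterThan and that the shorthand names map to the AWS operators shown; `spec_to_alarm_kwargs` and its `prefix` argument are hypothetical. The namespace is passed separately because HighCPU and HighMemory live in AWS/ECS while RunningTaskCount comes from Container Insights.

```python
from typing import Dict

# Defaults from the Optional parameters table.
DEFAULTS = {"period": 300, "statistic": "Average", "evaluation_periods": 3}

# Assumption: shorthand comparisons in these snippets map to the AWS operators
# below, and an omitted comparison means GreaterThan.
COMPARISONS = {
    "GreaterThan": "GreaterThanThreshold",
    "GreaterThanOrEqual": "GreaterThanOrEqualToThreshold",
    "LessThan": "LessThanThreshold",
    "LessThanOrEqual": "LessThanOrEqualToThreshold",
}


def spec_to_alarm_kwargs(spec: Dict, namespace: str, prefix: str) -> Dict:
    """Turn one YAML-style alarm entry into put_metric_alarm kwargs.

    Dimensions and actions are left for the caller to add.
    """
    merged = {**DEFAULTS, **spec}
    comparison = merged.get("comparison", "GreaterThan")
    return {
        "AlarmName": f"{prefix}-{merged['name']}",
        "Namespace": namespace,
        "MetricName": merged["metric"],
        "Statistic": merged["statistic"],
        "Period": merged["period"],
        "EvaluationPeriods": merged["evaluation_periods"],
        "Threshold": float(merged["threshold"]),
        # Full AWS operator names pass through unchanged.
        "ComparisonOperator": COMPARISONS.get(comparison, comparison),
    }


running = {"name": "RunningTaskCount", "metric": "RunningTaskCount",
           "threshold": 1, "comparison": "LessThan"}
print(spec_to_alarm_kwargs(running, "ECS/ContainerInsights", "prod-ecs")["ComparisonOperator"])
# LessThanThreshold
```

With threshold 1 and LessThanThreshold, the alarm fires when the service has zero running tasks.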

RDS Alarms

- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: LowFreeStorage
  metric: FreeStorageSpace
  threshold: 10737418240  # 10 GiB, in bytes
  comparison: LessThan

- name: HighConnections
  metric: DatabaseConnections
  threshold: 100
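FreeStorageSpace is reported in bytes, which is why LowFreeStorage uses 10737418240 rather than 10. A small conversion keeps thresholds readable; `storage_threshold`, which alarms at a fraction of allocated storage, is a hypothetical helper rather than part of the skill.

```python
def gib_to_bytes(gib: float) -> int:
    """Convert GiB to bytes (1 GiB = 1024**3 bytes)."""
    return int(gib * 1024 ** 3)


def storage_threshold(allocated_gib: float, free_fraction: float = 0.10) -> int:
    """Hypothetical helper: threshold at a fraction of allocated storage, in bytes."""
    return gib_to_bytes(allocated_gib * free_fraction)


print(gib_to_bytes(10))        # 10737418240, the LowFreeStorage threshold
print(storage_threshold(200))  # 10% of a 200 GiB volume
```

A percentage-based threshold scales with the instance's allocated storage, whereas a fixed 10 GiB may be too tight on small volumes.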

Implementation

Create Alarm

aws cloudwatch put-metric-alarm \
  --alarm-name prod-ec2-high-cpu \
  --alarm-description "EC2 CPU > 80% for 15 minutes" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --treat-missing-data notBreaching
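The same alarm can be created from Python with boto3's `put_metric_alarm`. In this sketch the client is passed in as an argument (a design choice, not an API requirement) so the function can be exercised with a stub instead of a live account; `create_cpu_alarm` and its `prefix` parameter are hypothetical.

```python
def create_cpu_alarm(cloudwatch, instance_id: str, topic_arn: str, prefix: str = "prod") -> str:
    """Create the CLI example's alarm. `cloudwatch` is a boto3 CloudWatch client,
    or any object with a compatible put_metric_alarm method (e.g. a test stub)."""
    name = f"{prefix}-ec2-high-cpu"
    cloudwatch.put_metric_alarm(
        AlarmName=name,
        AlarmDescription="EC2 CPU > 80% for 15 minutes",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Average",
        Period=300,              # 300 s x 3 periods = 15 minutes
        EvaluationPeriods=3,
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
        OKActions=[topic_arn],
        TreatMissingData="notBreaching",
    )
    return name
```

In real use, call it as `create_cpu_alarm(boto3.client("cloudwatch"), "i-1234567890abcdef0", "arn:aws:sns:us-east-1:123456789012:alerts")`.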

Dashboard Template

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "EC2 CPU Utilization",
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "ECS Service Memory",
        "metrics": [
          ["AWS/ECS", "MemoryUtilization", "ClusterName", "my-cluster", "ServiceName", "my-service"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    }
  ]
}
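Generating the dashboard body in code avoids hand-editing JSON and keeps every widget consistent. In this sketch, `metric_widget` is a hypothetical helper and `i-xxx`/`my-cluster`/`my-service` are placeholders. AWS/ECS service metrics carry both ClusterName and ServiceName dimensions, so both are listed.

```python
import json


def metric_widget(title, metric, region="us-east-1", stat="Average", period=300):
    """One metric widget; `metric` is [namespace, metric_name, dim_name, dim_value, ...]."""
    return {
        "type": "metric",
        "properties": {
            "title": title,
            "metrics": [metric],
            "period": period,
            "stat": stat,
            "region": region,
        },
    }


body = {
    "widgets": [
        metric_widget("EC2 CPU Utilization",
                      ["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]),
        metric_widget("ECS Service Memory",
                      ["AWS/ECS", "MemoryUtilization",
                       "ClusterName", "my-cluster", "ServiceName", "my-service"]),
    ]
}

# put_dashboard takes the body as a JSON string, e.g.
# boto3.client("cloudwatch").put_dashboard(DashboardName="prod", DashboardBody=dashboard_json)
dashboard_json = json.dumps(body)
```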

Custom Metrics

import boto3

cloudwatch = boto3.client('cloudwatch')

# Publish custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'RequestLatency',
            'Dimensions': [
                {'Name': 'Service', 'Value': 'API'},
                {'Name': 'Environment', 'Value': 'prod'}
            ],
            'Value': 150.5,
            'Unit': 'Milliseconds'
        }
    ]
)
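Publishing one datapoint per request gets expensive at high traffic. One option, sketched below, is to aggregate samples locally and send a single `StatisticValues` entry (SampleCount, Sum, Minimum, Maximum) in place of `Value`. The helper `latency_datum` is hypothetical. Percentile statistics are generally not available for data published as statistic sets, so keep publishing raw values if you need p99 alarms.

```python
from typing import Iterable


def latency_datum(latencies_ms: Iterable[float], service: str, environment: str) -> dict:
    """Aggregate latency samples into one MetricData entry using StatisticValues."""
    values = list(latencies_ms)
    if not values:
        raise ValueError("need at least one sample")
    return {
        "MetricName": "RequestLatency",
        "Dimensions": [
            {"Name": "Service", "Value": service},
            {"Name": "Environment", "Value": environment},
        ],
        # StatisticValues replaces Value; a datum may not contain both.
        "StatisticValues": {
            "SampleCount": float(len(values)),
            "Sum": float(sum(values)),
            "Minimum": float(min(values)),
            "Maximum": float(max(values)),
        },
        "Unit": "Milliseconds",
    }


datum = latency_datum([120.0, 150.5, 98.2], "API", "prod")
# cloudwatch.put_metric_data(Namespace='MyApp', MetricData=[datum])
```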

Log Insights Queries

Error Rate

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as error_count by bin(5m)
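`bin(5m)` floors each event's timestamp to the start of its 5-minute window, and `stats count()` counts matching events per window. The local sketch below does the same over exported log lines, which is handy for sanity-checking query results. The `(epoch_seconds, message)` input shape and the name `errors_per_bin` are assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

ERROR = re.compile(r"ERROR")  # same regex as `like /ERROR/` in the query


def errors_per_bin(events: Iterable[Tuple[int, str]], bin_seconds: int = 300) -> Dict[int, int]:
    """Count ERROR lines per time bucket, like `stats count() by bin(5m)`.

    `events` holds (epoch_seconds, message) pairs; each timestamp is floored
    to the start of its bucket.
    """
    counts = Counter()
    for ts, message in events:
        if ERROR.search(message):
            counts[ts - ts % bin_seconds] += 1
    return dict(sorted(counts.items()))


sample = [(0, "ERROR db timeout"), (120, "INFO ok"), (299, "ERROR retry"), (300, "ERROR db timeout")]
print(errors_per_bin(sample))  # {0: 2, 300: 1}
```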

Latency Analysis

fields @timestamp, latency
| stats avg(latency) as avg_latency,
        pct(latency, 95) as p95_latency,
        pct(latency, 99) as p99_latency
  by bin(1h)
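`pct(latency, 95)` returns the value below which roughly 95% of samples fall. For checking query results against a local sample, the textbook nearest-rank percentile works; it is swapped in here for illustration, and CloudWatch's own calculation may differ slightly. The latencies are made-up example data.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies = list(range(1, 101))  # 1..100 ms
print(sum(latencies) / len(latencies), percentile(latencies, 95), percentile(latencies, 99))
# 50.5 95 99
```

Comparing avg with p95/p99 shows whether slowness is broad (avg rises) or confined to a tail (only p99 rises).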

Top Errors

fields @timestamp, @message
| filter @message like /Exception|Error/
| parse @message /(?<error_type>\w+Exception)/
| stats count() as error_count by error_type
| sort error_count desc
| limit 10
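The same ranking can be reproduced locally over exported messages to confirm what the query returns. This sketch mirrors the filter, the `\w+Exception` parse, and the top-10 limit. Lines that pass the filter but contain no `...Exception` token are skipped here, which may differ from how Logs Insights groups them. `top_errors` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

FILTER = re.compile(r"Exception|Error")
# Python's re module writes named groups as (?P<name>...).
ERROR_TYPE = re.compile(r"(?P<error_type>\w+Exception)")


def top_errors(messages: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count exception types in matching lines, most frequent first."""
    counts = Counter()
    for message in messages:
        if not FILTER.search(message):
            continue
        match = ERROR_TYPE.search(message)
        if match:  # lines with "Error" but no ...Exception token are skipped
            counts[match.group("error_type")] += 1
    return counts.most_common(limit)


logs = [
    "ERROR NullPointerException at Handler.java:42",
    "java.lang.IllegalStateException: bad state",
    "ERROR NullPointerException at Handler.java:57",
    "INFO request ok",
    "Error: upstream timeout",
]
print(top_errors(logs))  # [('NullPointerException', 2), ('IllegalStateException', 1)]
```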

Troubleshooting

Common Issues

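| Symptom | Cause | Solution |
|---|---|---|
| No data | Metric is not being emitted | Check that the CloudWatch Agent (or the emitting service) is running |
| Alarm stuck in INSUFFICIENT_DATA | Missing data points | Review the alarm's treat-missing-data setting |
| Dashboard empty | Wrong namespace or dimensions | Verify the metric source and its exact dimension names |
| High costs | Too many custom metrics | Reduce dimension cardinality and stop publishing unused metrics |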
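An alarm stuck in INSUFFICIENT_DATA is usually a sparse metric meeting the default `missing` mode. The deliberately simplified model below shows how each TreatMissingData mode changes the outcome for a GreaterThanThreshold alarm. Real CloudWatch evaluation can look further back when data is missing, and `evaluate` is a hypothetical name.

```python
from typing import Optional, Sequence

MODES = {"missing", "ignore", "breaching", "notBreaching"}


def evaluate(datapoints: Sequence[Optional[float]], threshold: float,
             treat_missing: str = "missing", current_state: str = "OK") -> str:
    """Simplified GreaterThanThreshold evaluation; None marks a missing datapoint.

    Only the given datapoints are considered, unlike real CloudWatch evaluation.
    """
    if treat_missing not in MODES:
        raise ValueError(f"unknown TreatMissingData mode: {treat_missing}")
    if not datapoints:
        return "INSUFFICIENT_DATA"

    if treat_missing in ("breaching", "notBreaching"):
        # Missing points count as bad (breaching) or good (notBreaching).
        missing_breaches = treat_missing == "breaching"
        breaches = [missing_breaches if v is None else v > threshold for v in datapoints]
        return "ALARM" if all(breaches) else "OK"

    present = [v for v in datapoints if v is not None]
    if not present:
        # "missing" reports INSUFFICIENT_DATA; "ignore" keeps the current state.
        return "INSUFFICIENT_DATA" if treat_missing == "missing" else current_state
    return "ALARM" if all(v > threshold for v in present) else "OK"


print(evaluate([None, None, None], 80))                  # INSUFFICIENT_DATA
print(evaluate([None, None, None], 80, "notBreaching"))  # OK
```

This is why the CLI example sets `--treat-missing-data notBreaching`: gaps keep the alarm in OK instead of leaving it undecided.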

Debug Checklist

  • CloudWatch Agent installed and running?
  • IAM role allows cloudwatch:PutMetricData?
  • Correct namespace and dimensions?
  • Metric has data in expected period?
  • Alarm threshold reasonable?
  • SNS topic has subscriptions?
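Some checklist items can be caught before calling the API at all. The hypothetical `check_alarm_config` below inspects a `put_metric_alarm`-style kwargs dict for missing fields, invalid periods, and missing or malformed actions. Agent status, IAM permissions, and SNS subscriptions still need live AWS checks.

```python
from typing import Dict, List


def check_alarm_config(cfg: Dict) -> List[str]:
    """Return human-readable problems with a put_metric_alarm kwargs dict (empty = none found)."""
    problems = []
    for key in ("AlarmName", "Namespace", "MetricName", "Threshold", "ComparisonOperator"):
        if key not in cfg:
            problems.append(f"missing {key}")

    # Alarm periods must be 10, 30, or a multiple of 60 seconds.
    period = cfg.get("Period", 0)
    if period not in (10, 30) and (period <= 0 or period % 60 != 0):
        problems.append(f"invalid Period {period}: use 10, 30, or a multiple of 60")

    if cfg.get("EvaluationPeriods", 0) < 1:
        problems.append("EvaluationPeriods must be at least 1")

    actions = cfg.get("AlarmActions", [])
    if not actions:
        problems.append("no AlarmActions: the alarm will change state but notify nobody")
    for arn in actions:
        if not arn.startswith("arn:"):
            problems.append(f"AlarmActions entry is not an ARN: {arn}")
    return problems


print(check_alarm_config({"Period": 45, "EvaluationPeriods": 0}))
```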

Test Template

import boto3

# Note: this test creates and deletes a real alarm in the configured account/region.
cw = boto3.client('cloudwatch')


def test_cloudwatch_alarm():
    # Arrange
    alarm_name = "test-alarm"

    try:
        # Act
        cw.put_metric_alarm(
            AlarmName=alarm_name,
            MetricName='CPUUtilization',
            Namespace='AWS/EC2',
            Statistic='Average',
            Period=300,
            EvaluationPeriods=1,
            Threshold=80,
            ComparisonOperator='GreaterThanThreshold'
        )

        # Assert
        response = cw.describe_alarms(AlarmNames=[alarm_name])
        assert len(response['MetricAlarms']) == 1
    finally:
        # Cleanup runs even if the assertion fails
        cw.delete_alarms(AlarmNames=[alarm_name])

Assets

  • assets/alarm-config.yaml - Common alarm configurations
