---
name: alerting
description: Real-time alerting and notification system for Univers infrastructure. Use this when you need to monitor system health and service status, and to send proactive alerts when thresholds are exceeded or services fail.
---

# Alerting Skill

This skill provides comprehensive monitoring and alerting capabilities for the Univers infrastructure ecosystem.

## Capabilities

### 1. Real-time Monitoring

- System resource monitoring (CPU, memory, disk, network)
- Service health checks (HTTP endpoints, ports, processes)
- Application-specific metrics (response times, error rates)
- Custom metric collection and aggregation

### 2. Alert Engine

- Threshold-based alerting
- Rate limiting and alert suppression
- Alert escalation policies
- Multi-condition alert rules (see the sketch below)
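
A multi-condition rule with suppression might be written as below. This is a minimal sketch: the `conditions` list, `match` mode, and `suppress_for` field are illustrative assumptions, not documented syntax.

```yaml
# Hypothetical multi-condition rule; `conditions`, `match`, and
# `suppress_for` are illustrative assumptions, not documented syntax.
name: api-degraded
conditions:
  - cpu_usage > 80
  - error_rate > 5
match: all            # fire only when every condition holds
duration: 5m
severity: warning
suppress_for: 15m     # assumed rate limit: no repeat notifications for 15 minutes
actions:
  - type: slack
    channel: "#alerts"   # quoted so YAML does not treat '#' as a comment
```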

### 3. Notification Channels

- Email notifications with rich formatting
- Slack/Teams integration with actionable messages
- Webhook support for custom integrations
- In-app notifications and banners

### 4. Alert Management

- Alert acknowledgment and resolution
- Alert history and analytics
- Scheduled maintenance windows (see the sketch after this list)
- Alert rule testing and validation
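
The maintenance-window sketch below shows one plausible shape for such an entry; the `maintenance` section and its field names are assumptions for illustration, not documented syntax.

```yaml
# Hypothetical maintenance-window entry; section and field names are
# illustrative assumptions.
maintenance:
  - name: weekly-patching
    start: "2025-12-21T02:00:00Z"
    end: "2025-12-21T04:00:00Z"
    suppress:                 # rules silenced during the window
      - system-cpu-high
      - service-unavailable
```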

### 5. Dashboards and Reports

- Real-time alert status dashboard
- Historical alert trends and analytics
- Service health overview
- Performance metrics visualization

## Common Tasks

### Basic Alert Setup

```bash
# Check system for alert conditions
alert check system

# Monitor specific services
alert monitor services

# Test notification channels
alert test channels
```

### Alert Rule Management

```bash
# List all alert rules
alert rules list

# Add new alert rule
alert rules add cpu-high --threshold 80 --duration 5m

# Update existing rule
alert rules update memory-usage --threshold 90

# Remove alert rule
alert rules remove disk-space-low
```
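
If the CLI persists rules to the `rules.yaml` shown under Configuration Files (an assumption, not confirmed behavior), the `cpu-high` rule added above might be stored as:

```yaml
# Hypothetical on-disk form of `alert rules add cpu-high ...`,
# mirroring the documented rules.yaml schema.
rules:
  - name: cpu-high
    type: system
    metric: cpu_usage
    operator: ">"
    threshold: 80
    duration: 5m
    severity: warning
```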

### Notification Configuration

```bash
# Configure email notifications
alert config email --smtp smtp.example.com --from alerts@example.com

# Configure Slack integration (channel quoted so the shell does not
# treat '#alerts' as a comment)
alert config slack --webhook https://hooks.slack.com/... --channel '#alerts'

# Test notification delivery
alert test email --to admin@example.com
alert test slack --message "Test alert"
```
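
Similarly, if `alert config slack` writes to the `channels.yaml` described under Configuration Files (again an assumption), the Slack setup above might end up as:

```yaml
# Hypothetical on-disk result of `alert config slack ...`, mirroring the
# documented channels.yaml schema. The channel is quoted so YAML does not
# treat '#' as starting a comment.
channels:
  slack:
    webhook_url: https://hooks.slack.com/...
    default_channel: "#alerts"
```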

### Alert Operations

```bash
# View active alerts
alert status

# Acknowledge an alert
alert acknowledge CPU_HIGH_001

# Resolve an alert
alert resolve MEMORY_HIGH_003

# View alert history
alert history --last 24h
```

## Alert Rule Examples

### System Resource Alerts

```yaml
# High CPU Usage
name: cpu-high
condition: cpu_usage > 80
duration: 5m
severity: warning
message: "CPU usage is {{cpu_usage}}% on {{hostname}}"
actions:
  - type: email
    to: ops@example.com
  - type: slack
    channel: "#alerts"   # quoted so YAML does not treat '#' as a comment

# Critical Memory Usage
name: memory-critical
condition: memory_usage > 90
duration: 2m
severity: critical
message: "Critical memory usage: {{memory_usage}}%"
actions:
  - type: webhook
    url: https://api.pagerduty.com/incidents
```

### Service Health Alerts

```yaml
# Service Down
name: service-down
condition: service_health == 0
duration: 1m
severity: critical
message: "{{service_name}} is down on {{hostname}}"
actions:
  - type: email
    to: devops@example.com
  - type: restart
    service: "{{service_name}}"

# High Response Time
name: slow-response
condition: response_time > 2000
duration: 3m
severity: warning
message: "{{service_name}} response time: {{response_time}}ms"
actions:
  - type: slack
    channel: "#performance"   # quoted so YAML does not treat '#' as a comment
```

### Application-Specific Alerts

```yaml
# High Error Rate
name: high-error-rate
condition: error_rate > 5
duration: 5m
severity: warning
message: "{{application}} error rate: {{error_rate}}%"
actions:
  - type: email
    to: dev-team@example.com

# Database Connection Issues
name: db-connection-failed
condition: db_connection_status != "healthy"
duration: 30s
severity: critical
message: "Database connection failed for {{application}}"
actions:
  - type: webhook
    url: https://hooks.slack.com/...
```

## Integration Examples

### Univers Services Integration

```bash
# Monitor Univers services
alert monitor univers-services

# Check specific Univers endpoints
alert check endpoint http://localhost:3003/health --service univers-server
alert check endpoint http://localhost:6007 --service univers-ui
alert check endpoint http://localhost:5173 --service univers-web

# Monitor tmux sessions
alert monitor tmux-sessions --alert-if-missing univers-developer
```

### Container Integration

```bash
# Monitor Docker containers (pattern quoted to prevent shell globbing)
alert monitor containers --include 'univers-*'

# Check container health
alert check container univers-server
alert check container univers-ui
```
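
A container check could plausibly be expressed in `rules.yaml` the same way as the HTTP check shown under Configuration Files; `type: container` and `check: container_health` below are assumptions for illustration.

```yaml
# Hypothetical container-health rule, mirroring the documented
# service-check schema; the container-specific fields are assumptions.
rules:
  - name: univers-server-container
    type: container
    check: container_health
    target: univers-server
    operator: "!="
    threshold: "healthy"
    duration: 1m
    severity: critical
```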

## Configuration Files

### Alert Rules Configuration

```yaml
# ~/.config/univers/alerting/rules.yaml
rules:
  - name: system-cpu-high
    type: system
    metric: cpu_usage
    operator: ">"
    threshold: 80
    duration: 5m
    severity: warning

  - name: service-unavailable
    type: service
    check: http_status
    target: "http://localhost:3003/health"
    operator: "!="
    threshold: 200
    duration: 1m
    severity: critical
```

### Notification Channels

```yaml
# ~/.config/univers/alerting/channels.yaml
channels:
  email:
    smtp_host: smtp.gmail.com
    smtp_port: 587
    username: alerts@company.com
    password: ${SMTP_PASSWORD}

  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    default_channel: "#univers-alerts"   # quoted so YAML does not treat '#' as a comment

  webhook:
    endpoint: https://api.example.com/alerts
    headers:
      Authorization: "Bearer ${API_TOKEN}"
```

## Best Practices

1. **Set Meaningful Thresholds**: Avoid alert fatigue by setting realistic thresholds
2. **Use Escalation Policies**: Implement graduated alert escalation (see the sketch after this list)
3. **Provide Context**: Include relevant details in alert messages
4. **Test Regularly**: Verify alert rules and notification channels
5. **Document Procedures**: Maintain clear runbooks for common alerts
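
As a sketch of what graduated escalation might look like, assuming an `escalation` section with timed steps (illustrative names, not documented syntax):

```yaml
# Hypothetical escalation policy; the `escalation` section and its
# fields are illustrative assumptions.
escalation:
  - name: ops-escalation
    steps:
      - after: 0m            # notify immediately
        actions:
          - type: slack
            channel: "#alerts"
      - after: 15m           # still unacknowledged after 15 minutes
        actions:
          - type: email
            to: ops@example.com
      - after: 30m           # still unacknowledged: page on-call
        actions:
          - type: webhook
            url: https://api.pagerduty.com/incidents
```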

## Troubleshooting

### Common Issues

- **Missing Notifications**: Check channel configurations and connectivity
- **False Positives**: Review alert thresholds and conditions
- **Alert Storms**: Implement rate limiting and suppression rules
- **Slow Performance**: Optimize alert check intervals and data collection

### Debug Commands

```bash
# Check alert engine status
alert status --verbose

# Test specific rule
alert test-rule cpu-high

# Check notification delivery
alert test-notification email --to test@example.com

# View alert engine logs
alert logs --tail 100
```

## Version History

- **v1.0** (2025-12-16): Initial alerting system implementation
  - Basic monitoring, email notifications, and alert rules