| name | alerting |
| description | Real-time alerting and notification system for Univers infrastructure. Use this when you need to monitor system health and service status, and to send proactive alerts when thresholds are exceeded or services fail. |
Alerting Skill
This skill monitors system resources and service health across the Univers infrastructure and sends notifications when thresholds are exceeded or services fail.
Capabilities
1. Real-time Monitoring
- System resource monitoring (CPU, Memory, Disk, Network)
- Service health checks (HTTP endpoints, ports, processes)
- Application-specific metrics (response times, error rates)
- Custom metric collection and aggregation
2. Alert Engine
- Threshold-based alerting
- Rate limiting and alert suppression
- Alert escalation policies
- Multi-condition alert rules
3. Notification Channels
- Email notifications with rich formatting
- Slack/Teams integration with actionable messages
- Webhook support for custom integrations
- In-app notifications and banners
4. Alert Management
- Alert acknowledgment and resolution
- Alert history and analytics
- Scheduled maintenance windows
- Alert rule testing and validation
5. Dashboards and Reports
- Real-time alert status dashboard
- Historical alert trends and analytics
- Service health overview
- Performance metrics visualization
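The "Rate limiting and alert suppression" capability listed under Alert Engine can be pictured as a per-alert cooldown. The following is a minimal sketch, not the skill's actual implementation: the class name `AlertSuppressor`, its method, and the 300-second cooldown are all hypothetical.

```python
import time
from typing import Dict, Optional


class AlertSuppressor:
    """Allow at most one notification per alert key within a cooldown window.

    Hypothetical helper for illustration; not part of the alert CLI.
    """

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_notify(self, alert_key: str, now: Optional[float] = None) -> bool:
        # A caller may pass an explicit clock value, which keeps the logic testable.
        now = time.monotonic() if now is None else now
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window: suppress
        self._last_sent[alert_key] = now
        return True


suppressor = AlertSuppressor(cooldown_seconds=300)
if suppressor.should_notify("cpu-high"):
    print("sending notification for cpu-high")
```

Keying the cooldown by rule name means a noisy rule is throttled without hiding alerts from other rules.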
Common Tasks
Basic Alert Setup
# Check system for alert conditions
alert check system
# Monitor specific services
alert monitor services
# Test notification channels
alert test channels
Alert Rule Management
# List all alert rules
alert rules list
# Add new alert rule
alert rules add cpu-high --threshold 80 --duration 5m
# Update existing rule
alert rules update memory-usage --threshold 90
# Remove alert rule
alert rules remove disk-space-low
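Durations such as `5m` in `--duration 5m` are compact strings that a tool has to convert to a number before comparing times. As a rough sketch, assuming only the `s`, `m`, `h`, and `d` suffixes are supported (the real CLI's grammar is not documented here), a hypothetical `parse_duration` helper could look like this:

```python
import re

# Seconds per unit suffix; the supported suffixes are an assumption.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION_RE = re.compile(r"^(\d+)([smhd])$")


def parse_duration(text: str) -> int:
    """Convert a string such as '5m' or '30s' to a number of seconds."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


print(parse_duration("5m"))
```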
Notification Configuration
# Configure email notifications
alert config email --smtp smtp.example.com --from alerts@example.com
# Configure Slack integration
alert config slack --webhook https://hooks.slack.com/... --channel '#alerts'
# Test notification delivery
alert test email --to admin@example.com
alert test slack --message "Test alert"
Alert Operations
# View active alerts
alert status
# Acknowledge an alert
alert acknowledge CPU_HIGH_001
# Resolve an alert
alert resolve MEMORY_HIGH_003
# View alert history
alert history --last 24h
Alert Rule Examples
System Resource Alerts
# High CPU Usage
name: cpu-high
condition: cpu_usage > 80
duration: 5m
severity: warning
message: "CPU usage is {{cpu_usage}}% on {{hostname}}"
actions:
  - type: email
    to: ops@example.com
  - type: slack
    channel: "#alerts"
# Critical Memory Usage
name: memory-critical
condition: memory_usage > 90
duration: 2m
severity: critical
message: "Critical memory usage: {{memory_usage}}%"
actions:
  - type: webhook
    url: https://api.pagerduty.com/incidents
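The rules above pair a `condition` with a `duration`, which suggests the condition must hold continuously for that long before the alert fires. A minimal sketch of that reading follows; the class `SustainedThreshold` is hypothetical and the engine's real evaluation logic may differ.

```python
from typing import Optional


class SustainedThreshold:
    """Fire only after a value stays above a threshold for a full duration.

    Hypothetical illustration of `condition: cpu_usage > 80` with `duration: 5m`.
    """

    def __init__(self, threshold: float, duration_seconds: float) -> None:
        self.threshold = threshold
        self.duration_seconds = duration_seconds
        self._breach_started: Optional[float] = None

    def observe(self, value: float, timestamp: float) -> bool:
        if value <= self.threshold:
            # Any sample at or below the threshold resets the timer.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = timestamp
        return timestamp - self._breach_started >= self.duration_seconds


cpu_high = SustainedThreshold(threshold=80, duration_seconds=300)
for second, usage in [(0, 85), (150, 91), (300, 88)]:
    print(second, cpu_high.observe(usage, timestamp=second))
```

Resetting on the first healthy sample is what keeps a brief spike from triggering a warning.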
Service Health Alerts
# Service Down
name: service-down
condition: service_health == 0
duration: 1m
severity: critical
message: "{{service_name}} is down on {{hostname}}"
actions:
  - type: email
    to: devops@example.com
  - type: restart
    service: "{{service_name}}"
# High Response Time
name: slow-response
condition: response_time > 2000
duration: 3m
severity: warning
message: "{{service_name}} response time: {{response_time}}ms"
actions:
  - type: slack
    channel: "#performance"
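The `message` fields use `{{name}}` placeholders. How the engine fills them is not specified here, so the following is only a sketch of one plausible approach, with a hypothetical `render_message` function that leaves unknown placeholders untouched so that a missing value is easy to spot.

```python
import re
from typing import Mapping

# Matches {{ name }} with optional inner whitespace.
_PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_message(template: str, context: Mapping[str, object]) -> str:
    """Replace {{key}} placeholders with values from context."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Keep unknown placeholders visible instead of silently dropping them.
        return str(context[key]) if key in context else match.group(0)

    return _PLACEHOLDER_RE.sub(substitute, template)


print(render_message(
    "{{service_name}} response time: {{response_time}}ms",
    {"service_name": "univers-server", "response_time": 2500},
))
```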
Application-Specific Alerts
# High Error Rate
name: high-error-rate
condition: error_rate > 5
duration: 5m
severity: warning
message: "{{application}} error rate: {{error_rate}}%"
actions:
  - type: email
    to: dev-team@example.com
# Database Connection Issues
name: db-connection-failed
condition: db_connection_status != "healthy"
duration: 30s
severity: critical
message: "Database connection failed for {{application}}"
actions:
  - type: webhook
    url: https://hooks.slack.com/...
Integration Examples
Univers Services Integration
# Monitor Univers services
alert monitor univers-services
# Check specific Univers endpoints
alert check endpoint http://localhost:3003/health --service univers-server
alert check endpoint http://localhost:6007 --service univers-ui
alert check endpoint http://localhost:5173 --service univers-web
# Monitor tmux sessions
alert monitor tmux-sessions --alert-if-missing univers-developer
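An endpoint check like the ones above presumably comes down to an HTTP request with a timeout. Here is a sketch using only the Python standard library; `check_endpoint` and `EndpointResult` are hypothetical names, and treating only status 200 as healthy is an assumption.

```python
import http.client
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class EndpointResult(NamedTuple):
    healthy: bool
    status: Optional[int]
    detail: str


# Ignore proxy settings from the environment so localhost checks go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_endpoint(url: str, timeout: float = 5.0) -> EndpointResult:
    """GET the URL and report whether it answered with HTTP 200."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return EndpointResult(False, exc.code, "http error")
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, timeout, DNS failure, or a malformed response.
        return EndpointResult(False, None, f"unreachable: {exc}")
    healthy = status == 200
    return EndpointResult(healthy, status, "ok" if healthy else "unexpected status")
```

A call such as `check_endpoint("http://localhost:3003/health")` returns a result whose `healthy` field can feed a rule like `service_health == 0`.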
Container Integration
# Monitor Docker containers
alert monitor containers --include 'univers-*'
# Check container health
alert check container univers-server
alert check container univers-ui
Configuration Files
Alert Rules Configuration
# ~/.config/univers/alerting/rules.yaml
rules:
  - name: system-cpu-high
    type: system
    metric: cpu_usage
    operator: ">"
    threshold: 80
    duration: 5m
    severity: warning
  - name: service-unavailable
    type: service
    check: http_status
    target: "http://localhost:3003/health"
    operator: "!="
    threshold: 200
    duration: 1m
    severity: critical
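Each rule in `rules.yaml` carries an `operator` string and a `threshold`. One way such a rule could be evaluated, shown purely as a sketch, is to map the operator string to a comparison function. The function `rule_matches` and the set of supported operators are assumptions, and the rules are written as plain dictionaries to keep the example free of a YAML dependency.

```python
import operator
from typing import Any, Callable, Dict, Mapping

# Operator strings as they appear in rules.yaml; the supported set is an assumption.
_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def rule_matches(rule: Mapping[str, Any], observed: Any) -> bool:
    """Return True when the observed value satisfies the rule's comparison."""
    symbol = rule["operator"]
    if symbol not in _OPERATORS:
        raise ValueError(f"unknown operator: {symbol!r}")
    return _OPERATORS[symbol](observed, rule["threshold"])


cpu_rule = {"name": "system-cpu-high", "operator": ">", "threshold": 80}
http_rule = {"name": "service-unavailable", "operator": "!=", "threshold": 200}
print(rule_matches(cpu_rule, 85), rule_matches(http_rule, 200))
```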
Notification Channels
# ~/.config/univers/alerting/channels.yaml
channels:
  email:
    smtp_host: smtp.gmail.com
    smtp_port: 587
    username: alerts@company.com
    password: ${SMTP_PASSWORD}
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    default_channel: "#univers-alerts"
  webhook:
    endpoint: https://api.example.com/alerts
    headers:
      Authorization: "Bearer ${API_TOKEN}"
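The `${SMTP_PASSWORD}` style references keep secrets out of the file, which implies that something expands them from the environment at load time. The sketch below shows one way that expansion could work; `expand_env` is a hypothetical helper, and failing loudly on an unset variable is a design assumption rather than documented behavior.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME} references with values from env (default: os.environ)."""
    source = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            # An unset secret should stop startup rather than send an empty token.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _ENV_RE.sub(substitute, value)


print(expand_env("Bearer ${API_TOKEN}", {"API_TOKEN": "example-token"}))
```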
Best Practices
- Set Meaningful Thresholds: Avoid alert fatigue by setting realistic thresholds
- Use Escalation Policies: Implement graduated alert escalation
- Provide Context: Include relevant details in alert messages
- Test Regularly: Verify alert rules and notification channels
- Document Procedures: Maintain clear runbooks for common alerts
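Graduated escalation, as recommended above, can be expressed as a list of delays after which further channels are notified while an alert stays unacknowledged. This is a sketch only: the step values, the channel order, and the function `channels_due` are hypothetical and not taken from the skill.

```python
from typing import List, Sequence, Tuple

# (minutes unacknowledged, channel) pairs; values are illustrative assumptions.
ESCALATION_STEPS: Sequence[Tuple[int, str]] = (
    (0, "slack"),
    (15, "email"),
    (30, "webhook"),
)


def channels_due(
    minutes_unacknowledged: float,
    steps: Sequence[Tuple[int, str]] = ESCALATION_STEPS,
) -> List[str]:
    """Return every channel whose escalation delay has already elapsed."""
    return [channel for delay, channel in steps if minutes_unacknowledged >= delay]


print(channels_due(20))
```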
Troubleshooting
Common Issues
- Missing Notifications: Check channel configurations and connectivity
- False Positives: Review alert thresholds and conditions
- Alert Storms: Implement rate limiting and suppression rules
- Slow Performance: Optimize alert check intervals and data collection
Debug Commands
# Check alert engine status
alert status --verbose
# Test specific rule
alert test-rule cpu-high
# Check notification delivery
alert test-notification email --to test@example.com
# View alert engine logs
alert logs --tail 100
Version History
- v1.0 (2025-12-16): Initial alerting system implementation
  - Basic monitoring, email notifications, and alert rules