| name | auto-rollback-triggers |
| description | Error rate monitoring, SLO detection, and notification webhooks for automated rollback triggers. Use when setting up automated deployment rollback, monitoring error rates, configuring SLO thresholds, implementing deployment safety nets, setting up alerting webhooks, or when user mentions automated rollback, error rate monitoring, SLO violations, deployment safety, or rollback automation. |
| allowed-tools | Bash, Read, Write, Edit |
Auto-Rollback Triggers
Automated rollback trigger patterns with error rate monitoring, SLO detection, and notification webhooks for deployment safety.
Overview
This skill provides functional monitoring scripts, CI/CD workflow templates, and webhook integration examples for automated deployment rollback triggers. All scripts include proper error handling, threshold configuration, and notification patterns for production safety nets.
Scripts
All scripts are located in scripts/ and are fully functional (not placeholders).
Core Monitoring Scripts
- monitor-error-rate.sh - Real-time error rate monitoring with configurable thresholds and time windows
- check-slo.sh - SLO (Service Level Objective) validation with success rate calculations
- trigger-rollback.sh - Automated rollback orchestration with platform-specific implementations
- collect-metrics.sh - Metrics collection from various sources (logs, APM, monitoring services)
- notify-webhook.sh - Webhook notification delivery with retry logic and templating
Usage Examples
# Monitor error rate (5% threshold over 5 minutes)
bash scripts/monitor-error-rate.sh https://api.example.com/metrics 5.0 300
# Check SLO compliance (99.9% uptime target)
bash scripts/check-slo.sh https://api.example.com/health 99.9
# Trigger rollback to previous version
bash scripts/trigger-rollback.sh vercel my-project previous-deployment-id
# Collect metrics from deployment
bash scripts/collect-metrics.sh https://api.example.com/metrics
# Send webhook notification
bash scripts/notify-webhook.sh "https://hooks.slack.com/services/your_webhook_url_here" "Deployment failed SLO check"
Templates
All templates are located in templates/ and provide configuration examples.
CI/CD Workflows
- github-actions-error-monitoring.yml - GitHub Actions workflow for continuous error rate monitoring
- github-actions-slo-check.yml - GitHub Actions workflow for SLO validation post-deployment
- github-actions-auto-rollback.yml - Complete auto-rollback workflow with monitoring and triggers
- gitlab-ci-auto-rollback.yml - GitLab CI/CD equivalent for auto-rollback patterns
Configuration Templates
- error-threshold-config.json - Error rate threshold configuration with time windows
- error-threshold-config.yaml - YAML version of error threshold configuration
- slo-config.json - SLO definition and validation rules
- webhook-config.json - Webhook endpoint configuration with retry policies
- rollback-policy.json - Rollback decision policy configuration
Platform-Specific Templates
- vercel-deployment-protection.json - Vercel deployment protection rules
- digitalocean-app-rollback.json - DigitalOcean App Platform rollback configuration
- railway-deployment-check.json - Railway deployment health check configuration
Template Usage
# Copy workflow to GitHub Actions
cp templates/github-actions-auto-rollback.yml .github/workflows/auto-rollback.yml
# Configure error thresholds
cp templates/error-threshold-config.json config/error-thresholds.json
# Set up SLO definitions
cp templates/slo-config.json config/slo.json
Examples
All examples are located in examples/ and demonstrate real-world usage patterns.
Example Files
- basic-error-monitoring.md - Simple error rate monitoring setup
- slo-based-rollback.md - SLO violation detection and automated rollback
- slack-webhook-integration.md - Slack notification webhook integration
- discord-webhook-integration.md - Discord notification webhook integration
- multi-platform-rollback.md - Multi-platform rollback orchestration (Vercel, DigitalOcean, Railway)
- advanced-monitoring.md - Advanced monitoring with APM integration (Datadog, New Relic, Sentry)
- gradual-rollout-protection.md - Canary deployment protection with auto-rollback
Instructions
Setting Up Error Rate Monitoring
Configure Error Thresholds
# Copy and customize threshold configuration
cp templates/error-threshold-config.json config/error-thresholds.json
# Edit thresholds for your application
# Example: 5% error rate over 5 minutes triggers rollback
Deploy Monitoring Script
# Run monitoring in background or CI/CD pipeline
bash scripts/monitor-error-rate.sh \
https://api.myapp.com/metrics \
5.0 \
300 \
config/error-thresholds.json
Integrate with GitHub Actions
# Copy workflow template
cp templates/github-actions-error-monitoring.yml .github/workflows/monitor-errors.yml
# Configure secrets: DEPLOYMENT_URL, WEBHOOK_URL, ROLLBACK_TOKEN
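The threshold comparison behind this setup can be sketched as follows. This is a minimal illustration, not the contents of scripts/monitor-error-rate.sh: the function names `error_rate` and `exceeds_threshold` are hypothetical, the sketch assumes the metrics endpoint can be reduced to an error count and a total request count, and it uses awk for the floating-point math where the real scripts list bc as a dependency.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from scripts/monitor-error-rate.sh.

# error_rate ERRORS TOTAL -> prints the error percentage with two decimals.
# Prints 0.00 when there was no traffic, to avoid dividing by zero.
error_rate() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t <= 0) { print "0.00"; exit }
    printf "%.2f\n", (e / t) * 100
  }'
}

# exceeds_threshold RATE THRESHOLD -> returns 0 (true) only when RATE > THRESHOLD.
exceeds_threshold() {
  awk -v r="$1" -v th="$2" 'BEGIN { exit !(r > th) }'
}
```

For example, `exceeds_threshold "$(error_rate 17 200)" 5.0` succeeds because 8.50 is above 5.0, which is the condition under which a rollback would be triggered. A rate exactly equal to the threshold does not count as a breach in this sketch.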
Setting Up SLO-Based Rollback
Define SLO Targets
# Copy SLO configuration template
cp templates/slo-config.json config/slo.json
# Define targets: 99.9% uptime, <500ms p95 latency, <1% error rate
Run SLO Validation
# Check SLO compliance after deployment
bash scripts/check-slo.sh \
https://api.myapp.com/health \
99.9 \
config/slo.json
# Exit code 0 = SLO met, 1 = SLO violated (trigger rollback)
Automate with CI/CD
# Copy complete auto-rollback workflow
cp templates/github-actions-auto-rollback.yml .github/workflows/auto-rollback.yml
# Workflow automatically monitors and rolls back on SLO violations
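One way to picture the success-rate calculation is the sketch below. It is an assumption about how such a check could work, not the actual check-slo.sh: the function name `check_slo` is hypothetical, it expects one HTTP status code per line on stdin, and it treats any 2xx response as a success.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not taken from scripts/check-slo.sh.

# check_slo TARGET_PERCENT: reads one HTTP status code per line on stdin.
# Returns 0 when the share of 2xx responses meets TARGET_PERCENT, 1 otherwise.
# An empty sample set is treated as a violation.
check_slo() {
  awk -v target="$1" '
    /^[0-9]+$/ { total++; if ($1 >= 200 && $1 < 300) ok++ }
    END {
      if (total == 0) { print "no samples"; exit 1 }
      rate = (ok / total) * 100
      printf "success rate: %.3f%% (target %s%%)\n", rate, target
      exit !(rate >= target)
    }'
}
```

Status codes could be gathered with repeated calls such as `curl -s -o /dev/null -w '%{http_code}\n' https://api.myapp.com/health` and piped into `check_slo 99.9`. Treating "no samples" as a failure is a deliberately cautious choice; a pipeline that prefers not to roll back on missing data would handle that case separately.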
Webhook Integration
Slack Webhook Integration
# Set webhook URL (use placeholder, replace with real URL)
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/your_webhook_url_here"
# Send notification
bash scripts/notify-webhook.sh \
"$SLACK_WEBHOOK_URL" \
"Deployment failed: Error rate 8.5% exceeds threshold 5.0%"
Discord Webhook Integration
# Set webhook URL (use placeholder, replace with real URL)
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/your_webhook_url_here"
# Send notification with custom formatting
bash scripts/notify-webhook.sh \
"$DISCORD_WEBHOOK_URL" \
"Auto-rollback triggered" \
--discord
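The two services expect different JSON bodies: Slack incoming webhooks read a `text` field, while Discord webhooks read `content`. The sketch below shows how a `--discord` switch might select between them. The helper names `json_escape` and `build_payload` are hypothetical, and the real notify-webhook.sh may build its payload differently (for instance with jq).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from scripts/notify-webhook.sh.

# json_escape TEXT -> escapes backslashes and double quotes for a JSON string.
# Limitation: assumes a single-line message without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload MESSAGE [--discord]
# Emits {"text": ...} for Slack by default, {"content": ...} for Discord.
build_payload() {
  local field="text"
  if [ "${2:-}" = "--discord" ]; then
    field="content"
  fi
  printf '{"%s":"%s"}\n' "$field" "$(json_escape "$1")"
}
```

The result can be posted with `curl -sS -X POST -H 'Content-Type: application/json' -d "$(build_payload "$msg" --discord)" "$DISCORD_WEBHOOK_URL"`.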
Platform-Specific Rollback
Vercel Rollback
# Trigger Vercel deployment rollback
bash scripts/trigger-rollback.sh vercel \
my-project \
previous-deployment-id \
"$VERCEL_TOKEN"
DigitalOcean App Platform Rollback
# Trigger DigitalOcean App rollback
bash scripts/trigger-rollback.sh digitalocean \
app-id \
previous-deployment-id \
"$DIGITALOCEAN_TOKEN"
Railway Rollback
# Trigger Railway deployment rollback
bash scripts/trigger-rollback.sh railway \
project-id \
previous-deployment-id \
"$RAILWAY_TOKEN"
Integration Patterns
GitHub Actions Integration
Continuous Monitoring Workflow
- Runs every 5 minutes during deployment window
- Monitors error rates and SLO metrics
- Automatically triggers rollback on threshold violations
- Sends notifications to Slack/Discord
Post-Deployment Validation
- Runs immediately after deployment
- Validates SLO compliance for 15 minutes
- Rolls back if SLO violations are detected
- Reports results to deployment dashboard
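A fixed validation window like the 15 minutes above amounts to re-running a check at intervals and stopping at the first failure. The following is a rough sketch under that assumption; `watch_window` is a hypothetical name, and the elapsed time only counts the sleep intervals, not the runtime of the check itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating a post-deployment validation window.

# watch_window DURATION_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until DURATION has elapsed.
# Returns 1 at the first failing run, 2 for a non-positive interval, 0 otherwise.
watch_window() {
  local duration="$1" interval="$2" elapsed=0
  shift 2
  [ "$interval" -gt 0 ] || return 2
  while [ "$elapsed" -lt "$duration" ]; do
    "$@" || return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

A 15-minute window with one check per minute would then look like `watch_window 900 60 bash scripts/check-slo.sh https://api.myapp.com/health 99.9`, with a non-zero result feeding the rollback step.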
Canary Deployment Protection
- Monitors canary deployment metrics
- Compares error rates: canary vs stable
- Automatically promotes or rolls back
- Gradual traffic shift with safety checks
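The canary-versus-stable comparison can be reduced to a single decision, sketched here. The function name `canary_decision` and the idea of a maximum allowed gap in percentage points are assumptions made for illustration; the templates may use a different rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating a canary promote/rollback decision.

# canary_decision CANARY_RATE STABLE_RATE MAX_DELTA
# Prints "rollback" when the canary error rate exceeds the stable rate by more
# than MAX_DELTA percentage points, otherwise prints "promote".
canary_decision() {
  awk -v c="$1" -v s="$2" -v d="$3" 'BEGIN {
    if (c - s > d) print "rollback"; else print "promote"
  }'
}
```

With a stable error rate of 1.0% and an allowed gap of 1.0 point, a canary at 4.2% yields `rollback` and a canary at 1.2% yields `promote`.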
GitLab CI/CD Integration
Similar patterns are available for GitLab CI/CD in templates/gitlab-ci-auto-rollback.yml.
Platform-Specific Integration
Vercel Deployment Protection
- Use templates/vercel-deployment-protection.json for native Vercel checks
- Configure automated checks in Vercel dashboard
- Integrate with GitHub Actions for advanced monitoring
DigitalOcean App Platform
- Use templates/digitalocean-app-rollback.json for health checks
- Configure App Platform health checks
- Use doctl CLI for automated rollback
Railway
- Use templates/railway-deployment-check.json for health checks
- Configure Railway health check endpoints
- Use Railway CLI for automated rollback
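Whichever platform is targeted, it helps to reject bad input before any rollback call is made. The sketch below validates the argument shape used in the trigger-rollback.sh examples (platform, project or app ID, deployment ID, optional token). `validate_rollback_args` is a hypothetical helper, and returning 2 for invalid arguments is an assumption about how such a guard would report failure.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not taken from scripts/trigger-rollback.sh.

# validate_rollback_args PLATFORM TARGET_ID DEPLOYMENT_ID [TOKEN]
# Returns 0 when the arguments look usable, 2 otherwise.
validate_rollback_args() {
  if [ "$#" -lt 3 ] || [ "$#" -gt 4 ]; then
    echo "expected: <platform> <target-id> <deployment-id> [token]" >&2
    return 2
  fi
  case "$1" in
    vercel|digitalocean|railway) ;;
    *) echo "unsupported platform: $1" >&2; return 2 ;;
  esac
  if [ -z "$2" ] || [ -z "$3" ]; then
    echo "target and deployment id must be non-empty" >&2
    return 2
  fi
}
```

The actual rollback call for each platform is intentionally left out, since it depends on the platform CLI or API in use.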
Requirements
Core Dependencies
- curl - For HTTP requests to metrics endpoints
- jq - For JSON parsing and metrics extraction
- bc - For threshold calculations
- date - For time window calculations (GNU coreutils)
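A script can verify these tools up front rather than failing halfway through a run. This is a minimal sketch; `require_commands` is a hypothetical name, and returning 2 for a missing dependency is an assumption about how such a check would signal failure.

```shell
#!/usr/bin/env bash
# Hypothetical dependency check.

# require_commands CMD... -> reports every missing command on stderr and
# returns 2 when at least one is absent from PATH, 0 otherwise.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 2
}
```

A typical call at the top of a monitoring script would be `require_commands curl jq bc date || exit 2`.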
Optional Dependencies
Platform CLIs:
- vercel - Vercel CLI for deployment management
- doctl - DigitalOcean CLI for App Platform management
- railway - Railway CLI for project management
Monitoring Tools:
- datadog-cli - Datadog metrics collection
- newrelic-cli - New Relic APM integration
- sentry-cli - Sentry error tracking integration
GitHub Actions Secrets
Configure these secrets in your GitHub repository:
- DEPLOYMENT_URL - Application metrics endpoint
- WEBHOOK_URL - Slack/Discord webhook URL (use placeholder: https://hooks.example.com/your_webhook_url_here)
- ROLLBACK_TOKEN - Platform API token for rollback operations
- VERCEL_TOKEN - Vercel API token (if using Vercel)
- DIGITALOCEAN_TOKEN - DigitalOcean API token (if using DigitalOcean)
- RAILWAY_TOKEN - Railway API token (if using Railway)
Exit Codes
All scripts share the same exit codes:
- 0 - Metrics within thresholds, SLO met, rollback successful
- 1 - Threshold exceeded, SLO violated, rollback required
- 2 - Invalid arguments or missing dependencies
- 3 - Timeout or network error accessing metrics
- 4 - Platform API error during rollback
- 5 - Webhook notification failed
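A caller can branch on these codes rather than treating every non-zero result as a reason to roll back. The sketch below maps each code to a short label; `describe_exit` is a hypothetical helper, and the labels simply restate the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for interpreting the exit codes listed above.

# describe_exit CODE -> prints a short label for a script exit code.
describe_exit() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "rollback required" ;;
    2) echo "invalid arguments or missing dependencies" ;;
    3) echo "timeout or network error" ;;
    4) echo "platform API error" ;;
    5) echo "webhook notification failed" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}
```

In a pipeline this might be used as `bash scripts/check-slo.sh "$URL" 99.9; describe_exit "$?"`. Keeping code 3 distinct from code 1 matters in practice: a metrics endpoint that cannot be reached is not by itself evidence that the deployment is unhealthy.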
Best Practices
- Start with Conservative Thresholds - Set thresholds that catch real issues without false positives
- Use Time Windows - Monitor over time windows (5-15 minutes) to avoid reacting to transient spikes
- Test in Staging First - Validate rollback triggers in staging environment before production
- Implement Gradual Rollout - Use canary deployments with automated protection
- Monitor Rollback Success - Verify rollback actually resolves the issue
- Alert Human Teams - Always notify teams when auto-rollback triggers
- Document Thresholds - Clearly document why specific thresholds were chosen
- Review Rollback History - Regularly review triggered rollbacks to improve thresholds
- Use Placeholder Webhooks - Never commit real webhook URLs; use placeholders in templates and examples
- Secure API Tokens - Store platform tokens in CI/CD secrets, never in code
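The time-window practice can be approximated by requiring several consecutive breaches before acting, so that a single transient spike does not trigger a rollback. This is one possible reading of that advice, not the method used by the bundled scripts; `sustained_breach` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating spike-tolerant threshold checks.

# sustained_breach THRESHOLD COUNT: reads one error-rate sample per line on
# stdin, oldest first. Returns 0 when the most recent COUNT samples all
# exceed THRESHOLD, 1 otherwise.
sustained_breach() {
  awk -v th="$1" -v need="$2" '
    { if ($1 + 0 > th) streak++; else streak = 0 }
    END { exit !(streak >= need) }'
}
```

With one sample per minute, `sustained_breach 5.0 3` only succeeds after three minutes in a row above 5%, which trades a slower reaction for fewer false positives.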
Security Considerations
- Webhook URLs - Always use placeholders like https://hooks.example.com/your_webhook_url_here in templates
- API Tokens - Store in CI/CD secrets or environment variables, never hardcode
- Metrics Endpoints - Ensure metrics endpoints are authenticated and secured
- Rollback Permissions - Limit rollback permissions to CI/CD service accounts only
- Audit Logging - Log all rollback triggers and actions for audit trail
Troubleshooting
False Positive Rollbacks
- Increase time window for error rate monitoring
- Adjust thresholds based on normal application behavior
- Filter out expected errors (e.g., 404s from bots)
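Filtering expected errors can be as simple as counting only server-side failures. The sketch below assumes the input is one HTTP status code per line and counts only 5xx responses as errors, so 404s from bots do not inflate the rate. `server_error_rate` is a hypothetical name, and whether other 4xx codes should count is an application-specific decision.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating error filtering.

# server_error_rate: reads one HTTP status code per line on stdin and prints
# the percentage of 5xx responses. 4xx responses such as 404 are not counted
# as errors but still count toward the total.
server_error_rate() {
  awk '
    /^[0-9]+$/ { total++; if ($1 >= 500 && $1 < 600) errors++ }
    END {
      if (total == 0) { print "0.00"; exit }
      printf "%.2f\n", (errors / total) * 100
    }'
}
```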
Missed Rollback Triggers
- Decrease monitoring interval (e.g., 1 minute instead of 5)
- Lower thresholds if issues aren't caught
- Add multiple SLO metrics (error rate, latency, availability)
Webhook Notifications Not Delivered
- Verify webhook URL is correct (not placeholder)
- Check webhook service status
- Implement retry logic with exponential backoff
- Add fallback notification channels
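Retry with exponential backoff can be sketched as a small wrapper around any command. The name `retry_with_backoff` is hypothetical, and returning 5 after the final attempt is an assumption chosen to line up with a "webhook notification failed" outcome.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper; not taken from scripts/notify-webhook.sh.

# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay after each failure.
# Returns 0 on success, or 5 once MAX_ATTEMPTS runs have all failed.
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      return 5
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For a webhook, the wrapped command would typically be a `curl -fsS -X POST ...` call, since `-f` makes curl return a non-zero status on HTTP error responses. With `retry_with_backoff 4 2 ...` the waits between attempts are 2, 4, and 8 seconds.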
Platform Rollback Failures
- Verify API token permissions
- Check platform API rate limits
- Ensure previous deployment ID is valid
- Implement manual rollback fallback
Location: /home/gotime2022/.claude/plugins/marketplaces/dev-lifecycle-marketplace/plugins/deployment/skills/auto-rollback-triggers/