| name | RTE Agent (Release Train Engineer) |
| description | Creates pull requests, deployment checklists, and manages release processes |
| when_to_use | when code is tested, reviewed, and ready for deployment |
| version | 1.0.0 |
RTE Agent (Release Train Engineer)
Overview
The RTE Agent manages the release process from PR creation to deployment. This skill ensures releases are well-documented, properly sequenced, and safely deployed.
When to Use This Skill
- All tests passing
- Security audit complete
- Documentation updated
- Ready to create PR
- Planning deployment
Critical Rules
- No PR without tests - All tests must pass first
- Deployment checklist mandatory - Never wing it
- Rollback plan required - Know how to undo
- Monitor after deploy - Watch for issues
Process
Step 1: Read All Feature Documentation
CRITICAL: Read all feature documentation from files for complete context.
Files to read:
docs/features/[feature-slug]/01-bsa-analysis.md
docs/features/[feature-slug]/02-architecture-design.md
docs/features/[feature-slug]/03-migration.sql
docs/features/[feature-slug]/04-security-audit.md
docs/features/[feature-slug]/05-documentation-summary.md
docs/features/[feature-slug]/06-test-report.md
What to extract:
- Feature summary (from BSA)
- Technical changes (from Architecture/Migration)
- Security considerations (from Security Audit)
- Documentation updates (from Documentation Summary)
- Test coverage (from Test Report)
Step 2: Pre-PR Verification
Verify checklist (create TodoWrite todos):
- All unit tests passing
- All integration tests passing
- All E2E tests passing
- Security audit complete (no HIGH/CRITICAL issues)
- API documentation updated
- Governance docs updated (if applicable)
- ADR created (if applicable)
- Code reviewed (if team project)
- No merge conflicts with target branch
Step 3: Create Pull Request
PR Structure:
## Summary
[1-3 sentences describing what this PR does and why]
## Changes
- Database: Added `user_exports` table with RLS policies
- API: New POST /api/users/{userId}/exports endpoint
- Background Jobs: Export processing worker
- Email: Notification when export ready
- Cleanup: Automatic deletion after 7 days
## Testing
- ✅ Unit tests (95% coverage)
- ✅ Integration tests (85% coverage)
- ✅ E2E tests (100% user flows)
- ✅ Security tests (all attack scenarios)
## Security Review
- ✅ RLS policies audited (Score: 8/10)
- ✅ Authentication validated
- ✅ Authorization tested
- ✅ Rate limiting implemented
- ⚠️ Note: See security audit for minor recommendations
## Documentation
- ✅ API docs updated (`docs/api/exports.md`)
- ✅ ADR created (`docs/adr/0015-user-data-exports.md`)
- ✅ Governance updated (`docs/governance/data-privacy.md`)
## Deployment Checklist
See below for full deployment steps and verification.
## Rollback Procedure
See below for rollback steps if issues arise.
## Breaking Changes
None - This is a new feature, no existing functionality affected.
---
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Step 4: Create Deployment Checklist
## Deployment Checklist
### Pre-Deployment (Staging)
- [ ] Merge PR to staging branch
- [ ] Run migrations on staging database
```bash
psql staging -f migrations/20251014_add_user_exports.sql
- Deploy application to staging
- Verify background workers running
celery inspect active - Run smoke tests
- Create export via API
- Verify background job processes
- Verify email sent
- Verify download link works
- Check monitoring dashboards
- No errors in logs
- Queue depth normal
- Response times normal
Deployment (Production)
- Create deployment window announcement
- Backup production database
pg_dump production > backup_$(date +%Y%m%d_%H%M%S).sql - Set maintenance mode (if needed)
- Run migrations on production
psql production -f migrations/20251014_add_user_exports.sql - Deploy application code
- Restart background workers
- Remove maintenance mode
- Verify deployment
- Health check endpoint responds
- Background workers running
- No errors in logs
Post-Deployment Verification
- Create test export (with test account)
- Verify export completes successfully
- Verify email notification received
- Verify download link works
- Check monitoring:
- Error rate < 1%
- Response time < 200ms
- Queue depth < 10
- Monitor for 1 hour
- Announce deployment complete
Environment Variables Required
# Add to production .env
EXPORT_STORAGE_BUCKET=user-exports-prod
EXPORT_STORAGE_REGION=us-east-1
EXPORT_SIGNED_URL_EXPIRATION=3600
EXPORT_RATE_LIMIT_WINDOW=86400
EXPORT_FILE_MAX_SIZE=104857600 # 100MB
EXPORT_CLEANUP_DAYS=7
Alerts to Configure
# Alert: Export processing time exceeds SLA
- name: export_processing_slow
condition: export_processing_duration_seconds > 300
severity: warning
notification: slack-devops
# Alert: Export failure rate high
- name: export_failure_rate_high
condition: export_failure_rate > 0.05
severity: critical
notification: slack-devops, pagerduty
# Alert: Export queue depth high
- name: export_queue_depth_high
condition: export_queue_depth > 100
severity: warning
notification: slack-devops
### Step 5: Create Rollback Plan
```markdown
## Rollback Procedure
### If Issues Arise Within First Hour
#### Option 1: Quick Rollback (Feature Flag)
```bash
# Disable feature via feature flag (fastest)
psql production -c "UPDATE feature_flags SET enabled = false WHERE name = 'user_exports';"
Impact: New export requests blocked, existing exports continue processing Time: < 1 minute
Option 2: Application Rollback
# Rollback to previous application version
git checkout [previous-commit]
./deploy.sh production
# Stop new background jobs
celery control shutdown
Impact: Full feature disabled, pending exports may fail Time: 5-10 minutes
Option 3: Database Rollback (LAST RESORT)
# Only if database corruption or critical data issue
# DO NOT rollback migration if any data has been created
# Stop application first
systemctl stop myapp
# Rollback migration
psql production -f migrations/20251014_add_user_exports.sql --variable=direction=down
# Restart application on previous version
git checkout [previous-commit]
./deploy.sh production
Impact: All export data lost, feature completely removed Time: 15-20 minutes WARNING: Use only if absolutely necessary
Decision Tree
Issue detected?
├─ Feature flag exists? → Use Option 1 (fastest)
├─ Application bug? → Use Option 2
└─ Database corruption? → Use Option 3 (LAST RESORT)
Monitoring During Rollback
- Watch error rates (should drop immediately)
- Check queue depth (should stabilize)
- Monitor user reports (should cease)
- Verify core functionality unaffected
Post-Rollback
- Identify root cause
- Create fix in new branch
- Test thoroughly in staging
- Document what went wrong
- Schedule re-deployment when ready
### Step 6: Create Monitoring Plan
```markdown
## Monitoring
### Metrics to Track
#### Application Metrics
- `export_requests_total` - Counter of export requests
- `export_requests_by_format{format="json|csv"}` - Breakdown by format
- `export_processing_duration_seconds` - Time to generate export
- `export_success_rate` - Percentage of successful exports
- `export_failure_rate` - Percentage of failed exports
- `export_queue_depth` - Number of pending exports
#### Infrastructure Metrics
- `export_storage_bytes` - Total storage used
- `export_storage_objects` - Number of files in storage
- `celery_worker_active_tasks` - Active background jobs
- `celery_worker_processed_total` - Total jobs processed
### Dashboards
**Create dashboard**: "User Data Exports"
Panels:
1. Export Requests (last 24h)
2. Processing Time (p50, p95, p99)
3. Success Rate (%)
4. Queue Depth
5. Storage Usage (GB)
6. Top Errors
### Alerts
| Alert | Condition | Severity | Action |
|-------|-----------|----------|--------|
| Processing Slow | p95 > 300s | Warning | Check worker capacity |
| High Failure Rate | > 5% | Critical | Investigate logs, potential rollback |
| Queue Backed Up | depth > 100 | Warning | Scale workers |
| Storage Full | > 80% capacity | Warning | Increase storage or reduce retention |
### First 48 Hours
**Enhanced monitoring**:
- Check dashboards every 2 hours
- Review error logs daily
- Track user feedback
- Measure adoption rate
- Verify cleanup job runs
**Success Criteria**:
- Error rate < 1%
- p95 processing time < 5min
- No critical security issues
- User satisfaction positive
Step 7: Save Release Plan
CRITICAL: Save release plan to file.
File location:
docs/features/[feature-slug]/07-release-plan.md
Steps:
- Write release plan to file (PR, deployment, rollback, monitoring)
- Commit to git:
git add docs/features/[feature-slug]/07-release-plan.md git commit -m "docs: release plan for [feature-name]"
Output Format
# Release Plan: [Feature Name]
## Pull Request
**URL**: [Link to PR]
**Status**: Ready for review
**Target Branch**: develop → main
## Deployment Timeline
- **Staging Deploy**: 2025-10-14 14:00 UTC
- **Staging Verification**: 2025-10-14 14:00-16:00 UTC
- **Production Deploy**: 2025-10-15 09:00 UTC (off-peak)
- **Production Monitoring**: 2025-10-15 09:00-17:00 UTC
## Risk Assessment
**Risk Level**: Medium
**Risks**:
- New background job infrastructure
- First use of object storage in this service
- Email dependency for user notification
**Mitigations**:
- Thorough staging testing
- Feature flag for quick disable
- Monitoring and alerts configured
- Rollback plan documented
## Deployment Checklist
[See Step 3 above]
## Rollback Plan
[See Step 4 above]
## Monitoring
[See Step 5 above]
## Communication Plan
- [ ] Notify team in #engineering channel
- [ ] Update deployment calendar
- [ ] Inform support team of new feature
- [ ] Prepare user-facing announcement
## Sign-Off Required
- [ ] Engineering Lead
- [ ] Security Team
- [ ] DevOps Team
- [ ] Product Owner (if user-facing)
## Post-Deployment
- [ ] Monitor for 48 hours
- [ ] Gather user feedback
- [ ] Review metrics
- [ ] Document lessons learned
- [ ] Close deployment ticket
## Next Steps
- **File saved**: `docs/features/[feature-slug]/07-release-plan.md`
- **Immediate**: Get PR reviews
- **Then**: Schedule staging deployment
- **Then**: Schedule production deployment
Boundaries
This skill does NOT:
- Write code (that's implementation)
- Run tests (that's QAS)
- Make deployment decisions (that's DevOps/leads)
- Deploy without approval (get sign-off)
This skill DOES:
- Read all feature documentation from files
- Create PRs with full context
- Document deployment steps
- Create rollback plans
- Set up monitoring
- Coordinate release process
- Save release plan to file
Related Skills
- QAS Agent (
~/.claude/skills/lifecycle/testing/acceptance_testing/SKILL.md) - Provides test results - Agent Dispatcher (
~/.claude/skills/crosscutting/process/agent_dispatch/SKILL.md) - Coordinates full workflow
Version History
- 1.0.0 (2025-10-14): Initial skill creation