| name | database-migration |
| description | Safe database migration procedures with backward compatibility, backups, and rollback strategies. Use when creating, modifying, or dropping database schemas. Covers migration creation, testing, execution, and rollback. |
| version | 1.0.0 |
| author | Database Team |
| category | custom |
| token_estimate | ~3000 |
Use this skill when:
- Adding/modifying/removing database tables or columns
- Creating or dropping indexes
- Changing constraints or relationships
- Migrating data between schemas
- Altering database configurations
Do NOT use this skill when:
- Making simple data updates (use database query skills)
- One-time data fixes (use admin scripts)
- Schema-less database changes (document/key-value stores)
Prerequisites:
- Migration tool installed (Alembic, Flyway, Django migrations, etc.)
- Database backup capability available
- Staging environment that mirrors production
- Database admin access for production
- Rollback window defined
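These checks can be scripted before starting. A minimal pre-flight sketch, assuming Alembic and SQLAlchemy are installed and the connection string lives in a DATABASE_URL environment variable (swap pg_dump for mysqldump on MySQL):
# check_prereqs.py -- rough pre-flight check; tool names and env vars are assumptions
import os
import shutil
import sys

from sqlalchemy import create_engine, text

ok = True

# Migration and backup tooling available on PATH
for tool in ("alembic", "pg_dump"):
    if shutil.which(tool) is None:
        print(f"FAIL: {tool} not found")
        ok = False

# Database reachable with the configured credentials
url = os.environ.get("DATABASE_URL")
if not url:
    print("FAIL: DATABASE_URL not set")
    ok = False
else:
    with create_engine(url).connect() as conn:
        conn.execute(text("SELECT 1"))
        print("OK: database reachable")

sys.exit(0 if ok else 1)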
Using Alembic (Python):
# Generate migration
alembic revision --autogenerate -m "add_user_preferences_table"
# Review generated migration
cat alembic/versions/abc123_add_user_preferences_table.py
Migration Structure:
"""add user preferences table
Revision ID: abc123
Revises: xyz789
Create Date: 2025-01-20 10:30:00
"""
from alembic import op
import sqlalchemy as sa
revision = 'abc123'
down_revision = 'xyz789'
def upgrade():
    """Apply migration."""
    # Create table
    op.create_table(
        'user_preferences',
        sa.Column('id', sa.Integer(), primary_key=True),
        sa.Column('user_id', sa.Integer(), sa.ForeignKey('users.id'), nullable=False),
        sa.Column('theme', sa.String(20), default='light'),
        sa.Column('notifications_enabled', sa.Boolean(), default=True),
        sa.Column('created_at', sa.DateTime(), server_default=sa.func.now()),
        sa.Column('updated_at', sa.DateTime(), onupdate=sa.func.now())
    )
    # Create indexes for common queries
    op.create_index('idx_user_preferences_user_id', 'user_preferences', ['user_id'])

def downgrade():
    """Rollback migration."""
    op.drop_index('idx_user_preferences_user_id')
    op.drop_table('user_preferences')
Key Requirements:
- Both `upgrade()` and `downgrade()` must be defined
- Descriptive migration message
- Include indexes for performance
- Foreign keys properly defined
Backward Compatible Patterns:
Adding Column (nullable or with default):
# Good: Nullable column (old code won't break)
op.add_column('users', sa.Column('bio', sa.Text(), nullable=True))
# Good: Column with default (old code gets default value)
op.add_column('users', sa.Column('status', sa.String(20), server_default='active'))
# Bad: Required column without default (breaks old code)
op.add_column('users', sa.Column('required_field', sa.String(50), nullable=False))
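If the new column must eventually be NOT NULL, a common workaround is to add it as nullable, backfill, and only then tighten the constraint. A sketch of that pattern; the locale column and 'en' default are illustrative, and op/sa come from the standard migration imports:
def upgrade():
    # 1. Add as nullable so writes from old code don't fail
    op.add_column('users', sa.Column('locale', sa.String(10), nullable=True))
    # 2. Backfill existing rows with a sensible value
    op.execute("UPDATE users SET locale = 'en' WHERE locale IS NULL")
    # 3. Enforce NOT NULL only after every row has a value
    op.alter_column('users', 'locale', nullable=False, server_default='en')

def downgrade():
    op.drop_column('users', 'locale')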
Removing Column (two-step process):
# Step 1 (Deploy first): Make column nullable, update code to not use it
op.alter_column('users', 'old_column', nullable=True)
# Step 2 (Deploy later): Remove column after old code is gone
op.drop_column('users', 'old_column')
Renaming Column (multi-step process):
# Step 1: Add new column
op.add_column('users', sa.Column('new_name', sa.String(100)))
# Step 2: Backfill data
op.execute('UPDATE users SET new_name = old_name WHERE new_name IS NULL')
# Step 3 (separate migration): Drop old column after code updated
op.drop_column('users', 'old_name')
Changing Column Type (multi-step):
# Step 1: Add new column with new type
op.add_column('users', sa.Column('age_int', sa.Integer()))
# Step 2: Migrate data
op.execute('UPDATE users SET age_int = CAST(age_string AS INTEGER)')
# Step 3 (separate migration): Remove old column
op.drop_column('users', 'age_string')
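Each step above should live in its own migration file with its own downgrade so it can be rolled back independently. A sketch of steps 1 and 2 as a single revision, assuming the usual op and sa imports:
def upgrade():
    # Add the replacement column and copy data across in the same revision
    op.add_column('users', sa.Column('age_int', sa.Integer(), nullable=True))
    op.execute('UPDATE users SET age_int = CAST(age_string AS INTEGER)')

def downgrade():
    # Dropping only the new column is safe; age_string is untouched
    op.drop_column('users', 'age_int')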
Create Staging Database Snapshot:
# Restore production snapshot to staging (sanitize PII before using it)
pg_dump -F c production_db | pg_restore -d staging_db
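What "sanitize PII" means depends on the schema. A minimal sketch of anonymizing the restored staging copy, assuming SQLAlchemy and a STAGING_DATABASE_URL environment variable; the column names are illustrative:
# sanitize_staging.py -- rough sketch; adjust tables and columns to your schema
import os
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["STAGING_DATABASE_URL"])

with engine.begin() as conn:
    # Replace real emails and names with deterministic placeholders
    conn.execute(text("UPDATE users SET email = 'user_' || id || '@example.com'"))
    conn.execute(text("UPDATE users SET full_name = 'Test User ' || id"))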
Apply Migration:
# Run migration in staging
alembic upgrade head
# Check migration applied
alembic current
# Verify database state
psql staging_db -c "\d+ user_preferences"
Test Application Compatibility:
# Deploy code to staging
kubectl set image deployment/app app=myapp:new-version -n staging
# Run smoke tests
pytest tests/integration/ --env=staging
# Verify key workflows work
curl https://staging.api.example.com/health
Test Rollback:
# Rollback migration
alembic downgrade -1
# Verify rollback succeeded
alembic current
psql staging_db -c "\d+ user_preferences" # Should not exist
# Re-apply for production
alembic upgrade head
Staging Checklist:
- Migration applies successfully
- Application works with new schema
- Application works with old schema (before migration)
- Rollback works correctly
- Performance is acceptable (check query times)
- Indexes are effective (check query plans)
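Several of these checks can be automated as a staging smoke suite. A minimal pytest sketch, assuming SQLAlchemy and a STAGING_DATABASE_URL variable; the table and index names follow the example migration above:
# tests/integration/test_migration_smoke.py -- sketch only
import os

import pytest
from sqlalchemy import create_engine, inspect, text

@pytest.fixture(scope="module")
def engine():
    return create_engine(os.environ["STAGING_DATABASE_URL"])

def test_new_table_exists(engine):
    assert inspect(engine).has_table("user_preferences")

def test_new_index_exists(engine):
    names = [ix["name"] for ix in inspect(engine).get_indexes("user_preferences")]
    assert "idx_user_preferences_user_id" in names

def test_old_queries_still_work(engine):
    # Old code paths must keep working against the migrated schema
    with engine.connect() as conn:
        conn.execute(text("SELECT id FROM users LIMIT 1"))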
PostgreSQL Backup:
# Full backup
pg_dump -h production-host -U dbuser -d production_db \
-F c -f backup_before_migration_$(date +%Y%m%d_%H%M%S).dump
# Verify backup
pg_restore --list backup_before_migration_20250120_103000.dump | head
# Store securely
aws s3 cp backup_before_migration_20250120_103000.dump \
s3://backups/migrations/
MySQL Backup:
# Full backup
mysqldump -h production-host -u dbuser -p production_db \
> backup_before_migration_$(date +%Y%m%d_%H%M%S).sql
# Verify
head -n 50 backup_before_migration_20250120_103000.sql
Document Backup:
# Record backup details
echo "Migration: add_user_preferences_table" > migration_backup_info.txt
echo "Backup file: backup_before_migration_20250120_103000.dump" >> migration_backup_info.txt
echo "Database: production_db" >> migration_backup_info.txt
echo "Time: $(date)" >> migration_backup_info.txt
echo "Size: $(du -h backup_before_migration_20250120_103000.dump)" >> migration_backup_info.txt
Pre-Migration Checks:
# Verify backup exists
ls -lh backup_before_migration_*.dump
# Check database connection
psql -h production-host -U dbuser -d production_db -c "SELECT version();"
# Review migration to apply
alembic upgrade head --sql > migration_preview.sql
cat migration_preview.sql # Review SQL before applying
# Check current database version
alembic current
Apply Migration:
# Connect to production
export DATABASE_URL="postgresql://user:pass@prod-host/prod_db"
# Apply migration
alembic upgrade head
# Verify applied
alembic current
# Check new table exists
psql $DATABASE_URL -c "\d+ user_preferences"
Monitor Application:
# Watch application logs for errors
kubectl logs -n production -l app=myapp --follow | grep -i error
# Check error rate in monitoring
# - Should remain stable
# - No spike in database errors
# Verify key endpoints
curl https://api.example.com/health
curl https://api.example.com/api/v1/users/me
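If metrics live in Prometheus (or something API-compatible), the error-rate check can be scripted. The endpoint and metric name below are placeholders for whatever your monitoring stack actually exposes:
# error_rate_check.py -- sketch; PROM_URL and the metric name are assumptions
import requests

PROM_URL = "http://prometheus.monitoring:9090/api/v1/query"
QUERY = 'sum(rate(http_requests_total{app="myapp",status=~"5.."}[5m]))'

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()
result = resp.json()["data"]["result"]
rate = float(result[0]["value"][1]) if result else 0.0
print(f"5xx rate over last 5m: {rate:.3f}/s")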
Validation:
# Run smoke tests against production
pytest tests/smoke/ --env=production
# Check database metrics
# - Query performance
# - Connection count
# - Lock waits
# Verify indexes being used
psql $DATABASE_URL -c "EXPLAIN ANALYZE SELECT * FROM user_preferences WHERE user_id = 123;"
When to Rollback:
- Application errors increase significantly
- Migration takes longer than maintenance window
- Data corruption detected
- Performance severely degraded
Rollback Steps:
# 1. Stop application traffic (if severe)
kubectl scale deployment/app --replicas=0 -n production
# 2. Rollback migration
alembic downgrade -1
# 3. Verify rollback
alembic current
psql $DATABASE_URL -c "\d+ user_preferences" # Should not exist
# 4. Restore application
kubectl scale deployment/app --replicas=5 -n production
# 5. Monitor for recovery
# Check error rates return to normal
# Verify application functionality
If Rollback Fails:
# Restore from backup (last resort)
pg_restore -h production-host -U dbuser -d production_db \
--clean --if-exists \
backup_before_migration_20250120_103000.dump
# Verify restore
psql $DATABASE_URL -c "SELECT COUNT(*) FROM users;"
# Compare with expected count
# Restart application
kubectl rollout restart deployment/app -n production
Common Mistake: Breaking backward compatibility
How to Avoid:
- Add columns as nullable or with defaults
- Use multi-step migrations for breaking changes
- Test with old and new code versions
Common Mistake: Skipping staging tests or testing with unrealistic data
How to Avoid:
- Use production snapshot for staging
- Test with realistic data volumes
- Verify migration and rollback both work
Common Mistake: Migrating without a verified backup
How to Avoid:
- Always backup before migration
- Verify backup is restorable
- Store backup securely
- Test restore process periodically
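"Test restore process periodically" can be a small scheduled job that restores the latest dump into a scratch database and spot-checks a table. A rough sketch, assuming the PostgreSQL client tools are on PATH; the dump filename, scratch database, and table are assumptions:
# verify_backup_restore.py -- sketch; database and table names are assumptions
import subprocess

BACKUP = "backup_before_migration_20250120_103000.dump"
SCRATCH_DB = "restore_test"

subprocess.run(["createdb", SCRATCH_DB], check=True)
try:
    # Restore the custom-format dump into the throwaway database
    subprocess.run(["pg_restore", "--no-owner", "--dbname", SCRATCH_DB, BACKUP], check=True)
    # Spot-check that a critical table came back with data
    rows = subprocess.run(
        ["psql", SCRATCH_DB, "-tAc", "SELECT COUNT(*) FROM users;"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    print(f"users rows restored: {rows}")
finally:
    subprocess.run(["dropdb", SCRATCH_DB], check=True)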
Example: Creating an Index Without Downtime
Migration:
def upgrade():
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction,
    # so step outside Alembic's migration transaction first
    with op.get_context().autocommit_block():
        op.create_index(
            'idx_users_email',
            'users',
            ['email'],
            unique=True,
            postgresql_concurrently=True
        )

def downgrade():
    with op.get_context().autocommit_block():
        op.drop_index('idx_users_email', 'users', postgresql_concurrently=True)
Testing in Staging:
# Apply migration
alembic upgrade head
# Test query performance
psql staging_db -c "EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';"
# Should show index scan, not seq scan
# Check index is used
psql staging_db -c "SELECT * FROM pg_stat_user_indexes WHERE indexrelname = 'idx_users_email';"
Applying in Production:
# Backup
pg_dump production_db -F c -f backup_before_index.dump
# Apply
alembic upgrade head
# Takes ~2 minutes for 1M rows with CONCURRENTLY
# Verify
psql production_db -c "\d+ users"
# Should show new index
# Monitor query performance improvement
# Check monitoring dashboards for query time reduction
Outcome: Index created without downtime. Query performance improved from 2s to 50ms.
Example: Renaming a Column with Zero Downtime
Step 1 Migration: Add New Column
def upgrade():
    # Add new column with same data
    op.add_column('users', sa.Column('display_name', sa.String(100)))
    # Copy data
    op.execute('UPDATE users SET display_name = username WHERE display_name IS NULL')
    # Make non-nullable after data copied
    op.alter_column('users', 'display_name', nullable=False)

def downgrade():
    op.drop_column('users', 'display_name')
Step 1 Code Update:
# Update code to write to both columns
class User(db.Model):
    username = db.Column(db.String(100))      # Old, will remove later
    display_name = db.Column(db.String(100))  # New

    def set_name(self, name):
        self.username = name      # Still write to old
        self.display_name = name  # Also write to new
Deploy Step 1:
# Apply migration
alembic upgrade head
# Deploy code that writes to both columns
kubectl set image deployment/app app=myapp:step1 -n production
# Monitor: No errors, both columns being written
Step 2 Migration: Drop Old Column (weeks later)
def upgrade():
    # Drop old column (after code fully migrated)
    op.drop_column('users', 'username')

def downgrade():
    # Can't really undo this without data loss;
    # would need to re-add the column and copy from display_name
    op.add_column('users', sa.Column('username', sa.String(100)))
    op.execute('UPDATE users SET username = display_name')
Step 2 Code Update:
# Remove old column from code
class User(db.Model):
    display_name = db.Column(db.String(100))  # Only new column
Deploy Step 2:
# Deploy code that only uses display_name
kubectl set image deployment/app app=myapp:step2 -n production
# Wait a few days, verify stability
# Apply migration to drop old column
alembic upgrade head
Outcome: Zero-downtime rename completed over two deployments. Old and new code worked throughout.