docs/published/handbook/engineering/safe-django-migrations.md
This guide explains how to safely perform dangerous Django migration operations in production with PostgreSQL. Each section covers a risky operation and provides step-by-step instructions for the safe approach.
Rule of thumb: Never run a migration that drops, renames, or removes anything while any running code could still reference it. Use a two-phase approach: remove references → wait → drop.
Context: These guidelines are written for zero-downtime, rolling-deploy environments like PostHog's production setup. In single-instance or development setups, you can take shortcuts at your own risk — but these patterns prevent downtime in production.
Problem: DeleteModel operations drop tables immediately. This breaks backwards compatibility during deployment and cannot be rolled back - once data is deleted, any rollback deployment will fail because the table no longer exists.
Deploy table drops in separate phases with safety delays:
Step 1: Remove model and all references (single PR)
In one PR, remove the model and all code that references it:
Remove all application code that uses the model:
Delete the model class from models.py
Run makemigrations - Django will generate a DeleteModel operation
Wrap the generated migration in SeparateDatabaseAndState to only affect Django's state, not the database:
class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                migrations.DeleteModel(name='OldFeature'),
            ],
            database_operations=[
                # If table has FKs to frequently-truncated tables (User, Team, Organization),
                # drop those FK constraints to avoid blocking TransactionTestCase teardown.
                # migrations.RunSQL(
                #     sql="ALTER TABLE posthog_oldfeature DROP CONSTRAINT IF EXISTS posthog_oldfeature_team_id_fkey",
                # ),
            ],
        ),
    ]
Test infrastructure note: If your table has foreign keys pointing TO frequently-truncated tables like User, Team, or Organization, you may see test failures like "cannot truncate a table referenced in a foreign key constraint". This happens because:
TransactionTestCase uses TRUNCATE to clean up between tests, and PostgreSQL refuses to truncate a table that another table still references via a foreign key.
Solution: drop those FK constraints in database_operations (see commented example above) - you're dropping the table soon anyway.
Step 2: Wait for safety window
Leave the table in place for a safety window (typically weeks) so any still-running code, or a rollback deployment, can keep referencing it.
Step 3: Drop the table (optional)
After the safety window has passed, drop the table in a new migration using RunSQL with raw SQL (see example below).
Important notes:
Prefer RunSQL("DROP TABLE IF EXISTS ...") for explicit control and idempotency.
# ❌ DANGEROUS - Never do this
class Migration(migrations.Migration):
    operations = [
        migrations.DeleteModel(name='OldFeature'),
    ]
# ✅ SAFE - Multi-phase approach with SeparateDatabaseAndState
# Step 1: Remove model and all references (deploy this in one PR)
# - Delete all code that imports/uses OldFeature
# - Delete OldFeature class from models.py
# - Run makemigrations and wrap in SeparateDatabaseAndState
class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                migrations.DeleteModel(name='OldFeature'),
            ],
            database_operations=[
                # Drop FK constraints if table references User/Team/Organization
                # to avoid blocking TransactionTestCase TRUNCATE operations
            ],
        ),
    ]
# Step 3: Much later (weeks), optionally drop the table
class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.RunSQL(
            sql="DROP TABLE IF EXISTS posthog_oldfeature",
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]
Problem: RemoveField operations drop columns immediately. This breaks backwards compatibility during deployment and cannot be rolled back - once data is deleted, any rollback deployment will fail because the column no longer exists.
Use the same multi-phase pattern as Dropping Tables:
Remove the field from the model and all code references, wrapping the generated RemoveField in SeparateDatabaseAndState so only Django's state changes; then, after a safety window, optionally drop the column with raw SQL or a real RemoveField in a later migration (a sketch follows the notes below).
Important notes:
RemoveField operations are irreversible - column data is permanently deleted.
DROP COLUMN takes an ACCESS EXCLUSIVE lock (briefly) - schedule during low-traffic windows.
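For illustration, a minimal sketch of the two phases for a column (the model, field, and table/column names here are hypothetical):
# Step 1: Remove the field from code; keep the column in the database for now
class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                migrations.RemoveField(model_name='myfeature', name='old_field'),
            ],
            database_operations=[],  # Column stays in place during the safety window
        ),
    ]

# Step 2: Weeks later, optionally drop the column
class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE posthog_myfeature DROP COLUMN IF EXISTS old_field",
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]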
Problem: RenameModel operations rename tables immediately. This breaks old code that still references the old table name during deployment.
Strongly recommended: Accept the original table name even if it's wrong. Renaming tables in production creates significant complexity and risk for minimal benefit. The table name is an implementation detail that users never see.
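Analogous to the db_column approach for fields (next section), Meta.db_table lets a model with a better Python name keep using the existing table. A minimal sketch with hypothetical names:
class BetterName(models.Model):
    class Meta:
        # Keep the existing (imperfect) table name; no table rename needed
        db_table = 'posthog_old_bad_name'
If you rename an existing model class this way, review the generated migration to confirm it only updates Django's state and does not emit a table rename.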
Problem: RenameField operations rename columns immediately. This breaks old code that still references the old column name during deployment.
Strongly recommended: Don't rename columns in production. Accept the original name and use Django's db_column parameter to map a better Python name to the existing database column:
class MyModel(models.Model):
    better_name = models.CharField(db_column='old_bad_name', max_length=100)
This gives you a clean Python API without the risk of renaming the database column.
Problem: Adding a NOT NULL column without a default (or with a volatile default like uuid4() or now()) requires rewriting the entire table. This locks the table and can cause deployment timeouts.
Step 1: Add column as nullable
class Migration(migrations.Migration):
    operations = [
        migrations.AddField(
            model_name='mymodel',
            name='new_field',
            field=models.CharField(max_length=100, null=True),  # Allow NULL
        ),
    ]
Deploy this change.
Step 2: Backfill data for all rows
For small/medium tables with static values, use simple UPDATE:
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="UPDATE mymodel SET new_field = 'default_value' WHERE new_field IS NULL",
        ),
    ]
For large tables, use batching to avoid long locks:
from django.db import migrations

def backfill_in_batches(apps, schema_editor):
    MyModel = apps.get_model('myapp', 'MyModel')
    batch_size = 10000
    while True:
        ids = list(
            MyModel.objects.filter(new_field__isnull=True)
            .values_list('id', flat=True)[:batch_size]
        )
        if not ids:
            break
        MyModel.objects.filter(id__in=ids).update(new_field='default_value')

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(backfill_in_batches),
    ]
Step 3: Add NOT NULL constraint
class Migration(migrations.Migration):
    operations = [
        migrations.AlterField(
            model_name='mymodel',
            name='new_field',
            field=models.CharField(max_length=100, null=False),  # Now NOT NULL
        ),
    ]
Or use RunSQL for more control:
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE mymodel ALTER COLUMN new_field SET NOT NULL",
            reverse_sql="ALTER TABLE mymodel ALTER COLUMN new_field DROP NOT NULL",
        ),
    ]
Problem: Creating indexes without CONCURRENTLY locks the table for the entire duration of index creation. On large tables, this can take minutes or hours, blocking all writes.
Plain CREATE INDEX (without CONCURRENTLY) takes a lock that blocks all writes for the duration of the build.
Recommended: Use Django's built-in concurrent operations (PostgreSQL only):
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations, models

class Migration(migrations.Migration):
    atomic = False  # Required for CONCURRENTLY

    operations = [
        AddIndexConcurrently(
            model_name='mymodel',
            index=models.Index(fields=['field_name'], name='mymodel_field_idx'),
        ),
    ]
If you need IF NOT EXISTS for idempotency, use RunSQL:
from django.db import migrations

class Migration(migrations.Migration):
    atomic = False  # Required for CONCURRENTLY

    operations = [
        migrations.RunSQL(
            sql="""
                CREATE INDEX CONCURRENTLY IF NOT EXISTS mymodel_field_idx
                ON mymodel (field_name)
            """,
            reverse_sql="DROP INDEX CONCURRENTLY IF EXISTS mymodel_field_idx",
        ),
    ]
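Dropping an index follows the same shape with RemoveIndexConcurrently. A minimal sketch (the model and index names are hypothetical):
from django.contrib.postgres.operations import RemoveIndexConcurrently
from django.db import migrations

class Migration(migrations.Migration):
    atomic = False  # Required for CONCURRENTLY

    operations = [
        RemoveIndexConcurrently(
            model_name='mymodel',
            name='mymodel_field_idx',
        ),
    ]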
Use AddIndexConcurrently for existing large tables - it handles the SQL correctly.
AddIndexConcurrently does not support IF NOT EXISTS - use RunSQL if you need idempotency.
Set atomic = False in the migration (required for all CONCURRENTLY operations).
Use RemoveIndexConcurrently to drop indexes safely (see the sketch above).
Problem: Adding constraints like CHECK or FOREIGN KEY validates all existing rows, locking the table during validation.
Add constraints in two phases - add without validation, then validate separately.
Example: CHECK Constraint
Step 1: Add constraint with NOT VALID
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="""
                ALTER TABLE mymodel
                ADD CONSTRAINT mymodel_field_check
                CHECK (field_value > 0)
                NOT VALID
            """,
            reverse_sql="ALTER TABLE mymodel DROP CONSTRAINT mymodel_field_check",
        ),
    ]
This adds the constraint but only validates NEW rows (instant operation).
Step 2: Validate constraint in separate migration
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE mymodel VALIDATE CONSTRAINT mymodel_field_check",
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]
Deploy this separately. Validation scans the table but uses SHARE UPDATE EXCLUSIVE lock which allows normal reads and writes but blocks other schema changes on that table.
Note: This pattern applies to FOREIGN KEY constraints on existing columns. New nullable FK columns don't need it - the column starts empty, so there's nothing to validate.
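For example, adding a FOREIGN KEY to an already-populated column could look like this (a sketch; table, column, and constraint names are hypothetical):
# Step 1: Add the FK without validating existing rows (instant)
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="""
                ALTER TABLE mymodel
                ADD CONSTRAINT mymodel_other_id_fk
                FOREIGN KEY (other_id) REFERENCES othermodel (id)
                NOT VALID
            """,
            reverse_sql="ALTER TABLE mymodel DROP CONSTRAINT mymodel_other_id_fk",
        ),
    ]

# Step 2: Validate existing rows in a separate migration
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE mymodel VALIDATE CONSTRAINT mymodel_other_id_fk",
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]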
NOT VALID makes constraint addition instant.
VALIDATE CONSTRAINT takes a SHARE UPDATE EXCLUSIVE lock that allows normal reads/writes but blocks DDL operations.
Problem: RunSQL with UPDATE or DELETE operations can lock rows for extended periods, especially on large tables. RunPython operations can be slow and hold locks.
Break large updates into small batches with delays between them.
Pattern 1: Batched UPDATE in RunSQL
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            sql="""
                DO $$
                DECLARE
                    batch_size INTEGER := 1000;
                    rows_updated INTEGER;
                BEGIN
                    LOOP
                        UPDATE mymodel
                        SET new_field = 'value'
                        WHERE id IN (
                            SELECT id FROM mymodel
                            WHERE new_field IS NULL
                            LIMIT batch_size
                        );
                        GET DIAGNOSTICS rows_updated = ROW_COUNT;
                        EXIT WHEN rows_updated = 0;
                        PERFORM pg_sleep(0.1);  -- Brief pause between batches
                    END LOOP;
                END $$;
            """
        ),
    ]
Important: The DO $$ ... $$ batching runs inside a single transaction. Locks persist through the loop and partial progress cannot be committed. For truly chunked updates with intermediate commits, use Python-level batching (Pattern 2) or background jobs.
Pattern 2: Batched UPDATE in RunPython
from django.db import migrations
import time

def backfill_in_batches(apps, schema_editor):
    MyModel = apps.get_model('myapp', 'MyModel')
    batch_size = 10000
    updated_count = 0
    while True:
        # Get batch of IDs that need updating
        ids = list(
            MyModel.objects.filter(new_field__isnull=True)
            .values_list('id', flat=True)[:batch_size]
        )
        if not ids:
            break
        # Update batch
        MyModel.objects.filter(id__in=ids).update(new_field='default_value')
        updated_count += len(ids)
        print(f"Updated {updated_count} rows...")
        time.sleep(0.1)  # Brief pause between batches

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(backfill_in_batches),
    ]
Pattern 3: Using iterator for memory efficiency
def process_large_dataset(apps, schema_editor):
    MyModel = apps.get_model('myapp', 'MyModel')
    # Use iterator to avoid loading all rows into memory
    # (calculate_value is a placeholder for your own per-row logic)
    for obj in MyModel.objects.all().iterator(chunk_size=1000):
        obj.new_field = calculate_value(obj)
        obj.save(update_fields=['new_field'])

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(process_large_dataset),
    ]
Pattern 4: Bulk update for better performance
def bulk_update_in_batches(apps, schema_editor):
    MyModel = apps.get_model('myapp', 'MyModel')
    batch_size = 1000
    objects_to_update = []
    # calculate_value is a placeholder for your own per-row logic
    for obj in MyModel.objects.filter(needs_update=True).iterator(chunk_size=batch_size):
        obj.new_field = calculate_value(obj)
        objects_to_update.append(obj)
        if len(objects_to_update) >= batch_size:
            MyModel.objects.bulk_update(objects_to_update, ['new_field'])
            objects_to_update = []
            print(f"Updated batch of {batch_size} rows")
    # Update remaining objects
    if objects_to_update:
        MyModel.objects.bulk_update(objects_to_update, ['new_field'])

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(bulk_update_in_batches),
    ]
.iterator(): Avoids loading all rows into memory.
bulk_update(): Much faster than individual saves.
SeparateDatabaseAndState is a powerful Django operation that separates Django's migration state from actual database changes. This is essential for safe multi-phase deployments.
class Migration(migrations.Migration):
    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                migrations.CreateModel(
                    name='ExistingTable',
                    fields=[
                        ('id', models.BigAutoField(primary_key=True)),
                        ('name', models.CharField(max_length=255)),
                    ],
                ),
            ],
            database_operations=[],  # Table already exists, just update Django state
        ),
    ]
If Django's migration state drifts from the actual database schema, makemigrations may generate incorrect migrations trying to sync the state.
Split migrations with multiple risky operations into separate migrations. This makes rollback easier and reduces deployment risk.
# ❌ BAD - Multiple risky operations
class Migration(migrations.Migration):
    operations = [
        migrations.AddIndex(...),         # Risky
        migrations.RunSQL("UPDATE ..."),  # Risky
        migrations.AddField(...),         # Risky
    ]
# ✅ GOOD - Separate migrations
class Migration(migrations.Migration):
    operations = [
        migrations.AddIndex(...),
    ]
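The remaining risky operations then go into their own follow-up migrations, deployed one at a time. For example, a sketch that keeps the placeholders from above:
# ✅ GOOD - Next migration, deployed separately
class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL("UPDATE ..."),
    ]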
PostgreSQL's CONCURRENTLY operations cannot run inside transactions. Use atomic=False only when required.
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations, models

class Migration(migrations.Migration):
    atomic = False  # Required for CONCURRENTLY

    operations = [
        AddIndexConcurrently(
            model_name="mymodel",
            index=models.Index(fields=["field_name"], name="mymodel_field_idx"),
        ),
    ]
When to use atomic=False:
CREATE INDEX CONCURRENTLY (required - AddIndexConcurrently needs this)
DROP INDEX CONCURRENTLY (required - RemoveIndexConcurrently needs this)
REINDEX CONCURRENTLY (required)
When NOT to use atomic=False:
Any migration that doesn't use CONCURRENTLY - keep the default atomic behavior so a failed run rolls back cleanly and retries start from scratch.
Why this matters - the retry problem:
Our deployment uses bin/migrate with retry logic. If a migration fails mid-execution:
atomic=True (default): Nothing committed, retry works cleanly.
atomic=False: Partial changes committed, retry fails with "column already exists" or similar errors.
Example of what goes wrong:
Migration with atomic=False:
Op1: AddField (commits) ✓
Op2: AddField (lock_timeout, fails) ✗
Retry:
Op1: AddField → ERROR: column already exists!
If you need both schema changes AND concurrent index creation:
Split into separate migrations:
Migration 1: Schema changes (atomic=True, the default).
Migration 2: Concurrent index creation (atomic=False).
How atomic=False works:
With atomic=False, each operation in the migration runs in its own transaction and commits individually. This means that if a later operation fails, everything committed before it stays applied - the retry problem described above.
For large data backfills: If you genuinely need atomic=False for long-running operations (not just CONCURRENTLY), ensure idempotency with IF NOT EXISTS, WHERE NOT EXISTS, or consider using async migrations instead.
Make operations safe to retry by using conditional SQL.
# Safe to run multiple times
migrations.RunSQL(
    sql="CREATE INDEX CONCURRENTLY IF NOT EXISTS myindex ON mytable (field)",
)
migrations.RunSQL(
    sql="DROP INDEX CONCURRENTLY IF EXISTS myindex",
)
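The same idea protects retried schema changes in non-atomic migrations. For example, a column addition that tolerates being run twice (a sketch; PostgreSQL supports IF NOT EXISTS here since 9.6, and the table/column names are hypothetical):
migrations.RunSQL(
    sql="ALTER TABLE mymodel ADD COLUMN IF NOT EXISTS new_field varchar(100)",
    reverse_sql="ALTER TABLE mymodel DROP COLUMN IF EXISTS new_field",
)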
Before deploying risky migrations:
Check whether it can be rolled back (DeleteModel and RemoveField are irreversible).
Use SeparateDatabaseAndState for complex multi-phase changes.
With atomic=False, migrations may partially apply changes. If a migration fails, Django won't automatically roll back the changes. Always verify schema consistency after failed runs and be prepared to manually fix partial states.
If you're unsure about a migration, prefer more, smaller phases and ask for a review before deploying.
Remember: It's always safer to split a migration into multiple phases than to try to do everything at once.