Scenario: adding a status column with default + app starts writing it
Step 1 — Migration V12__add_status.sql
ALTER TABLE orders ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'NEW';
CREATE INDEX idx_orders_status ON orders(status);
Step 2 — App release (same deploy window)
New app version starts doing:
- writes
status='PAID' | 'CANCELLED' - queries:
WHERE status IN ('PAID','NEW') - Kafka consumers / background jobs may update status asynchronously
Step 3 — Something goes wrong
Maybe:
- latency spike due to index build
- bug in app release
- unexpected query plan regression
Team says: “Rollback!”
Flyway undo attempt: U12__drop_status.sql
DROP INDEX idx_orders_status;
ALTER TABLE orders DROP COLUMN status;
What fails (and how)
✅ A) You lose data (the obvious failure)
While the new app was live for 10 minutes, it wrote:
order_id=101 status=PAID
order_id=102 status=CANCELLED
Undo drops the column → those statuses are gone forever.
Now old app version comes back up and assumes:
- “if no status column, we infer state from payments table”
But that inference might not match reality anymore (because the new app already used status as the source of truth).
Result:
- reports wrong
- downstream systems inconsistent
- reconciliation nightmare
This is the most common “undo fails” case: it “succeeds” technically, but business state is corrupted.
❌ B) The old app still breaks (schema mismatch during rollout)
Even if you drop the column successfully, you likely had:
- multiple instances
- rolling deploy
So some pods still run the new code briefly and will crash:
SELECT ... WHERE status = 'NEW';
-- ERROR: column "status" does not exist
You now have:
- failing pods
- retries
- message consumers reprocessing
- possibly duplicate side effects
Undo causes runtime failure under mixed-version traffic.
❌ C) Background workers keep writing after undo (race condition)
Even if you revert the main API pods, a background consumer might still be running new code for a bit.
It tries:
UPDATE orders SET status='PAID' WHERE id=101;
-- ERROR: column "status" does not exist
Now you get:
- stuck Kafka consumer group
- growing lag
- repeated failures
- potentially “poison messages”
Undo didn’t restore a stable system — it created a new incident.
A cleaner “forward rollback” fix (what seniors do)
Instead of undo, you keep the column and make the system compatible:
V13__make_status_nullable_and_stop_using_it.sql
ALTER TABLE orders ALTER COLUMN status DROP NOT NULL;
-- optionally drop index if it caused perf issues
DROP INDEX IF EXISTS idx_orders_status;
Then deploy app hotfix that:
- stops using
status - tolerates it being null / ignored
Later (after things are stable) you can plan a safe cleanup migration.
How to say it in an interview (2–3 sentences)
Undo fails when the new version has already written data in the new schema — dropping a column loses business state and you can’t reconstruct it reliably.
Also in rolling deployments you can’t guarantee all nodes stop using the new schema at the same moment, so undo creates runtime errors and broken consumers.
That’s why we use forward-only migrations and “forward rollbacks” that restore compatibility instead of reverting schema.