Java.DBMigrationTools.Give me an example where undo fails when data changed

Scenario: adding a status column with default + app starts writing it

Step 1 — Migration V12__add_status.sql

ALTER TABLE orders ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'NEW';
CREATE INDEX idx_orders_status ON orders(status);

Step 2 — App release (same deploy window)

New app version starts doing:

  • writes status='PAID' | 'CANCELLED'
  • queries: WHERE status IN ('PAID','NEW')
  • Kafka consumers / background jobs may update status asynchronously

Step 3 — Something goes wrong

Maybe:

  • latency spike due to index build
  • bug in app release
  • unexpected query plan regression

Team says: “Rollback!”


Flyway undo attempt: U12__drop_status.sql

DROP INDEX idx_orders_status;
ALTER TABLE orders DROP COLUMN status;

What fails (and how)

✅ A) You lose data (the obvious failure)

While the new app was live for 10 minutes, it wrote:

order_id=101  status=PAID
order_id=102  status=CANCELLED

Undo drops the column → those statuses are gone forever.

Now old app version comes back up and assumes:

  • “if no status column, we infer state from payments table”
    But that inference might not match reality anymore (because the new app already used status as the source of truth).

Result:

  • reports wrong
  • downstream systems inconsistent
  • reconciliation nightmare

This is the most common “undo fails” case: it “succeeds” technically, but business state is corrupted.

❌ B) The old app still breaks (schema mismatch during rollout)

Even if you drop the column successfully, you likely had:

  • multiple instances
  • rolling deploy

So some pods still run the new code briefly and will crash:

SELECT ... WHERE status = 'NEW';
-- ERROR: column "status" does not exist

You now have:

  • failing pods
  • retries
  • message consumers reprocessing
  • possibly duplicate side effects

Undo causes runtime failure under mixed-version traffic.


❌ C) Background workers keep writing after undo (race condition)

Even if you revert the main API pods, a background consumer might still be running new code for a bit.

It tries:

UPDATE orders SET status='PAID' WHERE id=101;
-- ERROR: column "status" does not exist

Now you get:

  • stuck Kafka consumer group
  • growing lag
  • repeated failures
  • potentially “poison messages”

Undo didn’t restore a stable system — it created a new incident.

A cleaner “forward rollback” fix (what seniors do)

Instead of undo, you keep the column and make the system compatible:

V13__make_status_nullable_and_stop_using_it.sql

ALTER TABLE orders ALTER COLUMN status DROP NOT NULL;
-- optionally drop index if it caused perf issues
DROP INDEX IF EXISTS idx_orders_status;

Then deploy app hotfix that:

  • stops using status
  • tolerates it being null / ignored

Later (after things are stable) you can plan a safe cleanup migration.

How to say it in an interview (2–3 sentences)

Undo fails when the new version has already written data in the new schema — dropping a column loses business state and you can’t reconstruct it reliably.
Also in rolling deployments you can’t guarantee all nodes stop using the new schema at the same moment, so undo creates runtime errors and broken consumers.
That’s why we use forward-only migrations and “forward rollbacks” that restore compatibility instead of reverting schema.

This entry was posted in Без рубрики. Bookmark the permalink.