Java / DB Migration Tools: How do you perform a zero-downtime deployment using migration tools?

For zero-downtime deployments with Flyway or Liquibase, the tool itself is not the magic: the migration strategy is. The standard interview answer is Expand → Migrate → Contract (also known as the parallel change pattern), plus operational guardrails.


Core pattern: Expand → Migrate → Contract

1) Expand (backward compatible DB change)

Goal: deploy DB changes that both the old and the new app version can live with.

Typical actions:

  • Add new nullable columns / tables
  • Add new indexes (concurrently where supported)
  • Add triggers / views to keep old/new in sync (optional)
  • Add new constraints as NOT VALID / disabled first (Postgres) and validate later

Rules:

  • Never drop/rename stuff the old version still uses
  • Avoid locking operations during peak (e.g., big table ALTERs)

Example (rename column safely):

  • Add new_col
  • Keep old_col for now
  • (Optional) trigger to mirror writes
  • Release app that writes both
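As a minimal Postgres sketch of the Expand step (table and column names `users`, `old_col`, `new_col` are placeholders), the additive migration plus the optional mirror trigger might look like:

```sql
-- Expand: additive only; the old app keeps using old_col untouched.
ALTER TABLE users ADD COLUMN new_col text;  -- nullable, so old-version writes still succeed

-- Optional: mirror writes into new_col until the app itself dual-writes.
CREATE OR REPLACE FUNCTION mirror_old_to_new() RETURNS trigger AS $$
BEGIN
  NEW.new_col := NEW.old_col;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_mirror_col
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION mirror_old_to_new();  -- EXECUTE PROCEDURE on Postgres < 11
```

Nothing here drops or renames anything, so the old app version keeps working unchanged.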

2) Migrate (data backfill + dual-write / dual-read)

Goal: move data gradually while both versions run.

Approaches:

  • Backfill in chunks (job/batch), not in a single huge migration
  • Application does:
    • dual-write (write to old + new)
    • read-new-fallback-old (or the opposite), depending on risk
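The read-new-fallback-old strategy can be sketched at the SQL level (column names are placeholders; the same fallback can equally live in application code behind a feature flag):

```sql
-- While the backfill is incomplete, prefer the new column but
-- fall back to the old one for rows not yet migrated.
SELECT id, COALESCE(new_col, old_col) AS col
FROM users
WHERE id = $1;
```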

Operational tips:

  • Make backfill idempotent
  • Track progress (marker table / cursor / timestamps)
  • Throttle to protect DB (sleep, limited batch size)
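The tips above can be combined in one batch statement, run repeatedly from a job; this is a Postgres sketch with placeholder names:

```sql
-- One batch of an idempotent, chunked backfill.
-- The NULL check means re-running is a no-op for migrated rows;
-- LIMIT throttles load; SKIP LOCKED avoids fighting live writes.
WITH batch AS (
  SELECT id FROM users
  WHERE new_col IS NULL AND old_col IS NOT NULL
  ORDER BY id
  LIMIT 1000
  FOR UPDATE SKIP LOCKED
)
UPDATE users u
SET new_col = u.old_col
FROM batch b
WHERE u.id = b.id;
-- Repeat until 0 rows are updated; alternatively record the last
-- processed id in a marker table for cursor-based progress tracking.
```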

3) Contract (remove old stuff)

Only when:

  • All nodes are on the new version
  • Backfill is complete
  • Monitoring confirms no reads/writes to old paths

Actions:

  • Drop old columns/tables
  • Remove triggers
  • Enforce constraints fully (NOT NULL, FK validation)
  • Cleanup indexes

This step is often a separate release days later.
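A Contract migration might then look like this (assuming the Expand phase added a mirror trigger and a `NOT VALID` constraint; all names here are placeholders):

```sql
-- Contract: run only after all nodes read/write the new column exclusively.
DROP TRIGGER IF EXISTS users_mirror_col ON users;
ALTER TABLE users DROP COLUMN old_col;

-- Enforce what was deferred during Expand, now that the backfill is done.
ALTER TABLE users ALTER COLUMN new_col SET NOT NULL;
ALTER TABLE users VALIDATE CONSTRAINT users_new_col_fk;
```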


Tooling: how Flyway/Liquibase fits

Use migrations for schema, not heavy backfills

Best practice:

  • Schema changes in Flyway/Liquibase migrations
  • Large data backfills in a controlled job (app job, one-off worker, or admin service)

Why:

  • Deploy pipelines time out
  • DB locks / long transactions
  • Harder to retry safely
  • Rollback is messy

Small data fixes are OK in migrations if fast + idempotent.
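For example, a data fix like this is safe inside a migration because the WHERE clause makes re-running it a no-op (table and column names are illustrative):

```sql
-- Fast + idempotent: touches only the rows that still need fixing.
UPDATE app_settings
SET timezone = 'UTC'
WHERE timezone IS NULL;
```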


Concurrency-safe DDL practices (especially Postgres)

  • Create indexes with minimal locking:
    • Postgres: CREATE INDEX CONCURRENTLY ... (note: cannot run inside a transaction, so mark the migration non-transactional in your tool)
  • Add constraints in phases:
    • ADD CONSTRAINT ... NOT VALID then VALIDATE CONSTRAINT
  • Prefer additive changes:
    • add column/table first, enforce later

If your database treats many ALTERs as full table rebuilds (as some MySQL operations do), plan around that with online DDL, pt-online-schema-change, gh-ost, or similar tools.


Deployment sequencing (what actually happens)

A safe rollout often looks like:

  1. Deploy DB Expand migration (Flyway/Liquibase)
  2. Deploy app vNext (compatible with old schema)
  3. Run backfill job (can be continuous)
  4. Flip feature flag to read new path
  5. Observe
  6. Contract migration later (drop old)
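Mapped onto versioned Flyway scripts (file names and versions here are purely illustrative), the sequencing might look like:

```sql
-- V7__expand_add_new_col.sql   (ships with release N, step 1)
ALTER TABLE users ADD COLUMN new_col text;

-- Steps 2-5 are app releases, the backfill job, and the flag flip:
-- no migrations are involved.

-- V8__contract_drop_old_col.sql  (ships with release N+1, step 6, days later)
ALTER TABLE users DROP COLUMN old_col;
```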

With Kubernetes:

  • ensure old + new pods overlap (rolling update)
  • readiness probes should fail fast on an incompatible schema (with Expand applied first, the schema stays compatible, so this is fine)
  • avoid “migration at startup on every pod” unless you’ve designed for single-run (leader lock)

Guardrails you should mention in interviews

1) One writer for migrations

Avoid multiple instances applying migrations simultaneously:

  • Flyway takes a lock via its schema history table
  • Liquibase uses DATABASECHANGELOGLOCK

Still: decide where migrations run (a CI/CD step, an init container, or a single pod).

2) Idempotency and retries

  • Make backfill resumable
  • Design migrations to be restartable where possible

3) Observability

Track:

  • backfill progress
  • query error rates (old/new)
  • DB locks & slow queries
  • replication lag (if any)

4) Feature flags

Use flags for:

  • dual-write enablement
  • read switch
  • contract cleanup timing

Interview-ready 30-second answer

“For zero-downtime, I use Expand–Migrate–Contract. First I apply backward-compatible schema changes (additive, no drops/renames), then deploy code that can work with both schemas and do dual-write / safe read strategies. Large backfills run as resumable jobs, not inside the migration tool. Once all instances are on the new version and data is migrated, I do a final contract migration to remove old columns and enforce constraints.”
