Java.DBMigrationTools.How do you deal with legacy databases in a migration tool?

1) First: decide what “legacy” means

Usually one of:

  • DB exists, no migration history table
  • DB exists, some scripts exist but not reliably reproducible
  • Multiple environments drifted (prod ≠ stage ≠ dev)

Your approach depends on which one it is.


2) Establish a trusted starting point (baseline / changelog sync)

Flyway

Use baseline when schema already exists:

  • Create flyway_schema_history
  • Insert a baseline marker (no SQL executed)
  • Start applying migrations after that point

Typical approach:

  • Tag today’s schema as baseline version, e.g. baselineVersion=100
  • New migrations start at V101__...

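Use baselineOnMigrate only if you are sure you want Flyway to baseline automatically whenever it finds a non-empty schema without a history table. Pointed at the wrong database by a config mistake, it will quietly mark that schema as managed, so I treat it as a “careful with prod” setting.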
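To make the baseline rule concrete, here is a minimal sketch in plain Java that models which versioned scripts stay pending after a baseline. It is not Flyway's implementation: the class BaselineSketch, its method names, and the integer-only version parsing are assumptions made for illustration (real Flyway versions can be dotted, e.g. V1.2__...).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of baseline behaviour; not Flyway's actual code.
final class BaselineSketch {
    // Simplified naming convention (assumption): V<integer>__<description>.sql
    private static final Pattern VERSIONED = Pattern.compile("V(\\d+)__.+\\.sql");

    static int versionOf(String fileName) {
        Matcher m = VERSIONED.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a versioned migration: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    // Scripts at or below the baseline count as already applied and are skipped.
    static List<String> pendingAfterBaseline(int baselineVersion, List<String> fileNames) {
        List<String> pending = new ArrayList<>();
        for (String name : fileNames) {
            if (versionOf(name) > baselineVersion) {
                pending.add(name);
            }
        }
        pending.sort(Comparator.comparingInt(BaselineSketch::versionOf));
        return pending;
    }
}
```

With a baseline of 100, anything named V100__... or lower is never executed, and V101__... is the first script that runs.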

Liquibase

Use changelog synchronization (conceptually like baseline):

  • Record changesets as executed without running them
  • Then start from “now”

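(The exact mechanism varies: the changelog-sync command marks every changeset in the changelog as executed, while a precondition with onFail="MARK_RAN" does it per changeset. Many teams add liquibase validate on top to catch changelog errors early.)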
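The difference between "record as executed" and "execute" can be sketched with a small in-memory model. This is not Liquibase code: the class ChangelogSyncSketch and its methods are hypothetical, and changesets are reduced to plain string ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a changelog history table; not Liquibase's code.
final class ChangelogSyncSketch {
    private final Set<String> ranIds = new LinkedHashSet<>();  // changesets recorded as executed
    private final List<String> executed = new ArrayList<>();   // changesets whose SQL really ran

    // "Sync": record every changeset as ran without executing anything.
    void sync(List<String> changeSetIds) {
        ranIds.addAll(changeSetIds);
    }

    // "Update": execute only changesets not yet recorded, then record them.
    void update(List<String> changeSetIds) {
        for (String id : changeSetIds) {
            if (ranIds.add(id)) {
                executed.add(id);
            }
        }
    }

    List<String> executed() {
        return List.copyOf(executed);
    }
}
```

After syncing changesets 001 and 002, a later update over 001..003 executes only 003, which is the "start from now" behaviour described above.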

Key principle: you’re not “migrating the past”, you’re declaring current reality and managing future changes.

3) Capture the current schema in version control (snapshot)

Before you touch anything:

  • Produce a schema snapshot (DDL dump)
  • Store it in the repo as documentation (not necessarily replayed)
  • This is your “golden reference” if something goes sideways

Why it matters:

  • Legacy DBs often contain surprises (manual hotfixes, missing constraints, weird types)
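A full DDL dump normally comes from the database's own tooling (for Postgres, pg_dump --schema-only). Where that is not convenient, a lighter columns-only snapshot can be taken through standard JDBC metadata. The sketch below is that substitute, not a real DDL dump: it ignores indexes, constraints and triggers, and the class SchemaSnapshotSketch and its line format are assumptions.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: columns-only snapshot, one sorted text line per column.
final class SchemaSnapshotSketch {
    // Anything other than "NO" is reported as nullable (JDBC may also return "YES" or "").
    static String describeColumn(String table, String column, String type, String nullable) {
        return table + "." + column + " " + type + ("NO".equals(nullable) ? " NOT NULL" : " NULL");
    }

    // schemaPattern may be null, which means "do not filter by schema".
    static List<String> snapshot(Connection connection, String schemaPattern) throws SQLException {
        List<String> lines = new ArrayList<>();
        DatabaseMetaData meta = connection.getMetaData();
        try (ResultSet columns = meta.getColumns(null, schemaPattern, "%", "%")) {
            while (columns.next()) {
                lines.add(describeColumn(
                        columns.getString("TABLE_NAME"),
                        columns.getString("COLUMN_NAME"),
                        columns.getString("TYPE_NAME"),
                        columns.getString("IS_NULLABLE")));
            }
        }
        Collections.sort(lines);  // stable order so snapshots diff cleanly in the repo
        return lines;
    }
}
```

Sorted, line-per-column output keeps repository diffs small, which makes surprises such as a manually added column easy to spot in review.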

4) Start with “safe” migrations only (Expand phase)

Your first few managed migrations should be low risk:

  • Additive changes only:
    • new table
    • new nullable column
    • new index (online where needed)
  • No renames/drops
  • No big backfills inside the migration tool

You’re proving the pipeline works.
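One way to enforce the "additive only" rule is a small check in CI. The sketch below is a deliberately naive heuristic, not a SQL parser: it splits on semicolons, ignores string literals and comments, and will miss forms such as ALTER TABLE ... DROP without the COLUMN keyword. The class name and the list of forbidden patterns are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical CI check for the first managed migrations; a heuristic only.
final class AdditiveOnlyCheck {
    // Statements this sketch treats as non-additive.
    private static final List<Pattern> FORBIDDEN = List.of(
            Pattern.compile("\\bDROP\\s+(TABLE|COLUMN|INDEX)\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bRENAME\\s+(TO|COLUMN)\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bTRUNCATE\\b", Pattern.CASE_INSENSITIVE));

    // Returns the offending statements; an empty list means the script looks additive.
    static List<String> violations(String sql) {
        List<String> found = new ArrayList<>();
        for (String statement : sql.split(";")) {
            String trimmed = statement.trim();
            for (Pattern pattern : FORBIDDEN) {
                if (pattern.matcher(trimmed).find()) {
                    found.add(trimmed);
                    break;
                }
            }
        }
        return found;
    }
}
```

A failing check is a prompt for a human review, not a verdict; later Contract-phase migrations would need to bypass it on purpose.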

5) Handle drift explicitly (don’t sweep it under the rug)

Detect drift

  • Run validation in every environment:
    • Flyway: validate (and keep validateOnMigrate=true)
    • Liquibase: validate + diff checks if used
  • CI: apply migrations to a clean DB (Testcontainers) + upgrade path tests
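What validation protects against can be sketched as a checksum comparison between what was recorded at apply time and what is in the repo now. This is an illustration only: the class ChecksumValidationSketch is hypothetical, and the real tools compute and store their checksums differently in detail.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical model of checksum validation; not the tools' real algorithm.
final class ChecksumValidationSketch {
    static long checksum(String script) {
        CRC32 crc = new CRC32();
        crc.update(script.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Compares checksums recorded at apply time with the scripts currently in the repo.
    static List<String> validate(Map<String, Long> recorded, Map<String, String> scriptsInRepo) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a deterministic, version-sorted report.
        for (Map.Entry<String, Long> entry : new TreeMap<>(recorded).entrySet()) {
            String script = scriptsInRepo.get(entry.getKey());
            if (script == null) {
                problems.add(entry.getKey() + ": applied but missing from repo");
            } else if (checksum(script) != entry.getValue()) {
                problems.add(entry.getKey() + ": edited after being applied");
            }
        }
        return problems;
    }
}
```

Any non-empty result should fail the build or block the deploy, because it means the repo no longer describes what actually ran.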

Fix drift the right way

  • If schema differs: create new migrations to converge
  • Don’t edit old applied migrations
  • Don’t “repair” your way out unless it’s purely formatting/checksum noise
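Converging starts with an explicit list of differences between environments. A minimal sketch, assuming each environment's schema has already been reduced to a set of text lines (one per column or object); the class DriftDiffSketch and that line format are assumptions, and dedicated diff tooling does far more.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: set difference over schema description lines.
final class DriftDiffSketch {
    // Lines present in the reference environment but absent from the target.
    static Set<String> missingIn(Set<String> target, Set<String> reference) {
        Set<String> missing = new TreeSet<>(reference);
        missing.removeAll(target);
        return missing;
    }
}
```

Running it in both directions (stage against prod, prod against stage) gives the to-do list; each entry becomes a new forward migration rather than an edit to an old one.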

6) Standardize the migration runtime (single runner)

Legacy setups often fail because several app instances try to manage the schema at the same time.

Best practice:

  • Migrations run in one place:
    • CI/CD step, or
    • a dedicated “migration job/pod”, or
    • one leader-elected instance
  • App instances start only after DB is at the required version
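The "start only after the DB is at the required version" rule can be sketched as a startup gate that polls the schema version. The class SchemaVersionGate is hypothetical, and the currentVersion supplier stands in for whatever query reads the version from the history table in a real setup.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical startup gate; the app does not run migrations itself, it only waits.
final class SchemaVersionGate {
    // Polls until the reported version reaches `required`; returns false on timeout or interrupt.
    static boolean awaitVersion(IntSupplier currentVersion, int required,
                                Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (currentVersion.getAsInt() >= required) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

An instance that gets false back should fail fast and let the orchestrator restart it, rather than serve traffic against an older schema.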

7) Make it safe for production operations

Legacy DBs usually have big tables and unknown load.

So you add rules:

  • Postgres: no CREATE INDEX on large tables without CONCURRENTLY
  • Split non-transactional DDL into separate migrations (e.g. CREATE INDEX CONCURRENTLY cannot run inside a transaction block)
  • Avoid long transactions / table rewrites
  • Schedule risky operations off-peak
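Rules like the Postgres index one are easier to keep when a check enforces them. A minimal sketch follows; the class IndexRuleCheck is hypothetical, and since a regex cannot know table sizes it simply flags every CREATE INDEX that lacks CONCURRENTLY.

```java
import java.util.regex.Pattern;

// Hypothetical lint rule for Postgres migration scripts.
final class IndexRuleCheck {
    // CREATE [UNIQUE] INDEX not directly followed by CONCURRENTLY.
    // The possessive \s++ stops the regex from backtracking into the whitespace.
    private static final Pattern BLOCKING_INDEX = Pattern.compile(
            "\\bCREATE\\s+(UNIQUE\\s+)?INDEX\\s++(?!CONCURRENTLY\\b)",
            Pattern.CASE_INSENSITIVE);

    static boolean hasBlockingIndexBuild(String sql) {
        return BLOCKING_INDEX.matcher(sql).find();
    }
}
```

Small tables can be allow-listed by hand; the point is that a blocking index build on a big table has to be a conscious decision.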

8) Data migrations: do them outside (most of the time)

Legacy DBs often need cleanup/backfill.

Do:

  • schema change in migrations
  • backfill via an application job (idempotent, chunked, resumable)
  • only then enforce constraints / drop old columns (Contract)
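The three properties asked of the backfill job (idempotent, chunked, resumable) can be shown with an in-memory stand-in for a table. Everything here is an assumption for illustration: the class BackfillSketch, the legacyEmail and emailNormalized columns, and the checkpoint held in a field rather than persisted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical backfill job over an in-memory "table" keyed by row id.
final class BackfillSketch {
    final NavigableMap<Long, String> legacyEmail = new TreeMap<>();  // old column
    final Map<Long, String> emailNormalized = new HashMap<>();       // new nullable column
    private long lastProcessedId = 0;  // checkpoint; a real job would persist it to resume

    // Chunked: visits at most chunkSize rows after the checkpoint, in id order.
    int runChunk(int chunkSize) {
        int visited = 0;
        for (Map.Entry<Long, String> row : legacyEmail.tailMap(lastProcessedId, false).entrySet()) {
            if (visited == chunkSize) {
                break;
            }
            // Idempotent: only fills rows that are still empty, so reruns change nothing.
            emailNormalized.putIfAbsent(row.getKey(), row.getValue().trim().toLowerCase(Locale.ROOT));
            lastProcessedId = row.getKey();
            visited++;
        }
        return visited;
    }

    // Resumable: continues from the checkpoint until no rows are left.
    int runToCompletion(int chunkSize) {
        int total = 0;
        int visited;
        while ((visited = runChunk(chunkSize)) > 0) {
            total += visited;
        }
        return total;
    }
}
```

Against a real database each chunk would be its own short transaction, selected by id greater than the checkpoint, so locks stay brief and a crash costs at most one chunk.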

9) Team discipline and governance

Legacy migration success is mostly process:

  • PR review checklist for migrations
  • naming/versioning rules
  • forbid manual DB changes (or require a “hotfix -> backport migration” policy)
  • restricted DB permissions:
    • “migration user” vs “app user”

Interview-ready answer (30 seconds)

“For a legacy database, I first baseline/sync the current schema into the migration tool so it can manage changes going forward without replaying history. I snapshot the existing schema, then start with small additive migrations to prove the pipeline. I enforce validation in CI and all environments to detect drift, run migrations via a single controlled runner, and handle big data fixes via chunked jobs rather than long-running migration scripts. Over time we converge drift and adopt Expand–Migrate–Contract for safe evolution.”
