Java.DBMigrationTools.How do you debug a failed migration in a pipeline?

2) Check migration metadata tables (source of truth)

This tells you whether it ran, failed, or got stuck.

Flyway

Query flyway_schema_history:

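  • success = false rows → failed migration (recorded only on databases without transactional DDL, e.g. MySQL; on PostgreSQL a failed migration is rolled back and leaves no row)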
  • installed_rank / version to see order
  • checksum and execution_time for clues
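
The Flyway check above can be scripted instead of typed by hand. A minimal JDBC sketch, assuming the default table name flyway_schema_history and an already opened Connection; the class and method names here are illustrative and not part of Flyway:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class FlywayHistoryCheck {

    // The table name is a parameter because Flyway allows renaming it.
    // It is concatenated into the SQL, so pass only a trusted value.
    static String recentHistorySql(String table) {
        return "SELECT installed_rank, version, script, execution_time, success"
                + " FROM " + table
                + " ORDER BY installed_rank DESC";
    }

    // Renders one history row; version is null for repeatable migrations.
    static String formatRow(int rank, String version, String script, boolean success, int execMs) {
        String v = (version == null) ? "repeatable" : "v" + version;
        return "#" + rank + " " + v + " " + script + " "
                + (success ? "OK" : "FAILED") + " (" + execMs + " ms)";
    }

    // Returns the newest rows first, at most maxRows of them.
    static List<String> recentHistory(Connection conn, String table, int maxRows) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(recentHistorySql(table))) {
            ps.setMaxRows(maxRows);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lines.add(formatRow(
                            rs.getInt("installed_rank"),
                            rs.getString("version"),
                            rs.getString("script"),
                            rs.getBoolean("success"),
                            rs.getInt("execution_time")));
                }
            }
        }
        return lines;
    }
}
```

Printing the result of recentHistory(conn, "flyway_schema_history", 10) into the pipeline log gives the order and the failed row (if the database records one) without opening a SQL console. Add checksum to the select list if a checksum mismatch is suspected.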

Liquibase

Query:

  • DATABASECHANGELOG → last executed changeset (ORDEREXECUTED, DATEEXECUTED) and its EXECTYPE
  • DATABASECHANGELOGLOCK → is it locked (LOCKED), since when (LOCKGRANTED), and by whom (LOCKEDBY)?

Common pipeline failure: Liquibase lock left behind by a killed job → next run blocks.
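
To tell a stale lock from a migration that is still running, it helps to look at how long the lock has been held. A rough JDBC sketch, assuming the standard DATABASECHANGELOGLOCK columns; the class name and the age threshold are hypothetical, and the timestamp comparison assumes the database and the JVM agree on the time zone:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

final class LiquibaseLockCheck {

    static final String LOCK_SQL =
            "SELECT ID, LOCKED, LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK";

    // A lock "looks stale" when it is held and was granted longer ago than maxAge.
    // This is a heuristic: a long-running migration can hold the lock legitimately.
    static boolean looksStale(boolean locked, Instant granted, Instant now, Duration maxAge) {
        if (!locked || granted == null) {
            return false;
        }
        return Duration.between(granted, now).compareTo(maxAge) > 0;
    }

    // Reads the lock row and summarizes it for the pipeline log.
    static String describe(Connection conn, Duration maxAge) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(LOCK_SQL)) {
            if (!rs.next()) {
                return "no lock row found";
            }
            boolean locked = rs.getBoolean("LOCKED");
            Timestamp granted = rs.getTimestamp("LOCKGRANTED");
            String by = rs.getString("LOCKEDBY");
            Instant grantedAt = (granted == null) ? null : granted.toInstant();
            boolean stale = looksStale(locked, grantedAt, Instant.now(), maxAge);
            return "locked=" + locked + " by=" + by + " since=" + grantedAt
                    + (stale ? " (possibly stale)" : "");
        }
    }
}
```

The sketch only reports; it deliberately does not clear the lock. LOCKEDBY usually names the host that took the lock, which is the quickest way to confirm that the owning job is really gone.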

3) Reproduce in the same environment the pipeline used

Pipeline failures often depend on:

  • DB version/params
  • privileges
  • data volume
  • statement timeouts
  • transaction settings

Repro checklist

  • Same Docker image / migration tool version (Flyway/Liquibase)
  • Same JDBC URL params
  • Same migration user/role
  • Same schema/search_path
  • Same baseline/placeholder values

If the pipeline spins up an ephemeral DB: run the same compose/Testcontainers config locally.
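
One way to make the repro checklist mechanical is to dump the relevant settings from both environments and diff them. A small sketch in plain Java; the setting keys (toolVersion, dbUser and so on) are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class EnvDiff {

    // Compares two flat setting maps and lists every key whose value differs,
    // including keys present on only one side (reported as null on the other).
    static List<String> diff(Map<String, String> pipeline, Map<String, String> local) {
        Set<String> keys = new TreeSet<>(pipeline.keySet());
        keys.addAll(local.keySet());
        List<String> differences = new ArrayList<>();
        for (String key : keys) {
            String p = pipeline.get(key);
            String l = local.get(key);
            if (!Objects.equals(p, l)) {
                differences.add(key + ": pipeline=" + p + " local=" + l);
            }
        }
        return differences;
    }
}
```

An empty result does not prove the environments are identical, only that the settings you chose to record match. Data volume and server-side parameters still need a separate look.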

4) Classify the failure (most common buckets)

A) Syntax / compatibility

  • Wrong SQL dialect for the DB
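  • Using non-transactional DDL inside a transaction (e.g. Postgres CREATE INDEX CONCURRENTLY, which cannot run inside a transaction block)
    Fix: adjust SQL, split into separate migrations, or mark the migration as non-transactional (Liquibase: runInTransaction="false" on the changeset; Flyway: executeInTransaction=false in the script config file).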
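
The non-transactional case can be caught before the pipeline runs with a simple lint over the migration scripts. A naive sketch, assuming PostgreSQL and covering only a few statement types; the statement splitting on ';' does not handle semicolons inside string literals, dollar-quoted bodies, or leading comments:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class NonTransactionalLint {

    // A few PostgreSQL statements that cannot run inside a transaction block.
    // The list is intentionally incomplete.
    private static final Pattern OFFENDER = Pattern.compile(
            "(CREATE\\s+(UNIQUE\\s+)?INDEX\\s+CONCURRENTLY"
                    + "|DROP\\s+INDEX\\s+CONCURRENTLY"
                    + "|VACUUM)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the statements in the script that start with one of the patterns above.
    static List<String> findOffenders(String script) {
        List<String> hits = new ArrayList<>();
        for (String raw : script.split(";")) {
            String statement = raw.trim();
            if (!statement.isEmpty() && OFFENDER.matcher(statement).lookingAt()) {
                hits.add(statement);
            }
        }
        return hits;
    }
}
```

A script that mixes an offender with ordinary DDL is a candidate for splitting into two migrations, with only the offending one marked non-transactional.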

B) Permission / ownership

  • “must be owner of relation”, “permission denied”
    Fix: run under proper migration role; ensure role owns objects or has required grants.

C) Locking / timeouts

  • “could not obtain lock”, “lock wait timeout”, deadlock
    Fix:
      • Make DDL less blocking (concurrent/online)
      • Increase lock_timeout / statement_timeout carefully
      • Run off-peak or use expand–migrate–contract
      • For Liquibase: clear stale lock (with tooling)
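
Making the lock timeout explicit means a blocked DDL fails fast instead of hanging the pipeline. A minimal JDBC sketch, assuming PostgreSQL (where lock_timeout is a session setting and SET LOCAL scopes it to the current transaction); the class and method names are illustrative:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.Duration;

final class LockTimeoutDdl {

    // Builds the PostgreSQL statement that limits lock waits for the current transaction.
    static String setLocalLockTimeout(Duration timeout) {
        return "SET LOCAL lock_timeout = '" + timeout.toMillis() + "ms'";
    }

    // Runs one DDL statement in its own transaction with a bounded lock wait.
    // On failure the transaction is rolled back and the exception is rethrown;
    // PostgreSQL reports a lock timeout with SQLState 55P03.
    static void run(Connection conn, String ddl, Duration lockTimeout) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            st.execute(setLocalLockTimeout(lockTimeout));
            st.execute(ddl);
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A short timeout plus a retry at the pipeline level is usually gentler on a busy database than one long wait, because a DDL statement queued for a lock also blocks the queries that arrive behind it.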

D) Data issues (DML fails)

  • constraint violations, nulls, duplicate keys
    Fix:
      • precondition checks (Liquibase preconditions)
      • backfill in batches
      • make migration idempotent / safe for rerun
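
Backfilling in batches keeps each transaction short and makes the migration safe to rerun. A sketch under stated assumptions: the orders table, its status column, and the numeric id key are hypothetical, and the connection is in auto-commit mode so each batch commits on its own:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class BatchedBackfill {

    // Splits [minId, maxId] into inclusive {from, to} ranges of at most batchSize ids.
    static List<long[]> ranges(long minId, long maxId, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long from = minId; from <= maxId; from += batchSize) {
            long to = Math.min(from + batchSize - 1, maxId);
            result.add(new long[] {from, to});
        }
        return result;
    }

    // Fills NULL statuses range by range and returns the number of updated rows.
    // The "status IS NULL" filter makes a rerun skip rows that are already done.
    static int backfill(Connection conn, long minId, long maxId, long batchSize) throws SQLException {
        String sql = "UPDATE orders SET status = 'UNKNOWN'"
                + " WHERE status IS NULL AND id BETWEEN ? AND ?";
        int updated = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (long[] range : ranges(minId, maxId, batchSize)) {
                ps.setLong(1, range[0]);
                ps.setLong(2, range[1]);
                updated += ps.executeUpdate();
            }
        }
        return updated;
    }
}
```

Only after the backfill reports no remaining NULLs should a later migration add the NOT NULL constraint. That ordering is what prevents the constraint violation in the first place.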

E) “Half-applied” state

  • Some statements executed, then failure
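    Fix approach depends on DB + tool:
      • If migration is transactional and failed → usually rolled back (on databases with transactional DDL such as PostgreSQL; MySQL commits DDL implicitly, so earlier statements stay applied)
      • If non-transactional statements were used → you may need fix-forward scripts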
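
The buckets in this section can be turned into a rough first-pass triage of the error text from the pipeline log. A heuristic sketch only: the substrings are taken from typical PostgreSQL and MySQL messages, the bucket names are made up for illustration, and anything unmatched still needs a human to read the log:

```java
import java.util.Locale;

final class FailureTriage {

    enum Bucket { SYNTAX, PERMISSION, LOCKING, DATA, UNKNOWN }

    // Maps a database error message to one of the failure buckets by substring match.
    static Bucket classify(String message) {
        String m = (message == null) ? "" : message.toLowerCase(Locale.ROOT);
        if (m.contains("permission denied") || m.contains("must be owner")) {
            return Bucket.PERMISSION;
        }
        if (m.contains("could not obtain lock")
                || m.contains("lock wait timeout")
                || m.contains("deadlock")) {
            return Bucket.LOCKING;
        }
        if (m.contains("duplicate key")
                || m.contains("duplicate entry")
                || m.contains("violates")
                || m.contains("null value")) {
            return Bucket.DATA;
        }
        if (m.contains("syntax error")
                || m.contains("cannot run inside a transaction block")) {
            return Bucket.SYNTAX;
        }
        return Bucket.UNKNOWN;
    }
}
```

The half-applied case has no message of its own. It is a state you confirm afterwards by comparing the metadata tables with the actual schema.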

5) Safe recovery actions (what to do next)

Flyway

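  • If a migration failed on a database without transactional DDL (e.g. MySQL), Flyway records it as a failed row; with transactional DDL (e.g. PostgreSQL) the migration is rolled back and no row is left behind.
  • Typical recovery:
    1. Fix the migration or add a new fix migration (preferred if already applied elsewhere)
    2. If a failed row remains: clean up any partial changes, then run flyway repair (after you’re sure about the state)
    3. Re-run migrate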

Liquibase

  • If lock is stuck: use Liquibase commands (preferred) rather than manual DB edits:
    • liquibase release-locks (releaseLocks in older versions), after confirming no migration is still running
    • or as a last resort: reset the LOCKED flag in DATABASECHANGELOGLOCK by hand, carefully
  • If a changeset partially applied: usually fix-forward with a new changeset.

Rule: on shared envs, avoid “editing history”; prefer a new migration.

6) Pipeline hardening (so debugging is rare)

Add a “preflight” stage:

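  • validate (Flyway: checksums of applied migrations; Liquibase: changelog correctness)
  • generated SQL saved as an artifact (Liquibase update-sql / updateSQL; Flyway dryRunOutput, which is not available in every Flyway edition)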
  • run migrations on an ephemeral DB from scratch
  • optionally run against a restored snapshot nightly (big-data/perf catch)

Also ensure:

  • one migration runner (job/step) to avoid concurrency
  • DB connection + lock timeouts are explicit and logged
  • artifacts include: generated SQL, tool version, DB version

7) Interview-ready answer (tight)

“First I identify the exact failing migration/version from pipeline logs and check the schema history tables (flyway_schema_history or DATABASECHANGELOG/LOCK) to see whether it failed, partially applied, or left a lock. Then I reproduce using the same tool version, config, and DB role. Most issues fall into syntax/compatibility, permissions, or locking/timeouts; I fix-forward with a new migration when history is shared, and use Flyway repair or Liquibase lock release only when I’m sure it’s a metadata/lock problem. Finally I harden the pipeline with validate + dry-run SQL artifacts and an ephemeral DB migration test stage.”
