Yes — migration tools can handle data migrations, but here’s the senior-level nuance:
They can, but you must be very careful about what kind of data migrations you put there.
This is a classic interview question about judgment, not tooling.
Short interview answer (what to say)
“Yes, tools like Flyway and Liquibase can run data migrations, but they are best suited for small, deterministic, idempotent data changes. Large, long-running, or business-logic-heavy data migrations should be handled outside schema migrations.”
That answer alone is already a strong signal.
What counts as a “good” data migration ✅
These are safe to put in Flyway / Liquibase:
1. Reference / lookup data
INSERT INTO role (code, name)
VALUES ('ADMIN', 'Administrator')
ON CONFLICT (code) DO NOTHING;
Examples:
- roles
- statuses
- enum-like tables
- config flags
2. Small backfills tied to schema change
UPDATE "user"  -- "user" is a reserved word in Postgres, so it must be quoted
SET status = 'ACTIVE'
WHERE status IS NULL;
Rules:
- deterministic
- fast
- one-time
- tied to a specific schema version
3. Fixing incorrect historical data (surgical)
UPDATE "order"  -- "order" is reserved, so it must be quoted
SET total = subtotal + tax
WHERE total IS NULL;
Only if:
- row count is reasonable
- logic is trivial
- execution time is predictable
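Before a surgical fix like the one above, it helps to size the blast radius and keep an abort path. A minimal sketch (the "order" table is the one from the example, quoted because it is a reserved word):

```sql
-- Size the fix first: how many rows will be touched?
SELECT count(*) FROM "order" WHERE total IS NULL;

-- Run the fix in an explicit transaction so a surprising row count can be aborted.
BEGIN;
UPDATE "order" SET total = subtotal + tax WHERE total IS NULL;
-- Compare the reported row count with the SELECT above; COMMIT if it matches, else ROLLBACK.
COMMIT;
```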
What you should NOT do ❌
1. Large table rewrites
UPDATE events SET payload = transform(payload);
-- millions of rows
Problems:
- long locks
- timeouts
- impossible rollback
- prod incident at 3 AM
2. Business logic migrations
Anything like:
- “recalculate balances”
- “derive state from history”
- “apply pricing rules”
These belong in application code, not migration tools.
3. Non-idempotent data changes
Bad example:
INSERT INTO audit_log (...)
If migration runs twice → duplicated data → corruption.
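If an insert truly must live in a migration, it can usually be made rerun-safe with a guard. A sketch, assuming a hypothetical event_key column acting as a natural key (the column names here are illustrative, not from the original schema):

```sql
-- Rerun-safe variant: insert only if the row is not already present.
-- (audit_log columns are hypothetical; the point is the NOT EXISTS guard.)
INSERT INTO audit_log (event_key, message, created_at)
SELECT 'migration-v42-backfill', 'Backfilled user statuses', now()
WHERE NOT EXISTS (
    SELECT 1 FROM audit_log WHERE event_key = 'migration-v42-backfill'
);
```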
Flyway vs Liquibase for data migrations
Flyway
- SQL-based
- Very explicit
- Encourages forward-only data changes
- Forces you to think about idempotence
👉 Better discipline for data migrations.
Liquibase
- Supports:
- conditional execution
- rollback blocks
- checksums
- Tempting to overuse for complex logic
👉 More power → more chances to shoot yourself in the foot.
Senior rule of thumb (memorize this)
Schema migrations = migration tools
Data migrations = case by case
Business logic migrations = application code
Common production pattern (very senior answer)
Expand → Migrate → Contract
- Expand
- Add new columns / tables
- Migrate
- Background job / batch / script
- Controlled, observable, restartable
- Contract
- Drop old columns
Migration tools are usually involved only in Expand and Contract.
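A sketch of what the tool-owned pieces might look like, as Flyway-style versioned files (table and column names are illustrative):

```sql
-- V10__expand_add_email_normalized.sql  (Expand: additive, fast, safe)
ALTER TABLE customers ADD COLUMN email_normalized text;

-- The Migrate step runs OUTSIDE the migration tool:
-- a batched background job fills email_normalized while the app dual-writes.

-- V11__contract_drop_old_email.sql  (Contract: shipped later, after the backfill is verified)
ALTER TABLE customers DROP COLUMN email_old;
```

Note the Contract migration is a separate, later release — never the same deploy as Expand.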
Typical interviewer follow-up trap
Q: “Why not do all data migrations in Flyway?”
Correct answer:
“Because migrations must be fast, safe, idempotent, and repeatable. Large data migrations violate these guarantees and increase deployment risk.”
How to rephrase cleanly in an interview (polished)
“Yes, migration tools support data migrations, but I limit them to small, deterministic updates. For large or business-critical data changes, I prefer controlled application-level migrations or background jobs to reduce deployment risk.”
That’s senior-level clarity.
What’s the main risk if I update a lot of data in a migration? A broken deployment? And do migrations happen only during deployment?
Main risk isn’t just “broken deployment”. The real risk is: your deployment becomes a database maintenance window — unpredictable, blocking, and hard to roll back.
Main risks of “updating a lot of data” in a migration
1) Deployment can stall or time out
- Most pipelines expect DB migrations to finish in seconds/minutes.
- A huge UPDATE can run for 30–120 minutes.
- Result: pods keep restarting, health checks fail, the rollout never completes.
2) Locks + blocked traffic (the killer)
Big updates often take row locks and can escalate impact:
- long-running transaction holds locks
- concurrent reads/writes queue behind it
- app starts returning 500/timeout
So it’s not only “deployment breaks” — production becomes slow or unavailable.
3) Replication lag / read replicas go stale
On Postgres/MySQL:
- heavy write generates lots of WAL/binlog
- replicas lag minutes/hours
- if you use read replicas, users see inconsistent data or errors.
4) Massive IO / autovacuum / bloat (Postgres)
Updating millions of rows:
- creates dead tuples (bloat)
- triggers vacuum pressure
- disk grows, cache churns
- performance degrades after deployment too.
5) Rollback becomes impossible
Even if you have a “down” script:
- undoing millions of changes is slow and risky
- you usually can’t safely restore previous values unless you stored them
So you’re stuck: forward-fix only.
6) Non-idempotent reruns = data corruption
If the migration re-runs (partial failure, retry, new environment):
- duplicate inserts
- double transformations
- inconsistent state
Does migration happen only during deployment?
Typically: yes — migrations are executed as part of the deployment/startup of the service.
But there are multiple common patterns:
Pattern A — App runs migrations on startup
- Service boots → runs Flyway/Liquibase → then starts serving traffic.
- Risk: if migration is slow, service can’t start.
Pattern B — CI/CD runs migrations as a separate step
- Pipeline runs migration job first, then deploys app.
- Better control, but still: slow migration = slow release.
Pattern C — Dedicated “migration job” (Kubernetes)
- A one-off Job runs migrations (with tight permissions), then app deploys.
- This is often the cleanest operationally.
So: not “only” during deployment, but in most setups it’s still release-coupled (it blocks the release).
The senior rule: keep migrations fast and predictable
What belongs in migration scripts:
- schema changes (DDL)
- small deterministic backfills (minutes max, ideally seconds)
- reference data inserts (idempotent)
What should NOT block deployment:
- migrating millions of rows
- recomputing business logic
- heavy transforms
What to do instead for “a lot of data”
Option 1 — Expand → Backfill (async) → Contract
- Expand: add new column/table (migration)
- Backfill: background job / batch script with small chunks (e.g., 10k rows)
- Dual-write for a while (app writes both old and new)
- Contract: drop old column (later migration)
Option 2 — Online backfill in chunks (safe DB approach)
- UPDATE ... WHERE id > ? LIMIT ... style batching (or using id ranges)
- commit per batch
- monitor progress
- retry safe
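The batching idea above can be sketched directly in Postgres (11+, which allows COMMIT inside an anonymous block); table and column names are illustrative stand-ins for the real transform:

```sql
-- Batched online backfill sketch (Postgres 11+; names are illustrative).
-- Each iteration touches at most 10k rows and commits, so locks stay short
-- and the job is restartable: the WHERE clause skips already-migrated rows.
DO $$
DECLARE
    rows_updated integer;
BEGIN
    LOOP
        UPDATE events
        SET payload_v2 = payload          -- stand-in for the real transformation
        WHERE id IN (
            SELECT id FROM events
            WHERE payload_v2 IS NULL
            ORDER BY id
            LIMIT 10000
        );
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        COMMIT;                           -- release locks; progress survives a crash
    END LOOP;
END $$;
```

In practice the same loop often lives in application code or a script, which also makes throttling and progress metrics easier to add.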
If interviewer asks: “So what’s the main risk?”
Say this:
“The main risk is long-running migrations causing locks and unpredictable deployment time, turning a release into downtime. That’s why we keep migrations small and do big data moves via backfill jobs with batching and monitoring.”