Java / DB Migration Tools: How do you create migrations for read replicas or sharded systems?

In read-replica or sharded setups, the “migration problem” isn’t writing SQL — it’s orchestration + safety: where you run it, in what order, how you handle heterogeneity, and how you keep the app compatible during the rollout.

Read replicas

Core rule

Run schema migrations on the primary only. Replicas get DDL via replication (physical or logical, depending on DB). Your job is to ensure the migration is replication-safe and doesn’t break reads while it’s propagating.

Practical strategy

  • Expand → Deploy → Backfill → Switch reads → Contract
    • Expand: add nullable columns, new tables, and additive indexes
    • Deploy: ship code that handles both the old and new schema
    • Backfill: do the heavy writes in small, throttled batches
    • Switch reads: if needed, start reading the new fields once the backfill is complete and replicas have caught up
    • Contract: drop old columns/indexes later
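A minimal sketch of the “Deploy” step — a read path that tolerates both schema versions while the backfill runs. The column names (a hypothetical merge of first_name + last_name into a new nullable full_name) are illustrative only:

```java
import java.util.Map;

// Read path that works against both the old and the new (expanded) schema.
// Hypothetical change: first_name + last_name being merged into full_name.
public class DualSchemaReader {

    /** Prefer the new column; fall back to the old ones for rows not yet backfilled. */
    public static String displayName(Map<String, Object> row) {
        Object fullName = row.get("full_name");   // new (expand-phase) column, nullable
        if (fullName != null) {
            return fullName.toString();
        }
        // Fallback for rows the backfill hasn't reached yet.
        return row.get("first_name") + " " + row.get("last_name");
    }
}
```

Once every row has full_name populated, the fallback branch becomes dead code and the contract phase can remove the old columns.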

Replica-lag-aware rollout

  • Before enabling new read paths:
    • ensure replicas have replayed the migration (monitor replication lag)
  • If your app reads from replicas:
    • guard new reads behind a feature flag until lag is acceptable
    • or “read-your-writes” path: route reads to primary for operations that depend on just-written schema/data
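The routing decision above can be sketched as one guard combining the feature flag with a lag threshold. The parameter names (maxLagSeconds, newReadPathEnabled) are illustrative, not a real API:

```java
// Lag-aware routing: read from a replica only when the feature flag is on
// AND the measured replication lag is under the threshold; otherwise fall
// back to the primary (which also covers the read-your-writes case).
public class ReplicaReadGate {

    public static boolean routeToReplica(double replicaLagSeconds,
                                         double maxLagSeconds,
                                         boolean newReadPathEnabled) {
        if (!newReadPathEnabled) {
            return false;   // flag off: keep the old read path on the primary
        }
        return replicaLagSeconds <= maxLagSeconds;  // stale replica → primary
    }
}
```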

DDL choices that won’t wreck replicas

  • Prefer non-blocking operations (DB-specific):
    • Postgres: CREATE INDEX CONCURRENTLY (it cannot run inside a transaction block); be careful with long-running DDL that holds locks
    • MySQL: online DDL (ALGORITHM=INPLACE, LOCK=NONE) where supported; tools like gh-ost or pt-online-schema-change for full rewrites
  • Avoid huge table rewrites during peak hours; they can create replication delay and cascade into stale reads.

Monitoring checks (must-have)

  • “Migration applied on primary” (schema history table)
  • Replica replay status / lag
  • Errors on replicas (DDL replication conflicts are rare with physical replication, more likely with logical decoding / filtered replication)

Sharded systems

Here the “primary-only” rule becomes primary-per-shard, and orchestration becomes the whole game.

1) Treat each shard as its own database with its own migration state

You need:

  • Per-shard schema history (Flyway/Liquibase tables exist on every shard)
  • A migration orchestrator that can:
    • discover shards
    • apply migrations shard-by-shard
    • report progress and failures
    • retry safely

2) Order of operations (typical)

Canary → small batch → full rollout

  • Run migrations on 1 shard (or a dedicated canary shard)
  • Then N shards at a time (control blast radius)
  • Then the rest
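The canary-then-batches sequencing can be sketched with the per-shard migration step abstracted out — in practice that step would invoke Flyway/Liquibase against the shard's own DataSource. The class and parameter names here are hypothetical:

```java
import java.util.List;
import java.util.function.Consumer;

// Rollout sequencer: 1 canary shard first, then fixed-size batches.
public class ShardRollout {

    public static void run(List<String> shards, int batchSize, Consumer<String> migrateShard) {
        if (shards.isEmpty()) return;
        migrateShard.accept(shards.get(0));                  // canary shard first
        for (int i = 1; i < shards.size(); i += batchSize) { // then N at a time
            int end = Math.min(i + batchSize, shards.size());
            for (String shard : shards.subList(i, end)) {
                migrateShard.accept(shard);   // a real version would parallelize
            }                                 // within a batch...
            // ...and pause here: check replication lag / DB load before continuing
        }
    }
}
```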

Throttle concurrency to protect:

  • DB CPU/IO
  • shared infrastructure (storage, network)
  • downstream systems (replication, CDC, analytics)

3) Handle heterogeneous shard versions

In real life, shards drift. Your application must be compatible with:

  • shard at version V
  • shard at V+1 (during rollout)
  • sometimes V+2 (if rollouts overlap)

So you design migrations and code with backward/forward compatibility:

  • Additive changes first (expand)
  • Reads tolerate missing columns (or use fallback)
  • Writes dual-write if needed
  • Contract only after all shards are confirmed upgraded
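A dual-write under mixed shard versions can be sketched as a version check at write time. The schema change (a hypothetical V+1 adding a currency column next to amount_cents) is invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Version-tolerant write: only populate the new column on shards whose
// schema already has it. Hypothetical change: V stores amount_cents only;
// V+1 adds a currency column.
public class VersionTolerantWriter {

    public static Map<String, Object> buildRow(int shardSchemaVersion,
                                               long amountCents, String currency) {
        Map<String, Object> row = new HashMap<>();
        row.put("amount_cents", amountCents);   // column present in every version
        if (shardSchemaVersion >= 2) {          // write the new column only where it exists
            row.put("currency", currency);
        }
        return row;
    }
}
```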

4) Data migrations in sharded environments

Avoid “update the whole shard in one transaction”.

  • Backfill in small batches with checkpoints:
    • WHERE id > :last_id ORDER BY id LIMIT 10000
  • Make it idempotent (rerunnable)
  • Use rate limiting and time windows
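The checkpointed batch loop can be sketched as follows; a TreeMap stands in for the keyset query (SELECT ... WHERE id > :last_id ORDER BY id LIMIT :n) so the logic is runnable without a database:

```java
import java.util.NavigableMap;
import java.util.function.BiConsumer;

// Checkpointed, idempotent backfill: process rows in id order, one batch at
// a time, resuming from the last checkpointed id after any crash or rerun.
public class CheckpointedBackfill {

    /** Processes one batch; returns the id of the last processed row (the new checkpoint). */
    public static long runOnce(NavigableMap<Long, String> table,
                               long lastId, int batchSize,
                               BiConsumer<Long, String> update) {
        long checkpoint = lastId;
        int processed = 0;
        for (var e : table.tailMap(lastId, false).entrySet()) {
            if (processed++ == batchSize) break;
            update.accept(e.getKey(), e.getValue());  // must be idempotent (rerunnable)
            checkpoint = e.getKey();
        }
        return checkpoint;
    }
}
```

Because the checkpoint is the last processed id (not an offset), rerunning a batch after a failure redoes at most one batch of idempotent updates and never skips rows.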

5) “Global” constraints are not global anymore

Uniqueness, sequences, and FKs across shards are tricky:

  • Global uniqueness typically requires:
    • snowflake/UUID IDs
    • or keyspace allocation per shard
  • Cross-shard FKs: usually avoided; enforced at app level or via async validation
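A snowflake-style ID gives global uniqueness without cross-shard coordination by packing a timestamp, a shard id, and a per-millisecond sequence into one 64-bit long. The bit widths below (41/10/12) are the common convention, not a requirement:

```java
// Snowflake-style 64-bit IDs: [41 bits millis][10 bits shard][12 bits sequence].
// IDs are globally unique (shard id is embedded) and roughly time-ordered.
public class ShardedIdGenerator {

    public static long makeId(long epochMillis, int shardId, int sequence) {
        return (epochMillis << 22)
             | ((long) (shardId & 0x3FF) << 12)   // 10 bits: up to 1024 shards
             | (sequence & 0xFFF);                // 12 bits: 4096 ids per ms per shard
    }

    public static int shardOf(long id)    { return (int) ((id >> 12) & 0x3FF); }
    public static int sequenceOf(long id) { return (int) (id & 0xFFF); }
}
```

Embedding the shard id also makes routing trivial: any service holding an ID can decode which shard owns the row.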

6) Tooling patterns

  • Flyway/Liquibase still work — but you don’t run them once; you run them per shard.
  • Common orchestrators:
    • pipeline step that iterates shards
    • K8s Job per shard (or per shard batch)
    • internal “Schema Service” (more mature orgs)

7) Failure strategy (what interviewers love)

  • If shard 17 fails:
    • stop the rollout
    • keep app compatible with mixed versions
    • fix migration
    • re-run only failed shards (idempotent migrations make this boring)
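That failure handling can be sketched as stop-on-first-failure plus targeted retry: the rollout returns the shards still needing the migration, and a later run takes exactly that list. The class name and shape are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Stop-on-failure rollout with targeted retry. migrate returns true on
// success; because migrations are idempotent, re-running the returned
// list is always safe.
public class FailureAwareRollout {

    /** Applies the migration shard-by-shard; returns shards NOT yet migrated. */
    public static List<String> apply(List<String> shards, Predicate<String> migrate) {
        List<String> remaining = new ArrayList<>();
        boolean halted = false;
        for (String shard : shards) {
            if (!halted && migrate.test(shard)) {
                continue;             // migrated; the schema history table records it
            }
            halted = true;            // first failure: stop the rollout here,
            remaining.add(shard);     // leaving the app on mixed versions
        }
        return remaining;
    }
}
```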

One clean “interview answer” you can say

“For read replicas, migrations run on the primary and replicate; the main concern is making schema changes additive and lag-aware so replica reads don’t break while the change propagates. For sharded systems, each shard has its own migration state, and we run migrations via an orchestrator with canary + batched rollout, keeping the application backward/forward compatible across mixed shard versions using an expand–migrate–contract approach and idempotent backfills.”
