Replication lag is the delay between when a change occurs on the primary database and when that same change is visible on a replica.
🔍 What Does That Mean?
Imagine this sequence:
- A user updates their profile in the primary database.
- The update is logged (e.g., in WAL or binary logs).
- The read replica eventually receives and applies the change.
If the replica takes, say, 5 seconds to apply the update, then:
- For those 5 seconds, queries on the replica show stale data.
- This delay is the replication lag.
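The sequence above can be sketched as a toy simulation. It is only an illustration, not how any real database replicates: the `Primary` and `Replica` classes, the `apply_delay` parameter, and the explicit `now` timestamps are all hypothetical, and the fixed 5-second delay stands in for whatever time the replica actually needs to receive and apply the log.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries: (commit_time, key, value)

    def write(self, now: float, key: str, value: str) -> None:
        # Apply the change locally and record it in the replication log.
        self.data[key] = value
        self.log.append((now, key, value))


@dataclass
class Replica:
    apply_delay: float  # assumed fixed lag, in seconds
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, primary: Primary, now: float) -> None:
        # Apply, in order, every log entry that is at least apply_delay old.
        while self.applied < len(primary.log):
            commit_time, key, value = primary.log[self.applied]
            if now - commit_time < self.apply_delay:
                break
            self.data[key] = value
            self.applied += 1


primary = Primary()
replica = Replica(apply_delay=5.0)

primary.write(0.0, "name", "Alice")
replica.catch_up(primary, now=10.0)    # the old write replicated long ago

primary.write(10.0, "name", "Alicia")  # user updates their profile
replica.catch_up(primary, now=12.0)
stale_read = replica.data["name"]      # still the old value 2 s after the update

replica.catch_up(primary, now=15.0)
fresh_read = replica.data["name"]      # update is visible once 5 s have passed

print(stale_read, fresh_read)
```

Between `now=10.0` and `now=15.0` the primary and the replica disagree about the profile name. That window is the replication lag.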
🧠 Why Does Replication Lag Happen?
Cause | Description |
---|---|
Network latency | Delay in transmitting logs from primary to replica |
IO bottlenecks | Replica is slow to write/apply changes |
Large transactions | Big or frequent writes take time to replicate |
Resource contention | CPU, disk, or memory pressure on the replica |
Asynchronous replication | By design, the primary commits without waiting for the replica, so the replica may fall behind |
📏 How to Measure It
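- PostgreSQL: Compare `pg_current_wal_lsn()` on the primary with `pg_last_wal_replay_lsn()` on the replica.
- MySQL: Check `Seconds_Behind_Master` in `SHOW SLAVE STATUS` (renamed `Seconds_Behind_Source` in `SHOW REPLICA STATUS` as of MySQL 8.0.22).
- MongoDB: Use `rs.printSlaveReplicationInfo()` (renamed `rs.printSecondaryReplicationInfo()` in newer releases).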
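For PostgreSQL, the two LSN values are strings of the form `high/low` in hexadecimal, and the gap between them is the lag in bytes of WAL. The sketch below shows that arithmetic on the client side. The helper names `lsn_to_int` and `lag_bytes` are made up for this example, and it assumes you have already fetched both values. On the server, `pg_wal_lsn_diff()` performs the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))


print(lag_bytes("0/3000060", "0/3000000"))  # 96
print(lag_bytes("1/0", "0/FFFFFF00"))       # 256
```

This yields lag in bytes, not seconds. A byte gap of zero means the replica has replayed everything the primary had written at the moment of sampling.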
⚠️ Why It Matters
Risk | Explanation |
---|---|
Stale reads | Replica returns outdated data, which confuses users and applications |
Failover issues | If you promote a lagging replica, it may miss recent changes |
Data integrity | Applications relying on up-to-date reads may break |
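To make the failover risk concrete, here is a minimal sketch of picking the least-lagged replica before promotion, assuming PostgreSQL-style LSN strings have already been collected from each replica. The replica names and the `most_caught_up` helper are hypothetical. Real failover tooling weighs more than replay position.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/A000000' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_caught_up(replay_lsns: dict) -> str:
    """Return the name of the replica with the highest replayed position."""
    # Compare numerically: as plain strings, "0/A000000" sorts after "0/10000000".
    return max(replay_lsns, key=lambda name: lsn_to_int(replay_lsns[name]))


replicas = {"replica-a": "0/A000000", "replica-b": "0/10000000"}
print(most_caught_up(replicas))  # replica-b
```

Even the best candidate may still be behind the failed primary. Any changes it had not received are lost when it is promoted.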
✅ How to Reduce Replication Lag
- Optimize write performance on the primary
- Use faster disk and more memory on replicas
- Compress replication logs when network bandwidth, rather than CPU, is the bottleneck
- Monitor and alert when lag exceeds threshold
- Use synchronous replication if strong consistency is needed, at the cost of slower writes on the primary
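The monitoring step can be sketched as a small classification function. The function name `lag_status` and the 5-second and 30-second thresholds are illustrative assumptions, not recommended values. The sketch treats a missing measurement as critical, because, for example, MySQL reports `NULL` for `Seconds_Behind_Master` when replication is not running.

```python
from typing import Optional


def lag_status(
    lag_seconds: Optional[float],
    warn_after: float = 5.0,       # hypothetical warning threshold
    critical_after: float = 30.0,  # hypothetical critical threshold
) -> str:
    """Map a measured replication lag to an alert level."""
    if lag_seconds is None:
        # No measurement usually means replication is stopped or broken.
        return "critical"
    if lag_seconds >= critical_after:
        return "critical"
    if lag_seconds >= warn_after:
        return "warning"
    return "ok"


for sample in (0.4, 7.0, 45.0, None):
    print(sample, lag_status(sample))
```

In practice the thresholds depend on how much staleness the application can tolerate, so they are best set per workload, not globally.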
🔄 Summary
Term | Meaning |
---|---|
Replication Lag | Time delay between primary update and replica sync |
Caused By | Network, IO, large writes, async delay |
Result | Stale reads, missing recent changes after failover |
Solutions | Optimize infra, monitor lag, use sync replication |