Database.Middle.How does replication work in databases?

Database replication is the process of copying and maintaining database objects and data from one database (the primary/master) to one or more others (the replicas/slaves) to ensure redundancy, high availability, scalability, and disaster recovery.


🧠 Core Idea

One database acts as the source of truth, and changes (inserts, updates, deletes) are replicated to other databases.

[ Primary DB ]
     |
     |  changes streamed
     v
[ Replica DBs ]

🧱 Types of Replication

1. Master-Slave (Primary-Replica)

  • Writes go to the master, reads can go to slaves.
  • Replicas are read-only copies.
  • Used for read scaling and backups.

2. Multi-Master

  • All nodes can read and write.
  • Requires conflict resolution.
  • Used in distributed systems needing high availability across regions.

3. Synchronous vs. Asynchronous

TypeDescriptionProsCons
SynchronousMaster waits for replica to confirm writeStrong consistencySlower, higher latency
AsynchronousMaster does not waitFast write performancePossible data lag on failure

⚙️ Common Replication Mechanisms

Statement-Based Replication

  • Replicates SQL statements (e.g., UPDATE users SET ...)
  • MySQL supports this (older approach)

Row-Based Replication

  • Replicates actual row data changes (e.g., before/after values)
  • More accurate, avoids non-determinism

Logical Replication

  • Sends logical changes (insert/update/delete) — not raw bytes
  • PostgreSQL: pglogical, wal2json

Physical Replication

  • Copies data blocks (pages) directly from the primary (e.g., PostgreSQL WAL)
  • Exact binary-level copy — faster but not flexible

🔧 Example: PostgreSQL Streaming Replication (Physical)

  1. Master writes changes to WAL (Write-Ahead Log)
  2. Replica reads WAL via streaming connection
  3. Applies changes to replicate state
# On replica
pg_basebackup -h master_host -D /var/lib/postgresql/data -U replicator --wal-method=stream

📦 Benefits of Replication

BenefitDescription
High AvailabilityFailover to replica if master goes down
Read ScalabilitySpread read load across replicas
Disaster RecoveryUse replicas as hot/cold backups
Geo-DistributionDeploy replicas near users globally

❌ Challenges

  • Replication lag (in async mode)
  • Conflict resolution (multi-master)
  • Data consistency and durability
  • Monitoring and failover management

🛠️ Tools and Systems with Built-in Replication

SystemBuilt-in Replication Type
PostgreSQLStreaming (physical), logical
MySQLBinary log (row-based/statement)
MongoDBReplica sets
CassandraMulti-master, eventual consistency
CockroachDBSynchronous, distributed SQL engine

✅ Summary

ConceptDescription
ReplicationCopying data from one DB node to others
Master-SlaveOne writeable node, multiple read-only nodes
SynchronousSafer but slower
AsynchronousFaster, but can lag
UsesHA, scalability, backups, failover