Database. Advanced. How would you handle database failover?

✅ 1. Understand the Goals

| Requirement | Description |
|---|---|
| ⏱️ High availability | Keep the application running with minimal disruption |
| 💾 Data integrity | Avoid data loss (use synchronous replication if needed) |
| 🔄 Automatic switch | Detect failure and switch to a standby quickly |
| 🔁 Failback | Optionally return to the original primary after recovery |

🧱 2. Choose the Right Failover Setup

🧭 Replication Strategy

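| Type | Primary use | Notes |
|---|---|---|
| Synchronous | Strong consistency | Each commit waits for a replica to confirm it, which adds write latency |
| Asynchronous | Better performance | Commits not yet shipped to a replica can be lost on failover |
| Semi-sync | Middle ground | A commit is acknowledged once at least one replica has received it |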
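A toy model makes the trade-off in this table concrete. It is a simplification under stated assumptions, not how any particular database implements replication: log positions are plain integers, the primary confirms a commit to the client only as far as its acknowledgement rule allows, and failover promotes the replica that is furthest ahead. The function names are hypothetical.

```python
from typing import Sequence


def acked_position(mode: str, primary_pos: int, replica_positions: Sequence[int]) -> int:
    """Highest log position the primary may have confirmed to clients (simplified model)."""
    if mode == "async":
        # Commit returns as soon as the primary has the write locally.
        return primary_pos
    if mode == "semi-sync":
        # Commit waits until at least one replica has received the write.
        return min(primary_pos, max(replica_positions, default=0))
    if mode == "sync":
        # Commit waits for every replica (all replicas are treated as the synchronous set).
        return min([primary_pos, *replica_positions])
    raise ValueError(f"unknown mode: {mode}")


def lost_on_failover(mode: str, primary_pos: int, replica_positions: Sequence[int]) -> int:
    """Client-acknowledged writes missing from the best replica if the primary dies right now."""
    best_replica = max(replica_positions, default=0)
    return max(0, acked_position(mode, primary_pos, replica_positions) - best_replica)


if __name__ == "__main__":
    primary, replicas = 100, [90, 97]  # hypothetical log positions
    for mode in ("async", "semi-sync", "sync"):
        print(mode, lost_on_failover(mode, primary, replicas))
```

Running it prints `async 3`, `semi-sync 0` and `sync 0`. In this model semi-sync is lossless only because the most advanced replica is the one promoted. Real MySQL semi-sync also falls back to asynchronous mode if no replica acknowledges within a timeout, which the sketch ignores.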

🛠️ 3. Implementing Failover – Key Components

🔍 A. Health Checks & Monitoring

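  • Regularly check that the primary is reachable and answering queries; declare it failed only after several consecutive missed checks, so a single dropped packet does not trigger a failover.
  • Tools: Keepalived, Patroni, Consul, pg_auto_failover, custom scripts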
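A minimal sketch of such a check, assuming a plain TCP connect is an acceptable liveness signal (real checkers usually run a query such as SELECT 1 or ask the failover manager's API). `tcp_probe` and `FailureDetector` are hypothetical names. The detector reports a failure only after several consecutive misses.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


class FailureDetector:
    """Declare the primary down only after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold
        self.consecutive_failures = 0

    def check(self) -> bool:
        """Run one probe; return True once the failure threshold has been reached."""
        if self.probe():
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

For example, `FailureDetector(lambda: tcp_probe("10.0.0.11", 5432), threshold=3)` called every few seconds would flag a (hypothetical) primary at 10.0.0.11 after three misses in a row. The threshold times the check interval sets the minimum detection time, so it trades false alarms against failover speed.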

🧠 B. Failover Manager / Orchestrator

Automatically promotes a replica (ideally the most up-to-date one) if the primary fails.

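| DB | Tool | Function |
|---|---|---|
| PostgreSQL | Patroni, pg_auto_failover | Automatic failure detection and promotion |
| MySQL | MHA, Orchestrator | Monitor the topology and promote a replica |
| MongoDB | Built-in | Replica sets elect a new primary automatically |
| Cloud DBs | AWS RDS, Google Cloud SQL | Managed failover |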
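When a replica has to be promoted, the usual choice is the one that has replayed the most WAL, since it is missing the least data. A sketch for PostgreSQL, assuming each replica's position has already been collected with `SELECT pg_last_wal_replay_lsn();` (PostgreSQL 10+) into a dict; the node names and helper functions are hypothetical, and tools like Patroni apply extra rules (such as lag limits) on top of this.

```python
from typing import Dict, Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' (two 32-bit hex halves) to a comparable int."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(replay_lsns: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the replica that has replayed the most WAL, skipping replicas that did not answer."""
    reachable = {name: lsn for name, lsn in replay_lsns.items() if lsn is not None}
    if not reachable:
        return None
    return max(reachable, key=lambda name: lsn_to_int(reachable[name]))
```

LSNs are converted to integers because comparing the strings directly gives wrong answers: '0/9000000' sorts after '0/10000000' as text, although it is the smaller position.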

📦 C. Virtual IP or Proxy Layer

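  • Clients connect to one stable address, so they don't need to know which node is currently the primary
  • Use a virtual IP (e.g., managed by Keepalived) or a proxy such as HAProxy, pgpool-II, or ProxySQL
  • After a failover, the VIP or proxy sends traffic to the new primary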
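One common way a proxy finds the active node is to health-check every database node and route only to the one that reports itself as primary. With Patroni, for instance, HAProxy can send an HTTP check to each node's REST API (port 8008 by default), where `/primary` answers 200 only on the leader. The sketch imitates that routing rule in Python; the URLs and function names are hypothetical, and it illustrates the decision rather than replacing HAProxy.

```python
import http.client
import urllib.request
from typing import Callable, List, Optional


def patroni_reports_primary(api_url: str, timeout: float = 2.0) -> bool:
    """True if GET <api_url>/primary answers HTTP 200, which Patroni does only on the leader."""
    try:
        with urllib.request.urlopen(f"{api_url}/primary", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # HTTPError (a non-200 answer from a replica), refused connections and timeouts: not primary.
        return False


def pick_write_backend(backends: List[str], is_primary: Callable[[str], bool]) -> Optional[str]:
    """Mimic a proxy's backend check: send writes to the first backend that passes the primary check."""
    for backend in backends:
        if is_primary(backend):
            return backend
    return None
```

In use this would be `pick_write_backend(["http://pg1:8008", "http://pg2:8008"], patroni_reports_primary)`, re-evaluated every few seconds so that traffic follows the leader after a failover.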

💽 D. Application Logic

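  • Use connection pooling and retry logic with backoff, so the brief outage during a failover shows up as a retry rather than an error
  • Use failover-aware drivers that accept several hosts (e.g., JDBC with multiple hosts, or libpq with target_session_attrs=read-write)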
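A minimal sketch of the retry side, assuming the failover window shows up as connection errors that clear within a few seconds. `with_retry` and `FAILOVER_DSN` are hypothetical names; the DSN uses libpq's multi-host syntax and `target_session_attrs=read-write` (PostgreSQL 10+), with made-up host names.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical hosts. libpq tries each host in turn and, with
# target_session_attrs=read-write, keeps only a connection to a writable primary.
FAILOVER_DSN = (
    "host=pg-node1,pg-node2 port=5432,5432 dbname=app "
    "target_session_attrs=read-write connect_timeout=3"
)


def with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(); on a transient error, wait with exponential backoff plus full jitter and retry."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The backoff cap doubles each attempt (0.2s, 0.4s, 0.8s, ...) up to max_delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With psycopg2 this might look like `with_retry(lambda: psycopg2.connect(FAILOVER_DSN), retry_on=(psycopg2.OperationalError,))`. Only retry work that is safe to repeat: if the connection drops during a commit, the client cannot tell whether the write landed.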

💡 4. Example: PostgreSQL + Patroni + etcd + HAProxy

[App] ⇄ [HAProxy]
            ⇄ Primary (Postgres Node 1)
            ⇄ Replica (Postgres Node 2)

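[Patroni agent on each node + etcd cluster]
 ⇨ Patroni keeps the leader key in etcd, monitors the nodes, and promotes a replica if the leader key expires
 ⇨ HAProxy health-checks each node's Patroni REST API and sends traffic to whichever node reports itself as primary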
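The core of Patroni-style leader election is a leader key with a TTL in etcd: the primary keeps renewing it, and if it stops (crash, network cut) the key expires and a healthy replica can take it and promote itself. The sketch models that with an in-memory stand-in for etcd; `InMemoryDCS`, `ha_loop_iteration` and the 30-second TTL are assumptions for illustration. Real etcd gets its atomicity from a quorum, which a single process cannot show.

```python
from dataclasses import dataclass
from typing import Optional

LEADER_TTL = 30.0  # seconds; an assumed value, similar in spirit to Patroni's ttl setting


@dataclass
class Lease:
    holder: str
    expires_at: float


class InMemoryDCS:
    """Single-process stand-in for etcd: one leader key with a TTL and atomic acquire/renew."""

    def __init__(self) -> None:
        self.lease: Optional[Lease] = None

    def try_acquire(self, node: str, ttl: float, now: float) -> bool:
        # Succeeds if the key is free, has expired, or is already held by this node (a renewal).
        if self.lease is None or self.lease.expires_at <= now or self.lease.holder == node:
            self.lease = Lease(node, now + ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Current lease holder, or None if the lease is missing or expired."""
        if self.lease is not None and self.lease.expires_at > now:
            return self.lease.holder
        return None


def ha_loop_iteration(node: str, dcs: InMemoryDCS, now: float) -> str:
    """One cycle of a healthy node's HA loop: hold or win the lease to be primary, else replica.

    A primary that fails to renew gets "replica" back and must demote itself,
    which is what prevents two primaries (split brain).
    """
    return "primary" if dcs.try_acquire(node, LEADER_TTL, now) else "replica"
```

If pg1 wins the lease at t=0 and renews at t=20, then crashes, pg2 keeps getting "replica" until the lease runs out at t=50 and becomes primary on its first cycle after that. The TTL is therefore the upper bound on how long the cluster waits before failing over.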

🔁 5. Manual Failover (Fallback Plan)

If automation fails:

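  1. Make sure the old primary is really down (stop or fence it), so it cannot keep accepting writes and cause a split brain.
  2. Promote the most up-to-date replica manually:
pg_ctl promote -D /var/lib/postgresql/data
     On PostgreSQL 12 and later, running SELECT pg_promote(); on the replica does the same from SQL.
  3. Update connection strings or the proxy configuration to point at the new primary.
  4. Restart applications if they do not reconnect on their own.
  5. Resync the old primary as a new replica (for example with pg_rewind, or a fresh pg_basebackup).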

📋 6. Test Regularly

  • Simulate failure (kill primary node, cut network).
  • Measure failover time and check data consistency.
  • Have runbooks for manual failover.
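To put a number on failover time during a drill, poll a small probe through the same path the application uses and time the gap between the first failure and the first success. A sketch, assuming `probe` is something you supply, such as a hypothetical function that inserts a row into a heartbeat table via the proxy; `measure_failover` is a hypothetical name, and `clock`/`sleep` are injectable so it can be tested without waiting.

```python
import time
from typing import Callable, Optional


def measure_failover(
    probe: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe() until it fails and then succeeds again; return the outage length in seconds.

    Returns None if no complete outage (failure followed by recovery) is seen within timeout.
    The result is only as precise as `interval`.
    """
    start = clock()
    outage_started: Optional[float] = None
    while clock() - start < timeout:
        ok = probe()
        now = clock()
        if not ok and outage_started is None:
            outage_started = now  # first failed probe: the outage begins
        elif ok and outage_started is not None:
            return now - outage_started  # first success after the outage
        sleep(interval)
    return None
```

Comparing the heartbeat rows on the new primary with the probes that reported success also gives a quick consistency check: any acknowledged row that is missing was lost in the failover.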

🧠 Summary

Database failover is a critical part of high availability.
It requires replication, health checks, automatic promotion, and application resilience. The goal is to recover fast with minimal data loss.
