Uber redesigned its MySQL infrastructure to improve cluster uptime, replacing external failover with MySQL Group Replication (MGR). Applied across thousands of clusters, the change reduces failover from minutes to seconds while maintaining strong consistency. The redesign began by introducing consensus replication to remove external dependencies, and was scaled fleet-wide with automated onboarding, node management, rebalancing, and safeguards to ensure quorum and operational reliability.
Previously, Uber ran MySQL clusters in a single-primary, asynchronous replica model. External systems detected failures and promoted replicas, resulting in failover times measured in minutes. To reduce downtime and improve reliability, Uber adopted MySQL Group Replication, a Paxos-based consensus protocol. The new architecture embeds consensus within the database itself, forming a three-node MGR cluster. One node serves as primary for writes, while the other two secondaries participate in consensus without accepting direct writes. This ensures that all nodes maintain up-to-date data and can automatically elect a new primary if needed.
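The failover behavior described above can be sketched as a toy model. This is an illustrative simulation of a three-node, single-primary consensus group, not MGR's actual implementation; all class and method names here are assumptions for the sketch:

```python
# Toy model of a single-primary consensus group (illustrative only,
# not actual MGR code): three nodes, writes require a majority, and a
# surviving secondary is promoted when the primary fails.
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True

class Group:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def has_quorum(self):
        # Writes can commit only while a majority of members are reachable.
        alive = sum(1 for n in self.nodes if n.alive)
        return alive > len(self.nodes) // 2

    def fail(self, name):
        for n in self.nodes:
            if n.name == name:
                n.alive = False
        # If the primary fails and quorum still holds, promote a live secondary.
        if not self.primary.alive and self.has_quorum():
            self.primary = next(n for n in self.nodes if n.alive)

group = Group(["db1", "db2", "db3"])
group.fail("db1")
print(group.primary.name)  # db2: a surviving secondary takes over
print(group.has_quorum())  # True: 2 of 3 members remain
```

The key property the sketch captures is that promotion happens inside the group itself, with no external failover system in the loop.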
Uber Engineering emphasized in a LinkedIn post:
At Uber, high availability is non-negotiable.
Scalable read replicas fan out from the secondaries, separating read scaling from write availability while preserving fault tolerance. Flow control within MGR monitors transaction queues on each secondary and signals the primary to pause or throttle writes as needed, preventing nodes from falling behind. This mechanism avoids replication inconsistencies, reduces write downtime during failover, and prevents errant GTIDs from propagating outside the cluster.
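The throttling behavior can be illustrated with a minimal sketch. The function name, threshold, and quota below are assumptions for illustration, not MGR's internal API; the idea is simply that the primary's admitted write rate shrinks as the slowest secondary's pending queue grows:

```python
# Illustrative flow-control throttle (names and numbers are assumptions,
# not MGR internals): the primary admits fewer writes per interval once
# any secondary's pending-transaction queue exceeds a threshold.
QUEUE_THRESHOLD = 25_000  # hypothetical queue-size limit per secondary

def writes_allowed_per_tick(secondary_queues, base_quota=10_000):
    """Return how many writes the primary may admit this interval."""
    worst_backlog = max(secondary_queues.values())
    if worst_backlog <= QUEUE_THRESHOLD:
        return base_quota  # no secondary is behind; full write quota
    # Scale the quota down in proportion to the worst backlog.
    slowdown = QUEUE_THRESHOLD / worst_backlog
    return max(1, int(base_quota * slowdown))

print(writes_allowed_per_tick({"db2": 1_000, "db3": 500}))    # 10000
print(writes_allowed_per_tick({"db2": 50_000, "db3": 500}))   # 5000
```

Keying the quota to the worst-behind secondary mirrors the goal stated above: no member of the consensus group is allowed to drift far enough behind to threaten a clean failover.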
Architecture of a consensus-based MySQL cluster (Source: Uber Blog Post)
Benchmarking revealed the trade-offs of the new design: a slight write-latency increase of a few hundred microseconds compared to asynchronous replication, but a dramatic reduction in total write unavailability during primary failures, from minutes to under 10 seconds including primary election and routing updates. Read latencies remained consistent, since local replica performance matched the legacy model.
Uber scaled the architecture using an automated control plane for cluster onboarding, offboarding back to legacy replication, and rebalancing during topology changes. Workflows handle both graceful and ungraceful node replacements, and configurations such as group_replication_unreachable_majority_timeout and single-leader mode protect against split-brain and minority partitions. Automated topology health analysis dynamically adds new nodes when a group drops below quorum and removes excess nodes to reduce overhead. Downstream replicas are repointed during node deletion, with optional backlog application blocking to maintain strict external consistency.
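The topology health analysis described above can be sketched as a simple planning function. Everything here (function name, target size, action strings) is illustrative, not Uber's control-plane API; it only shows the quorum arithmetic and the add/remove decisions the text describes:

```python
# Hedged sketch of automated topology health analysis (illustrative names,
# not Uber's control plane): compare live membership against the desired
# group size and plan repair actions.
TARGET_SIZE = 3  # desired consensus-group size

def plan_repair(group_members, reachable):
    """Decide which actions would restore a healthy three-node group."""
    live = [m for m in group_members if m in reachable]
    majority = len(group_members) // 2 + 1
    actions = []
    if len(live) < majority:
        # Below quorum: the group cannot elect a primary on its own,
        # so restoring quorum comes before anything else.
        actions.append("restore_quorum")
    for _ in range(TARGET_SIZE - len(live)):
        actions.append("add_node")     # replace failed members
    for _ in range(len(live) - TARGET_SIZE):
        actions.append("remove_node")  # trim excess nodes to cut overhead
    return actions

print(plan_repair(["db1", "db2", "db3"], {"db2", "db3"}))
# ['add_node']
print(plan_repair(["db1", "db2", "db3"], {"db3"}))
# ['restore_quorum', 'add_node', 'add_node']
```

With three members, a majority is two, which is why a single node failure leaves the group writable while a double failure demands intervention before new members can be added.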
Rebalance consensus cluster workflow (Source: Uber Blog Post)
Uber engineers say implementing MySQL Group Replication at scale revealed operational lessons, including monitoring the MGR plugin's memory usage through performance_schema (the memory/group_replication instruments) and careful handling of group_replication_bootstrap_group to prevent split-brain scenarios. Single-primary mode was selected over multi-primary for simplicity and operational predictability, as multi-primary introduces higher conflict potential and requires robust conflict resolution for transactional ordering. The combination of consensus-based replication, automated workflows, and scalable read replicas enables high availability, strong consistency, and reduced manual intervention, providing a foundation for Uber's fleet-scale MySQL infrastructure.
