World of Software
Copyright © All Rights Reserved. World of Software.

Agoda Handles Kafka Consumer Failover Across Data Centers with Custom Two-Way Sync

News Room
Last updated: 2025/08/13 at 7:40 AM
News Room Published 13 August 2025

Agoda’s engineering team recently shared their custom solution designed to maintain critical Kafka consumer operations across multiple on-premise data centers, ensuring business continuity even during outages. Processing over 3 trillion Kafka records daily, Agoda needed a failover mechanism that could seamlessly shift consumer workloads between distinct Kafka clusters while preserving processing state and avoiding data duplication or loss.

Rather than relying on Kafka’s stretch clusters, which proved impractical due to geographic latency, or MirrorMaker 2, which lacks bidirectional offset synchronization, Agoda engineers developed an enhanced system that extends MirrorMaker 2 to support reliable failover, seamless failback, and persistent offset translation. Their approach involves always-on, two-way synchronization of consumer group offsets and OffsetSync records between clusters. When a consumer group commits an offset in one data center, that offset is translated and updated in the other cluster using a custom synchronization service built around Kafka Connect and OffsetSync mechanisms.
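The offset translation step can be illustrated with a minimal Python sketch. It is not Agoda's code or MirrorMaker 2's implementation: the `OffsetSync` class and `translate_offset` function are hypothetical names, and the sketch assumes each sync record simply pairs an offset in the source cluster with the offset of the same record in the mirrored topic. It uses a conservative rule: when the committed offset falls between two sync points, the consumer resumes just after the nearest earlier sync point, so it may re-read some records but never skips any.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OffsetSync:
    """Hypothetical sync point: the record at upstream_offset in the source
    cluster was written at downstream_offset in the mirrored topic."""
    upstream_offset: int
    downstream_offset: int


def translate_offset(syncs: List[OffsetSync], committed: int) -> Optional[int]:
    """Translate a committed source offset into the mirrored topic's offsets.

    `syncs` must be sorted by upstream_offset. Returns None when no sync
    point at or below the committed offset exists yet.
    """
    keys = [s.upstream_offset for s in syncs]
    i = bisect_right(keys, committed) - 1
    if i < 0:
        return None  # nothing mapped at or before this offset
    sync = syncs[i]
    if committed == sync.upstream_offset:
        return sync.downstream_offset  # exact match on a sync point
    # Past the sync point: only the sync record itself is known to be
    # consumed, so resume right after it rather than guessing further ahead.
    return sync.downstream_offset + 1


syncs = [OffsetSync(100, 40), OffsetSync(200, 140)]
print(translate_offset(syncs, 150))  # 41
```

With this rule, how closely the translated position tracks the real one depends on how densely sync points are emitted.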

In failover scenarios, the secondary cluster seamlessly takes over processing from the exact point consumed in the original location, thanks to the translated and replicated offsets. When the primary data center returns, the system supports failback: consumer offsets are synchronized back to the original cluster, ensuring continuity without duplicating messages or losing progress. To avoid cyclic offset updates, the sync service checks for already-in-sync states before applying updates.
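The in-sync check that prevents cyclic updates can be sketched as follows. This is an illustration under stated assumptions, not Agoda's service: committed offsets for each data center are modeled as plain dictionaries, the `sync_offsets` function is hypothetical, and the two translation functions use a made-up fixed shift of 60 between clusters in place of real OffsetSync-based translation.

```python
from typing import Callable, Dict, Optional, Tuple

# (consumer group, topic, partition); the same key is used in both clusters
# here for simplicity, which is an assumption of this sketch.
Key = Tuple[str, str, int]
Translate = Callable[[Key, int], Optional[int]]


def sync_offsets(source: Dict[Key, int], target: Dict[Key, int],
                 translate: Translate) -> int:
    """Copy translated offsets from source to target; return updates applied."""
    applied = 0
    for key, offset in source.items():
        translated = translate(key, offset)
        if translated is None:
            continue  # no translation available yet
        if target.get(key) == translated:
            continue  # already in sync: skipping breaks the update cycle
        target[key] = translated
        applied += 1
    return applied


# Hypothetical translations: offsets in data center B trail A by exactly 60.
def a_to_b(key: Key, offset: int) -> Optional[int]:
    return offset - 60


def b_to_a(key: Key, offset: int) -> Optional[int]:
    return offset + 60


key = ("booking-consumers", "bookings", 0)
dc_a = {key: 160}
dc_b: Dict[Key, int] = {}

print(sync_offsets(dc_a, dc_b, a_to_b))  # 1: B receives the translated offset
print(sync_offsets(dc_b, dc_a, b_to_a))  # 0: A already matches, nothing echoes back
```

Without the equality check, every update written to one cluster would be picked up by the reverse sync and written back, and the two directions would keep triggering each other.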

The system also includes strong observability components: dedicated Grafana dashboards track metrics such as replication delays, sync failures, and consumer lag to detect anomalies early and intervene before operational impact occurs. This real-time visibility supports reliability across the multi-data-center Kafka deployment.
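As a rough illustration of one of these metrics, consumer lag per partition is the gap between the log end offset and the group's committed offset. The sketch below is hypothetical: the function names, the sample offsets, and the alert threshold are all assumptions, and a real deployment would read these values from the Kafka clusters and export them to a metrics backend rather than compute them from literals.

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int],
                 committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: log end offset minus committed offset, floored at 0.

    A partition with no committed offset is treated as starting from 0.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def partitions_over_threshold(lag: Dict[int, int], threshold: int) -> List[int]:
    """Partitions whose lag exceeds the (assumed) alert threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)


lag = consumer_lag({0: 1000, 1: 500}, {0: 990, 1: 100})
print(lag)                                   # {0: 10, 1: 400}
print(partitions_over_threshold(lag, 100))   # [1]
```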

The custom failover and failback architecture reflects a growing trend in which organizations operating at multi-data-center scale cannot rely on default Kafka features alone. According to Agoda, this system provides the necessary resilience for service continuity, precise processing semantics, and disaster recovery capabilities at scale. This approach highlights a strategic commitment to operational rigor and design flexibility, enabling Agoda’s data platform to withstand infrastructure outages without compromising correctness or throughput.

On Instagram, Agoda posted about their Kafka infrastructure handling over three trillion records per day and noted:

“To maintain business continuity during data center outages, we must be able to shift Kafka consumers across clusters.”

Other companies with large-scale streaming platforms have tackled multi-data center Kafka failover challenges in ways that share similarities with Agoda’s custom solution, though their implementations differ depending on operational constraints and priorities:

MirrorMaker 2 is Kafka’s built-in tool for cross-cluster replication. By default, it supports unidirectional replication, copying data from a primary cluster to a secondary cluster. While this works for active-passive failover scenarios, it lacks native support for bidirectional offset synchronization and seamless failback. Out of the box, MM2 translates consumer offsets only in the direction of replication, so without custom extensions consumers would have to reprocess messages or risk missing data when failing back.

Netflix runs multi-regional active-active systems built on Kafka for events and microservices communication. Their solution utilizes custom tooling on top of MirrorMaker (pre-MM2) to replicate data between regions. For failover, Netflix integrates with its control plane (Zuul, Eureka) to redirect traffic and resume consumers in alternate regions. While Agoda’s solution automates offset translation, Netflix historically prioritized idempotent event handling and replay to handle failback scenarios.

Uber’s streaming stack (uChannel, Apache Kafka) supports global services like Uber Eats and ride dispatch. They implement geo-distributed replication but often rely on asynchronous failover models. Similar to Agoda, Uber avoids cross-region synchronous clusters and uses local offsets with checkpointing to resume consumption in disaster recovery scenarios. Their model emphasizes partitioning workloads by geography and replaying from checkpoints during failback rather than continuous bidirectional sync.

Confluent’s commercial tool extends MM2 with more advanced capabilities, such as better topic auto-creation and schema replication. However, like MM2, it doesn’t inherently solve offset translation for bidirectional failover. Additionally, enterprises may face vendor lock-in and licensing costs when adopting Confluent’s solution.

Agoda’s design requires ongoing offset synchronization and observability tooling (Grafana dashboards for sync lag and failure rates). This adds complexity compared to simpler unidirectional DR setups but provides higher reliability and correctness for critical, high-volume workloads. Other companies that prioritize cost and simplicity might accept some replay or manual failback instead of building custom solutions.
