World of Software
LinkedIn Re-Architects Service Discovery: Replacing Zookeeper with Kafka and xDS at Scale

News Room
Last updated: 2026/02/05 at 12:04 AM
News Room Published 5 February 2026

In a recent LinkedIn Engineering Blog post, Bohan Yang describes the project to replace the company’s legacy ZooKeeper-based service discovery platform. With thousands of microservices and capacity limits approaching, LinkedIn needed a more scalable architecture. The new system uses Apache Kafka for writes and the xDS protocol for reads, which makes the platform eventually consistent and lets non-Java clients participate as first-class citizens. To keep the rollout stable, the team implemented a “Dual Mode” strategy that allowed an incremental, zero-downtime migration.

The team identified critical scaling problems with the legacy Apache ZooKeeper-based system. App servers wrote to ZooKeeper directly, and clients read from it and set watches on it directly, so large application deployments caused massive write spikes followed by “read storms,” leading to high latency and session timeouts. Additionally, because ZooKeeper enforces strong consistency (strict ordering), a backlog of read requests could block writes, causing healthy nodes to fail health checks. The team estimated that the legacy system would reach its maximum capacity in 2025.

To address these shortcomings, a new architecture was developed that moved from strong consistency to an eventual consistency model, providing better performance, availability, and scalability. The new system separates the write path (via Kafka) from the read path (via an Observer service). The Service Discovery Observer consumes Kafka events to update its in-memory cache and pushes updates to clients via the xDS protocol, which is compatible with Envoy and gRPC. The use of the xDS standard enables LinkedIn to deploy clients in many languages beyond Java. This adoption also enables future integration with Service Mesh (Envoy) and centralized load balancing.
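The split between write path and read path can be illustrated with a small in-process sketch. This is not LinkedIn's implementation: the `AnnouncementEvent` record and the `Observer` class below are hypothetical names, a plain Python list stands in for a Kafka topic, and callbacks stand in for xDS streams. It only shows the shape of the idea, in which an observer applies events to an in-memory cache and pushes the updated endpoint set to subscribers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class AnnouncementEvent:
    """Hypothetical write-path event, standing in for a Kafka record."""
    service: str
    endpoint: str  # e.g. "10.0.0.1:8080"
    is_up: bool    # True = announce, False = de-announce


Callback = Callable[[FrozenSet[str]], None]


class Observer:
    """Toy read-path component (assumption, not the real Observer service)."""

    def __init__(self) -> None:
        self._cache: Dict[str, Set[str]] = defaultdict(set)
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, service: str, callback: Callback) -> None:
        """Register a client and send it the current snapshot immediately."""
        self._subscribers[service].append(callback)
        callback(frozenset(self._cache[service]))

    def apply(self, event: AnnouncementEvent) -> None:
        """Apply one event to the cache; push only if the set changed."""
        endpoints = self._cache[event.service]
        changed = (event.endpoint in endpoints) != event.is_up
        if event.is_up:
            endpoints.add(event.endpoint)
        else:
            endpoints.discard(event.endpoint)
        if changed:
            snapshot = frozenset(endpoints)
            for callback in self._subscribers[event.service]:
                callback(snapshot)


# Usage: a list plays the role of the event stream consumed by the observer.
observer = Observer()
seen: List[FrozenSet[str]] = []
observer.subscribe("profile-service", seen.append)

events = [
    AnnouncementEvent("profile-service", "10.0.0.1:8080", True),
    AnnouncementEvent("profile-service", "10.0.0.2:8080", True),
    AnnouncementEvent("profile-service", "10.0.0.1:8080", True),   # duplicate
    AnnouncementEvent("profile-service", "10.0.0.1:8080", False),
]
for event in events:
    observer.apply(event)
```

In this sketch clients never touch the write path, so a burst of announcements reaches them only as pushed snapshots, and a duplicate announcement produces no push at all.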

Post-upgrade benchmarks showed that a single Observer instance can maintain 40k client streams and process 10k updates per second. Observers operate independently in each data center (fabric), but clients can connect to remote Observers for failover or cross-data-center traffic.

The migration had to occur without interrupting billions of daily requests or requiring manual changes from thousands of app owners. The team implemented Dual Read and Write mechanisms. For reads, clients subscribed to both ZooKeeper and the new Observer. ZooKeeper remained the Source of Truth for traffic routing during the pilot phase of a client system migration, while background threads verified the accuracy of Observer data against ZooKeeper data before switching traffic over. For writes, app servers announced their presence to both ZooKeeper and Kafka simultaneously. Automated cron jobs analyzed ZooKeeper watchers to identify “long-tail” legacy clients preventing the decommissioning of ZooKeeper writes.
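The dual-read verification step can be sketched as a comparison of two snapshots. The article does not describe how LinkedIn's background threads performed the check, so the `Mismatch` type and the `compare_snapshots` and `routing_view` functions below are hypothetical. Because the new path is eventually consistent, a real check would presumably tolerate short-lived differences, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# service name -> set of announced endpoints
Snapshot = Dict[str, FrozenSet[str]]


@dataclass(frozen=True)
class Mismatch:
    """Endpoints present in one source but not the other (hypothetical type)."""
    service: str
    only_in_zookeeper: FrozenSet[str]
    only_in_observer: FrozenSet[str]


def compare_snapshots(zk: Snapshot, observer: Snapshot) -> List[Mismatch]:
    """Return one Mismatch per service whose endpoint sets differ."""
    mismatches: List[Mismatch] = []
    for service in sorted(zk.keys() | observer.keys()):
        zk_eps = zk.get(service, frozenset())
        ob_eps = observer.get(service, frozenset())
        if zk_eps != ob_eps:
            mismatches.append(Mismatch(service, zk_eps - ob_eps, ob_eps - zk_eps))
    return mismatches


def routing_view(zk: Snapshot, observer: Snapshot, observer_is_primary: bool) -> Snapshot:
    """Pick the source of truth for routing; ZooKeeper until the switch is made."""
    return observer if observer_is_primary else zk


# Usage: one service matches, one is missing an endpoint on the new path.
zk_data: Snapshot = {
    "profile-service": frozenset({"10.0.0.1:8080", "10.0.0.2:8080"}),
    "feed-service": frozenset({"10.0.1.1:9000"}),
}
observer_data: Snapshot = {
    "profile-service": frozenset({"10.0.0.1:8080"}),
    "feed-service": frozenset({"10.0.1.1:9000"}),
}

report = compare_snapshots(zk_data, observer_data)
# Any mismatch keeps ZooKeeper as the routing source in this sketch.
active = routing_view(zk_data, observer_data, observer_is_primary=not report)
```

Keeping the comparison separate from the routing decision mirrors the description above: traffic keeps following ZooKeeper data while the Observer data is checked in the background.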

After implementing the new service, data propagation latency improved significantly, dropping from P50 < 10s / P99 < 30s to P50 < 1s / P99 < 5s. The system now supports hundreds of thousands of app instances per data center with horizontal scalability via the Observer layer.


 
