Hybrid Cloud Data at Uber: How Engineers Solved Extreme-Scale Replication Challenges

News Room | Published 2 March 2026, last updated 12:25 PM
Uber’s engineering team has transformed its data replication platform to move petabytes of data daily across hybrid cloud and on-premise data lakes, addressing scaling challenges caused by rapidly growing workloads. Built on Hadoop’s open-source Distcp framework, the platform now handles over one petabyte of daily replication and hundreds of thousands of jobs with improved speed, reliability, and observability, enabling analytics, machine learning, and disaster recovery at unprecedented scale.

Distcp is an open-source framework that copies large datasets in parallel across multiple nodes using Hadoop’s MapReduce. Files are split into blocks and assigned to Copy Mapper tasks running in YARN containers. The Resource Manager allocates resources, the Application Master monitors job execution and coordinates merges, and the Copy Committer assembles final files at the destination. Uber’s HiveSync team optimized this architecture for multi-petabyte workloads by moving preparation tasks to the Application Master, parallelizing listing and commit processes, and improving efficiency for small transfers.
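As an illustration of the input-splitting step described above, the sketch below shows how a Distcp-style planner might divide a copy listing into roughly balanced work units, one per Copy Mapper. This is not Uber's or Hadoop's actual code, and the file names and sizes are hypothetical; it only demonstrates the general size-balancing technique.

```python
def split_into_mappers(files, num_mappers):
    """Greedily assign (path, size) pairs to the least-loaded mapper bucket."""
    buckets = [{"files": [], "bytes": 0} for _ in range(num_mappers)]
    # Placing the largest files first keeps the buckets balanced.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        target = min(buckets, key=lambda b: b["bytes"])
        target["files"].append(path)
        target["bytes"] += size
    return buckets

# Hypothetical copy listing: (path, size in MB)
listing = [("/data/a.parquet", 900), ("/data/b.parquet", 400),
           ("/data/c.parquet", 300), ("/data/d.parquet", 250)]
splits = split_into_mappers(listing, 2)
```

In the real framework this planning runs against HDFS block metadata and the resulting splits become YARN container assignments; the greedy largest-first heuristic is one common way to keep mapper runtimes even.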

HiveSync, originally based on Airbnb’s ReAir project, keeps Uber’s HDFS and cloud data lakes synchronized using bulk and incremental replication. For datasets larger than 256 MB, it submits Distcp jobs through asynchronous workers in parallel, with a monitoring thread tracking progress. As daily replication grew from 250 TB to over 1 PB and datasets expanded from 30,000 to 144,000, HiveSync faced backlogs that threatened SLAs, emphasizing the need for operational and architectural enhancements to support cloud migration and Uber’s active-passive data lake model.
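The dispatch pattern described above, where only datasets over the 256 MB threshold become Distcp jobs submitted by asynchronous workers, can be sketched as follows. All names are illustrative and local threads stand in for the real YARN submission and monitoring machinery.

```python
from concurrent.futures import ThreadPoolExecutor

BULK_THRESHOLD_BYTES = 256 * 1024 * 1024  # 256 MB, per the article

def submit_distcp(dataset):
    # Placeholder for a real Distcp job submission to YARN.
    return f"distcp-job:{dataset['name']}"

def replicate(datasets, max_workers=4):
    """Submit large datasets as parallel Distcp jobs; copy small ones inline."""
    inline = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for ds in datasets:
            if ds["bytes"] > BULK_THRESHOLD_BYTES:
                futures.append(pool.submit(submit_distcp, ds))
            else:
                inline.append(ds["name"])  # small enough to copy directly
        # Stand-in for the monitoring thread: wait on each job's result.
        jobs = [f.result() for f in futures]
    return jobs, inline
```

The point of the split is that small datasets never pay Distcp's job-submission overhead, while large ones fan out across as many concurrent workers as the pool allows.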

HiveSync architecture: data replication workflow using Distcp (Source: Uber blog post)

To address scaling challenges, the HiveSync team enhanced Distcp by moving resource-intensive tasks like Copy Listing and Input Splitting from the HiveSync server to the Application Master, reducing HDFS client contention and cutting job submission latency by up to 90 percent. Copy Listing and Copy Committer tasks were parallelized, allowing multiple files to be processed simultaneously while maintaining block order, lowering p99 listing latency by 60 percent and maximum commit latency by over 97 percent. For smaller jobs transferring fewer than 200 files or 512 MB, Hadoop’s Uber job feature ran Copy Mapper tasks directly in the Application Master’s JVM, eliminating roughly 268,000 container launches daily and improving YARN efficiency.
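The small-job optimization above hinges on a simple eligibility check: jobs under the file-count and byte thresholds run their Copy Mapper inside the Application Master's JVM ("uber mode") instead of launching separate containers. The thresholds come from the article; the function itself is an illustrative sketch, not the actual configuration logic.

```python
UBER_MAX_FILES = 200                 # fewer than 200 files, per the article
UBER_MAX_BYTES = 512 * 1024 * 1024   # under 512 MB, per the article

def should_run_uberized(num_files, total_bytes):
    """True when the job is small enough to skip container launches."""
    return num_files < UBER_MAX_FILES and total_bytes < UBER_MAX_BYTES
```

In Hadoop this behavior is governed by the uber-task settings on the job; the payoff at Uber's scale was eliminating roughly 268,000 container launches a day.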

More than 50% of Distcp jobs are assigned a single mapper each (Source: Uber blog post)

These optimizations increased incremental replication capacity fivefold, enabling HiveSync to replicate over 300 PB during Uber’s on-premise-to-cloud migration without incidents. Enhanced observability, including job submission, Copy Listing, and Committer metrics, heap usage, and p99 copy rates, helped engineers monitor workloads and preempt failures. Out-of-memory errors, high job submissions, and long-running Copy Listing tasks were mitigated via stress testing, circuit breakers, optimized YARN configurations, and reordered task execution.
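One of the mitigations mentioned above, circuit breakers on job submissions, follows a standard pattern: after a run of consecutive failures, new submissions are rejected until the system recovers. The class below is a minimal, hypothetical sketch of that pattern, not Uber's implementation, and the threshold is illustrative.

```python
class SubmissionBreaker:
    """Pause job submissions after repeated consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def submit(self, job_fn):
        if self.open:
            raise RuntimeError("circuit open: submissions paused")
        try:
            result = job_fn()
            self.failures = 0  # a success closes the breaker again
            return result
        except Exception:
            self.failures += 1
            raise
```

A production breaker would typically add a cooldown timer and a half-open probe state, but the core idea, shedding load instead of amplifying a failing dependency, is the same.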

Looking ahead, the HiveSync team is focusing on further parallelization, optimized resource management, and network efficiency. Planned enhancements include parallelizing file permission setting and input splitting, moving compute-intensive commit tasks to the Reduce phase, and implementing a dynamic bandwidth throttler. Uber plans to contribute these improvements as an open-source patch, extending the broader community’s ability to manage extreme-scale hybrid cloud replication. “Even small improvements can lead to significant gains at our scale,” the engineering team noted. These efforts highlight the operational and engineering creativity required to sustain high-throughput, reliable performance across complex, multi-region data pipelines.
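The dynamic bandwidth throttler is still future work at Uber, so nothing is published about its design; the sketch below only shows the standard token-bucket technique such a throttler would likely build on. A copy task spends tokens per byte transferred and blocks when the bucket runs dry, capping sustained throughput at the configured rate.

```python
import time

class BandwidthThrottler:
    """Token-bucket rate limiter: at most `bytes_per_sec` sustained throughput."""

    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.tokens = bytes_per_sec  # start with one second of budget
        self.last = time.monotonic()

    def acquire(self, nbytes):
        """Block until nbytes of bandwidth budget is available, then spend it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at one second's worth.
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)
```

A "dynamic" variant would adjust `rate` at runtime, for example from observed network congestion or per-tenant quotas, rather than fixing it at construction.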
