World of Software
Copyright © All Rights Reserved. World of Software.

From Hadoop to Kubernetes: Pinterest’s Scalable Spark Architecture on AWS EKS

News Room
Last updated: 2025/07/28 at 3:09 PM
News Room Published 28 July 2025

Pinterest recently replaced its Hadoop-based data platform with Moka, a Kubernetes-native system running Spark on AWS EKS. Moka enables containerized job isolation, supports ARM-based instances, improves scheduling via YuniKorn, and simplifies deployment, while reducing infrastructure costs and increasing efficiency across data processing workloads.

Pinterest made a strategic decision to transition from a legacy Hadoop-based architecture to a Spark-on-Kubernetes model, better aligning with modern infrastructure practices. It chose Kubernetes for its native support for container orchestration and security, as well as its flexibility in deploying on mixed instance types, such as ARM and x86:

“Armed with these requirements, we performed a comprehensive evaluation of running Spark on various platforms during 2022. We leaned towards Kubernetes-focused frameworks for the following advantages they offered: Container-based isolation and security as first-class platform citizens, ease of deployment, built-in frameworks, and performance tuning options.”

In addition, Moka introduced key cost and efficiency improvements over the legacy platform. By leveraging container-based isolation, Pinterest consolidated workloads with different security requirements onto shared clusters, thereby reducing the need for multiple clusters.

Pinterest’s engineers also acknowledge that the “greater isolation provided by a container-based system allowed removal of dedicated yet underutilized Hadoop environments in favor of running jobs with differing security requirements on the same Moka cluster.” The platform’s support for ARM-based instances and its opportunistic autoscaling, which scales clusters up during off-peak hours, further contributed to infrastructure cost savings.

Replacing Hadoop required re-engineering several critical components tied to job submission, scheduling, storage, and observability. As Pinterest’s engineers put it: “Over the years, Hadoop and Monarch [Pinterest’s Hadoop platform] have come to encompass a tremendous amount of functionality. Building an alternative implies developing replacements…” Pinterest developed new services such as Archer for job submission, adopted Apache YuniKorn for queue-based scheduling, migrated storage from HDFS to S3, and integrated the Apache Celeborn Remote Shuffle Service to maintain performance at scale.
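To make the storage and shuffle changes concrete, here is a minimal sketch of what pointing a Spark job at S3 and at a Celeborn cluster might look like. It is not Pinterest's configuration. The function names, the bucket name, and the endpoint hosts are hypothetical, and the `s3a://` scheme (Hadoop's S3A connector) is an assumption, since the article does not say which scheme or path mapping Moka uses. The three Spark properties follow the Celeborn deployment documentation as best understood here; check them against the Celeborn version in use.

```python
from urllib.parse import urlparse


def celeborn_shuffle_conf(master_endpoints: list[str]) -> dict[str, str]:
    """Spark properties that route shuffle data to a Celeborn cluster.

    The endpoint list is supplied by the caller; the hosts used in the
    example below are placeholders, not real services.
    """
    return {
        "spark.shuffle.manager": "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
        "spark.celeborn.master.endpoints": ",".join(master_endpoints),
        # Spark's external shuffle service is switched off when a remote
        # shuffle service takes over.
        "spark.shuffle.service.enabled": "false",
    }


def hdfs_to_s3_path(hdfs_path: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to an s3a:// URI, keeping the path unchanged.

    Assumes a one-to-one path mapping into a single bucket, which is a
    simplification made for this example.
    """
    parsed = urlparse(hdfs_path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS path: {hdfs_path}")
    return f"s3a://{bucket}{parsed.path}"


conf = celeborn_shuffle_conf(["celeborn-0.example:9097", "celeborn-1.example:9097"])
new_path = hdfs_to_s3_path("hdfs://namenode:8020/data/events/dt=2025-07-28", "example-bucket")
```

In a real migration the path rewrite would usually live in table metadata rather than in job code, but the idea is the same: jobs stop addressing a NameNode and start addressing a bucket.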


[Figure: Initial Moka High Level Design]

In Moka’s initial design, Spinner, Pinterest’s Airflow-based orchestration system, breaks down scheduled workflows into individual job submissions and sends them to Archer, the EKS job submission service. Archer translates each job into a Kubernetes custom resource and submits it to a Spark-enabled EKS cluster. Archer handles job queuing, status tracking, and integration with the Kubernetes API, enabling reliable deployment and efficient resource routing across clusters while maintaining compatibility with existing workflows.
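The flow described above, where a workflow is split into jobs and each job becomes a Kubernetes custom resource, can be sketched roughly as follows. This is an illustration only, not Archer's implementation. `JobSubmission`, `expand_workflow`, `to_custom_resource`, the namespace, the resource sizes, and the image name are all hypothetical. The manifest fields follow the Spark Operator's `SparkApplication` schema as best understood here, and the `batchScheduler` fields should be checked against the operator version in use.

```python
from dataclasses import dataclass, field


@dataclass
class JobSubmission:
    """One job produced by splitting a scheduled workflow (hypothetical shape)."""
    name: str
    main_file: str
    image: str
    queue: str = "root.default"
    executors: int = 2
    arguments: list[str] = field(default_factory=list)


def expand_workflow(workflow: str, tasks: list[dict], image: str) -> list[JobSubmission]:
    """Turn each task of a workflow into its own job submission."""
    jobs = []
    for task in tasks:
        # Kubernetes object names must be lowercase and may not contain "_".
        name = f"{workflow}-{task['task']}".lower().replace("_", "-")
        jobs.append(JobSubmission(
            name=name,
            main_file=task["main_file"],
            image=image,
            executors=task.get("executors", 2),
        ))
    return jobs


def to_custom_resource(job: JobSubmission, namespace: str = "spark-jobs") -> dict:
    """Translate a job into a SparkApplication-style manifest (a plain dict)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": job.name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": job.image,
            "mainApplicationFile": job.main_file,
            "arguments": list(job.arguments),
            "batchScheduler": "yunikorn",
            "batchSchedulerOptions": {"queue": job.queue},
            "driver": {"cores": 1, "memory": "2g"},
            "executor": {"instances": job.executors, "cores": 2, "memory": "4g"},
        },
    }
```

A submission service would then hand each manifest to the Kubernetes API and poll the resulting object's status. That part is omitted here because it depends on cluster access.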



[Figure: Spark Operator]

Pinterest’s engineers chose Spark Operator to run Spark natively on Kubernetes and Apache YuniKorn for batch scheduling. The Spark Operator exposes the SparkApplication Custom Resource Definition (CRD), which lets users define Spark applications declaratively while the operator handles the underlying submission details. Internally, the Spark Operator still invokes the native spark-submit command.
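To illustrate the relationship between a declarative SparkApplication and spark-submit, the sketch below maps a small manifest onto a spark-submit argument list. It is a simplified illustration, not the operator's code: the real operator sets many more options. `spark_submit_args`, `EXAMPLE_APP`, the image name, and the namespace are hypothetical. The flags and `spark.*` properties are standard Spark-on-Kubernetes options.

```python
# A trimmed SparkApplication manifest, written as a Python dict for the example.
EXAMPLE_APP = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "spark-pi", "namespace": "spark-jobs"},
    "spec": {
        "type": "Scala",
        "mode": "cluster",
        "image": "example.com/spark:latest",  # placeholder image
        "mainClass": "org.apache.spark.examples.SparkPi",
        "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples.jar",
        "arguments": ["1000"],
        "driver": {"cores": 1, "memory": "2g"},
        "executor": {"instances": 3, "cores": 2, "memory": "4g"},
    },
}


def spark_submit_args(app: dict, master: str) -> list[str]:
    """Build the spark-submit command line a manifest roughly corresponds to."""
    meta, spec = app["metadata"], app["spec"]
    conf = {
        "spark.kubernetes.namespace": meta["namespace"],
        "spark.kubernetes.container.image": spec["image"],
        "spark.driver.cores": str(spec["driver"]["cores"]),
        "spark.driver.memory": spec["driver"]["memory"],
        "spark.executor.instances": str(spec["executor"]["instances"]),
        "spark.executor.cores": str(spec["executor"]["cores"]),
        "spark.executor.memory": spec["executor"]["memory"],
    }
    args = ["spark-submit", "--master", master,
            "--deploy-mode", spec["mode"], "--name", meta["name"]]
    if "mainClass" in spec:  # Python applications have no main class
        args += ["--class", spec["mainClass"]]
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(spec["mainApplicationFile"])
    args += spec.get("arguments", [])
    return args


args = spark_submit_args(EXAMPLE_APP, "k8s://https://kubernetes.default.svc")
```

The benefit of the CRD is that users maintain only the manifest. The imperative command line becomes an implementation detail of the operator.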



[Figure: Moka Resource Management]

YuniKorn offers queue-based scheduling, application quotas, and preemption, and enables Pinterest to enforce resource isolation across teams and dynamically prioritize jobs based on workload tiers and business criticality.
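As a rough intuition for queue quotas combined with tier-based prioritization, consider the toy model below. It is not YuniKorn's algorithm and does not model preemption. The `Queue`, `Job`, and `admit` names, the core-only quota, and the numeric tiers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    max_cores: int       # quota for the team that owns this queue
    used_cores: int = 0


@dataclass
class Job:
    name: str
    queue: str
    cores: int
    tier: int            # 1 = most business-critical


def admit(jobs: list[Job], queues: list[Queue]) -> tuple[list[str], list[str]]:
    """Admit jobs in tier order while every queue stays within its quota.

    Returns (admitted, pending) job names. Jobs that do not fit stay pending;
    a real scheduler would retry them as resources free up.
    """
    by_name = {q.name: q for q in queues}
    admitted, pending = [], []
    # sorted() is stable, so jobs in the same tier keep submission order.
    for job in sorted(jobs, key=lambda j: j.tier):
        queue = by_name[job.queue]
        if queue.used_cores + job.cores <= queue.max_cores:
            queue.used_cores += job.cores
            admitted.append(job.name)
        else:
            pending.append(job.name)
    return admitted, pending


queues = [Queue("ads", max_cores=10), Queue("ml", max_cores=4)]
jobs = [
    Job("ads-backfill", "ads", cores=6, tier=2),
    Job("ads-billing", "ads", cores=6, tier=1),
    Job("ml-training", "ml", cores=4, tier=3),
]
admitted, pending = admit(jobs, queues)
```

Here the tier-1 billing job takes the ads quota ahead of the earlier-submitted backfill, while the ml queue is unaffected by contention in ads. That is the isolation property the quotas are meant to provide.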

Once YuniKorn has scheduled a job, SparkSQL jobs connect to the Hive Metastore, and workloads run from container images stored in AWS ECR. During execution, Archer tracks job status, while the system uploads logs to S3 and publishes metrics to internal dashboards. Users can reach the UIs of running jobs through network proxies and retrieve historical logs through the Spark History Server; all of these are surfaced in the read-only Moka UI.
