World of Software
Pinterest’s CDC-Powered Ingestion Slashes Database Latency from 24 Hours to 15 Minutes

News Room
Published 26 February 2026 (last updated 2026/02/26 at 10:20 AM)

Pinterest has launched a next-generation database ingestion framework to address the limitations of its legacy batch-based systems and improve real-time data availability. The previous infrastructure relied on multiple, independently maintained pipelines and full-table batch jobs, resulting in high latency, operational complexity, and inefficient resource utilization. Critical use cases, including analytics, machine learning, and product features, required faster, more reliable access to data.

The legacy system faced several key challenges. Data latency often exceeded 24 hours, delaying analytics and ML workflows. Daily changes for many tables were below 5%, yet full-table batch processes reprocessed unchanged records, wasting compute and storage resources. Row-level deletions were not natively supported, and operational fragmentation across pipelines caused inconsistent data quality and high maintenance overhead.
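To make the waste concrete, here is a rough back-of-the-envelope sketch. The table size is a made-up figure, not one reported by Pinterest, and `rows_processed` is a hypothetical helper; only the 5% ratio comes from the article.

```python
def rows_processed(table_rows: int, daily_change_ratio: float) -> dict:
    """Compare rows touched per day by a full-table job vs. a change-only job."""
    changed = round(table_rows * daily_change_ratio)
    return {
        "full_table": table_rows,                      # legacy batch: every row, every day
        "changed_only": changed,                       # incremental: only rows that changed
        "unchanged_reprocessed": table_rows - changed, # work the batch job repeats needlessly
    }


# Hypothetical table: 1 billion rows, 5% of them changing per day.
stats = rows_processed(1_000_000_000, 0.05)
print(stats)
```

Under these assumed numbers, a full-table job reprocesses 950 million rows per day that have not changed at all.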

As emphasized by a Pinterest engineer:

"A unified DB ingestion framework built on Change Data Capture (Debezium/TiCDC), Kafka, Flink, Spark, and Iceberg provides access to online database changes in minutes (not hours or days) while processing only changed records, resulting in significant infrastructure cost savings."

The framework is generic (it supports MySQL, TiDB, and KVStore), configuration-driven for easy onboarding, and ships with integrated monitoring and at-least-once delivery guarantees.

Next-gen database ingestion architecture overview (Source: Pinterest Blog Post)

The architecture separates CDC tables from base tables. CDC tables act as append-only ledgers, recording each change event with typical latency under five minutes. Base tables maintain a full historical snapshot, updated via Spark Merge Into operations every 15 minutes to an hour.

Iceberg’s Merge Into operation supports two update strategies: Copy on Write (COW) and Merge on Read (MOR). Copy on Write rewrites entire data files during updates, increasing storage and compute overhead. Merge on Read writes changes to separate files and applies them at read time, reducing write amplification. After evaluating both strategies, Pinterest standardized on Merge on Read because Copy on Write introduced significantly higher storage costs that outweighed its benefits for most workloads. The selected approach enables incremental updates while keeping infrastructure costs manageable at petabyte scale.
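The difference in write volume between the two strategies can be illustrated with a deliberately simplified model. This is a toy sketch, not Pinterest's or Iceberg's cost model: it ignores read-time merge cost, compaction, and metadata, and it assumes each Merge on Read update writes one delete marker plus one new row. The function name and all numbers are hypothetical.

```python
def rows_written(files_touched: int, rows_per_file: int, rows_changed: int) -> dict:
    """Rows written by one update batch under each strategy (toy model)."""
    return {
        # Copy on Write: every data file holding a changed row is rewritten in full.
        "copy_on_write": files_touched * rows_per_file,
        # Merge on Read (assumed): one delete marker plus one new row per changed row.
        "merge_on_read": 2 * rows_changed,
    }


# Hypothetical batch: 5,000 changed rows spread over 100 files of 1 million rows each.
batch = rows_written(files_touched=100, rows_per_file=1_000_000, rows_changed=5_000)
print(batch)
```

In this model the gap grows when a small number of changes is scattered across many large files, which is the typical shape of a CDC workload. The trade-off is that Merge on Read defers work to readers and to later compaction.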

Spark jobs first deduplicate the latest changes from CDC tables and then apply updates or deletions to base tables. Historical data is loaded initially through a bootstrap pipeline, and ongoing maintenance jobs handle compaction and snapshot expiration.
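The deduplicate-then-apply step can be sketched in plain Python, with dictionaries standing in for Iceberg tables. This illustrates the general technique rather than Pinterest's Spark job: the event shape, the `sequence` field used for ordering, and the function names are all assumptions made for the example.

```python
# Hypothetical event shape: (primary_key, sequence, op, row)
#   op is "upsert" or "delete"; sequence orders changes to the same key.

def latest_changes(events):
    """Keep only the newest event per primary key (deduplication)."""
    latest = {}
    for event in events:
        key, seq, _op, _row = event
        if key not in latest or seq > latest[key][1]:
            latest[key] = event
    return latest


def merge_into(base, events):
    """Apply the deduplicated changes to the base table: upsert or delete."""
    for key, (_key, _seq, op, row) in latest_changes(events).items():
        if op == "delete":
            base.pop(key, None)  # row-level deletion
        else:
            base[key] = row      # insert or update
    return base


base_table = {"pin:1": {"title": "old"}, "pin:2": {"title": "keep"}}
cdc_events = [
    ("pin:1", 10, "upsert", {"title": "new"}),
    ("pin:1", 10, "upsert", {"title": "new"}),   # duplicate from at-least-once delivery
    ("pin:1", 9, "upsert", {"title": "stale"}),  # older event arriving late
    ("pin:3", 11, "upsert", {"title": "temp"}),
    ("pin:3", 12, "delete", None),               # created, then deleted
]

result = merge_into(dict(base_table), cdc_events)
replayed = merge_into(dict(result), cdc_events)  # replaying the same batch changes nothing
print(result)
```

Because only the latest event per key is applied, duplicate or replayed events leave the base table unchanged, which is what makes at-least-once delivery safe in a design like this.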

Optimizations include partitioning base tables by a hash of the primary key using Iceberg bucketing, allowing Spark to parallelize upserts and reduce data scanned per operation. The framework also addresses the small files problem by instructing Spark to distribute writes by partition, reducing overhead caused by multiple small files per task.
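The idea behind hash bucketing can be sketched as follows. Iceberg's bucket transform is specified with a 32-bit Murmur3 hash; the example uses `zlib.crc32` from the standard library as a stand-in, so bucket numbers will not match Iceberg's. The bucket count, key format, and function names are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(primary_key: str, num_buckets: int) -> int:
    """Map a primary key to a stable bucket (crc32 stands in for Murmur3)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_buckets


def group_by_bucket(keys, num_buckets):
    """Group changed keys by bucket so each bucket can be upserted independently."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[bucket_for(key, num_buckets)].append(key)
    return dict(buckets)


changed_keys = [f"pin:{i}" for i in range(1000)]
groups = group_by_bucket(changed_keys, num_buckets=16)
print({bucket: len(keys) for bucket, keys in sorted(groups.items())})
```

Since a given key always lands in the same bucket, an upsert for that key only needs to scan the files in one bucket, and separate buckets can be processed in parallel.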

Measured outcomes include reducing data availability latency from more than 24 hours to as low as 15 minutes, processing only the records that change each day (under 5% for many tables), and lowering infrastructure costs by avoiding unnecessary full-table operations. The system handles petabyte-scale data across thousands of pipelines while supporting incremental updates and deletions.

Pinterest’s CDC-based ingestion framework delivers near-real-time access to database changes, with Iceberg tables stored on AWS S3 and Flink and Spark handling streaming and batch workloads. Future work will focus on automated schema evolution, so that upstream schema changes propagate safely downstream, improving the reliability and maintainability of large-scale pipelines.
