Cloudflare has announced the open beta of the Cloudflare Data Platform, a managed solution for ingesting, storing, and querying analytical data tables using open standards such as Apache Iceberg.
Earlier this year, Cloudflare announced the public beta of R2 Data Catalog, a managed Apache Iceberg catalog built on top of R2 object storage. The company has now combined Cloudflare Pipelines, R2 Data Catalog, and R2 SQL to form the Cloudflare Data Platform. Cloudflare principal engineer Micah Wylde, senior systems engineer Alex Graham, and staff software engineer Jérôme Schneider explain:
Analytical data is critical for modern companies. It allows you to understand your users’ behavior, your company’s performance, and alerts you to issues. But traditional data infrastructure is expensive and hard to operate, requiring fixed cloud infrastructure and in-house expertise. We built the Cloudflare Data Platform to be easy enough for anyone to use with affordable, usage-based pricing.
Source: Cloudflare blog
Cloudflare Pipelines ingests events sent from Workers or over HTTP, transforms them using SQL, and stores them either in Iceberg tables or as files on R2. R2 Data Catalog manages the Iceberg metadata and now also performs routine maintenance tasks such as compaction, which rewrites many small data files into fewer, larger ones so that queries run faster. R2 SQL is a distributed, serverless query engine for petabyte-scale datasets stored in R2. Micah Wylde, formerly co-founder and CEO of Arroyo, adds on LinkedIn:
Six months ago, Arroyo was acquired by Cloudflare. This provoked some confusion at the time — what did Cloudflare want with a stream processing engine? The answer: we’re building a data platform (…) The Cloudflare Developer Platform has enabled millions of developers to build, operate, and scale their apps by providing fully serverless infrastructure. The Cloudflare Data Platform takes the same approach to make analytical data infra available to everyone.
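Pipelines transforms each incoming event with SQL. To illustrate the difference between the stateless transformations it supports today and the stateful ones Cloudflare plans to add, here is a minimal sketch in Python, used only as a stand-in for the SQL that Pipelines actually runs. The field names (`user_id`, `email`, `path`) and the helpers `stateless_transform` and `StatefulCounter` are hypothetical and do not correspond to any Cloudflare API.

```python
import re
from collections import defaultdict

# Hypothetical pattern for redacting e-mail addresses in a field value.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+")


def stateless_transform(event: dict) -> dict:
    """Normalize field names and redact e-mail addresses.

    The output depends only on the input event, so events can be
    transformed independently, in any order, on any worker.
    """
    # Schematize: trim and lower-case keys so downstream tables see one schema.
    out = {key.strip().lower(): value for key, value in event.items()}
    # Redact: replace anything that looks like an e-mail address.
    if isinstance(out.get("email"), str):
        out["email"] = EMAIL_RE.sub("[REDACTED]", out["email"])
    return out


class StatefulCounter:
    """Running count of events per user (a simple aggregation).

    Each result depends on all earlier events, so the counts must be kept
    between events; a real stream processor would also have to persist
    this state to survive failures.
    """

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def process(self, event: dict) -> dict:
        user = event["user_id"]
        self.counts[user] += 1
        return {"user_id": user, "count": self.counts[user]}


if __name__ == "__main__":
    # Hypothetical raw events with inconsistent field names.
    events = [
        {" User_ID": "u1", "Email": "alice@example.com", "Path": "/"},
        {" User_ID": "u2", "Email": "bob@example.com", "Path": "/docs"},
        {" User_ID": "u1", "Email": "alice@example.com", "Path": "/pricing"},
    ]
    cleaned = [stateless_transform(e) for e in events]
    counter = StatefulCounter()
    for event in cleaned:
        print(event, counter.process(event))
```

Because `stateless_transform` looks only at its input, events can be processed in parallel and in any order. `StatefulCounter` would produce wrong counts if its state were lost or split across workers, which is why aggregations, materialized views, and joins require a stream processor with state management.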
While SQL transformations are powerful for use cases such as schematizing and normalizing data or redacting sensitive information before storage, Pipelines currently supports only stateless transformations. Cloudflare plans to draw on more of Arroyo's stateful processing capabilities to support aggregations, incrementally updated materialized views, and joins.
Jamie Lord, solution architect at CDS UK, highlights one of the platform's main advantages, Cloudflare's standard policy of charging no egress fees for data access:
Zero egress fees fundamentally changes the economics of data warehousing. Cloudflare’s new Data Platform leverages this advantage to challenge AWS and Google’s stranglehold on analytical workloads. The platform addresses a simple truth: companies are bleeding money on data transfer costs. A petabyte-scale operation might spend millions annually just moving data between regions for analysis. Cloudflare eliminates that entirely.
Joel Hatmaker, director of engineering at McGaw.io, comments:
If you’re already using Cloudflare for its performance and security features, the Cloudflare Data Platform starts to look really attractive.
According to Cloudflare, Logpush integration, user-defined functions via Workers, and aggregations and joins in R2 SQL are planned for the first half of 2026.
A tutorial walks through building an end-to-end analytical data system with Pipelines, R2 Data Catalog, and R2 SQL. These three products are not billed during the open beta, but storage and the operations incurred by queries are charged at standard rates.
