Cloud computing is undergoing an architectural shift driven primarily by cloud economics, according to Murat Demirbas’s recent session on disaggregated systems.
Traditional “shared-nothing” cloud architectures, which tightly couple compute and storage, face an inherent impedance mismatch. As Demirbas explained, compute is a costly, volatile resource, while storage is cheap and stable. Coupling the two forces customers to provision and pay for redundant or idle resources, contradicting the cloud’s promise of pay-per-use elasticity.
According to Demirbas, the industry is rapidly adopting disaggregated architectures, decoupling compute, storage, and, increasingly, logging. Cloud databases such as Amazon Aurora, Google AlloyDB, Microsoft Socrates, and Snowflake exemplify this trend, enabling independent scaling and operational simplicity.
Key motivations for this change include:
- Elastic Scalability: Compute nodes can scale up or down instantly without moving petabytes of data.
- Fault Isolation: Failure in an isolated component (such as a compute node) enables faster failover.
- Simplified Operations: Shared durable storage simplifies complex maintenance, replication, and backup routines.
- Pay-Per-Use Models: Customers pay only for the compute cycles and storage blocks they actually consume.
This architectural shift has been enabled by advances in high-speed networking, particularly Remote Direct Memory Access (RDMA) and Compute Express Link (CXL), which provide the low-latency fabric needed to replace traditional local I/O.
Disaggregation fundamentally changes how core database tasks are executed, shifting complexity to the shared storage layer. In Amazon Aurora, for instance, the primary compute node pushes only the redo log (or write-ahead log) to a storage quorum. The storage nodes then materialize the database state, allowing the compute node to acknowledge writes quickly and shifting the burden of replication and consensus away from the compute cluster.
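As a concrete illustration of this write path, here is a minimal sketch in Python: the compute node ships redo records to six storage replicas in parallel and acknowledges the commit once a write quorum of four responds, matching the replica layout described in the Aurora paper. The class and method names are hypothetical, not Aurora’s actual interfaces.

```python
import concurrent.futures

class StorageNode:
    """A storage replica: durably appends redo records, later materializes pages."""
    def __init__(self):
        self.redo_log = []

    def append(self, record):
        self.redo_log.append(record)  # stand-in for a durable write
        return True                   # acknowledgment back to the compute node


class ComputeNode:
    """Primary compute node: ships only redo records, never full pages."""
    WRITE_QUORUM = 4  # Aurora commits once 4 of 6 replicas acknowledge

    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes

    def commit(self, redo_record):
        # Send the record to every replica in parallel and return as soon
        # as a write quorum has acknowledged, without waiting for stragglers.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(n.append, redo_record) for n in self.storage_nodes]
            acks = 0
            for done in concurrent.futures.as_completed(futures):
                if done.result():
                    acks += 1
                if acks >= self.WRITE_QUORUM:
                    return True   # commit acknowledged at quorum
        return False              # quorum unreachable: the commit fails


replicas = [StorageNode() for _ in range(6)]   # six copies, per the Aurora paper
primary = ComputeNode(replicas)
assert primary.commit({"txn": 1, "page": 42, "redo": b"..."})
```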
Demirbas connected these modern designs to classic distributed systems theory by invoking Leslie Lamport’s Paxos protocol. This framing suggests that disaggregation creates a new, natural separation of database roles:
| Paxos Role | Database Function |
| --- | --- |
| Proposers | Compute Nodes (Initiating changes) |
| Acceptors | Shared Log/Storage Quorum (Ensuring durability) |
| Learners | Shared Page Store (Providing availability) |
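To make that role separation concrete, here is a minimal single-decree Paxos sketch, assuming the mapping above: a proposer (compute node) drives a ballot to a quorum of acceptors (the shared log/storage tier), and a learner (the page store) would then apply the chosen value. All names here are illustrative, not drawn from any production system.

```python
class Acceptor:
    """Storage/log tier: durably records promises and accepted values."""
    def __init__(self):
        self.promised = -1
        self.accepted = None   # (ballot, value) once something is accepted

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


class Proposer:
    """Compute node: initiates changes by driving a ballot to quorum."""
    def __init__(self, acceptors):
        self.acceptors = acceptors
        self.quorum = len(acceptors) // 2 + 1

    def propose(self, ballot, value):
        # Phase 1: gather promises; adopt any previously accepted value.
        promises = [a.prepare(ballot) for a in self.acceptors]
        granted = [acc for ok, acc in promises if ok]
        if len(granted) < self.quorum:
            return None
        prior = [acc for acc in granted if acc is not None]
        if prior:
            value = max(prior)[1]      # highest-ballot accepted value wins
        # Phase 2: ask acceptors to accept; the value is chosen at quorum.
        acks = sum(a.accept(ballot, value) for a in self.acceptors)
        return value if acks >= self.quorum else None


acceptors = [Acceptor() for _ in range(3)]   # the shared log/storage quorum
chosen = Proposer(acceptors).propose(ballot=1, value="apply-redo-record-7")
print(chosen)   # a learner (the page store) would now apply the chosen value
```

The appeal of the split is visible in the code: acceptors only need to record things durably, and learners only need to serve what was chosen, which mirrors the durability/availability division in the table above.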
This structural separation is what allows these systems to scale, fail, and recover gracefully. However, it’s not without trade-offs: the performance bottleneck shifts from the CPU to the network, requiring mitigations like aggressive data buffering and prefetching to overcome the inherent slowness of remote I/O.
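One such mitigation can be sketched briefly: a compute-side buffer pool that, on a cache miss, prefetches a sequential window of pages so that later reads hit locally instead of paying a remote round trip. The page-store interface, window size, and capacity below are illustrative assumptions, not any system’s actual parameters.

```python
from collections import OrderedDict

class BufferPool:
    """Compute-side cache that hides remote storage latency."""
    def __init__(self, fetch_remote, capacity=1024, prefetch_window=8):
        self.fetch_remote = fetch_remote        # callable: page_id -> bytes
        self.capacity = capacity
        self.prefetch_window = prefetch_window  # extra pages pulled per miss
        self.pages = OrderedDict()              # page_id -> bytes, in LRU order

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)     # mark as recently used
            return self.pages[page_id]
        # Miss: pull the requested page plus a sequential window behind it,
        # trading some bandwidth for fewer high-latency remote fetches.
        for pid in range(page_id, page_id + self.prefetch_window):
            if pid not in self.pages:
                self._install(pid, self.fetch_remote(pid))
        return self.pages[page_id]

    def _install(self, page_id, data):
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)      # evict least recently used


pool = BufferPool(fetch_remote=lambda pid: b"page-%d" % pid)
pool.get(100)               # one miss pulls pages 100..107 into the pool
assert 105 in pool.pages    # subsequent sequential reads now hit locally
```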
The disaggregation trend is opening new frontiers in database design:
- Pushdown Computation: Executing complex query logic directly at the storage nodes (near the data) to minimize costly data movement over the network; see the sketch after this list.
- Memory Disaggregation: Separating the memory/buffer pool from compute nodes using CXL to enable elastic and independent memory scaling.
- Unification of Workloads: Facilitating the merger of transactional (OLTP) and analytical (OLAP) workloads on shared storage, as seen in Google AlloyDB.
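The pushdown idea from the first bullet above reduces to a simple contract: the compute node ships the predicate to the storage node and receives only the matching rows, rather than pulling whole pages over the network and filtering locally. The sketch below is a toy illustration of that contract, not any vendor’s API.

```python
def scan_with_pushdown(storage_rows, predicate):
    """Runs at the storage node: only matching rows cross the network."""
    return [row for row in storage_rows if predicate(row)]

# Without pushdown, every row crosses the network and compute filters it;
# with pushdown, only the (typically small) matching subset is transferred.
rows = [{"id": i, "region": "eu" if i % 10 == 0 else "us"} for i in range(100_000)]
eu_rows = scan_with_pushdown(rows, lambda r: r["region"] == "eu")
print(len(rows), "->", len(eu_rows))   # 100000 -> 10000 rows moved
```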
Demirbas concluded with a powerful quote from Harrington Emerson:
> There are many methods, but few principles. If you master the principles, you can choose your methods.
He emphasized that as new hardware arrives and new failure modes such as metastability emerge in dynamic environments, a deep understanding of distributed systems principles will be crucial for designing the future of cloud data: a fabric of databases that self-assemble and treat the data center as one giant computer.
