Uber engineers have updated the CacheFront architecture to serve over 150 million reads per second while ensuring stronger consistency. The update addresses stale reads in latency-sensitive services and supports growing demand by introducing a new write-through consistency protocol, closer coordination with Docstore, and improvements to Uber’s Flux streaming system.
In the earlier CacheFront design, Uber achieved a throughput of 40 million reads per second by deduplicating requests and caching frequently accessed keys close to application services. While effective for scalability, this model lacked robust end-to-end consistency, making it insufficient for workloads that require the latest data. Cache invalidations relied on time-to-live (TTL) expiry and change data capture (CDC), which introduced eventual consistency and delayed visibility of updates. This created two specific issues. With read-own-writes inconsistency, a row that was read, cached, and then updated could continue serving stale values until it was invalidated or expired. With read-own-inserts inconsistency, negative caching (storing a “not-found” result) could return incorrect misses even after the row was inserted, potentially breaking service logic.
Previous CacheFront read and write paths for invalidation (Source: Uber Engineering Blog Post)
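To make the read-own-writes problem concrete, here is a minimal Go sketch of a cache-aside read path in which writes go only to the database and cached values disappear only through TTL expiry (or, in the real system, asynchronous CDC invalidation). The maps, helper functions, and key names are illustrative stand-ins, not Uber’s code.

```go
package main

import (
	"fmt"
	"time"
)

// cacheEntry is a hypothetical cached row with a TTL-based expiry.
type cacheEntry struct {
	value     string
	expiresAt time.Time
}

// Illustrative in-memory stand-ins for Redis and Docstore.
var (
	cache = map[string]cacheEntry{}
	db    = map[string]string{}
)

// readRow follows a cache-aside pattern: serve from cache if present,
// otherwise read from the database and populate the cache with a TTL.
func readRow(key string, ttl time.Duration) string {
	if e, ok := cache[key]; ok && time.Now().Before(e.expiresAt) {
		return e.value // may be stale until TTL expiry or CDC invalidation
	}
	v := db[key]
	cache[key] = cacheEntry{value: v, expiresAt: time.Now().Add(ttl)}
	return v
}

// writeRow updates only the database; the cache is invalidated
// asynchronously (TTL or CDC), so a subsequent read can be stale.
func writeRow(key, value string) {
	db[key] = value
}

func main() {
	db["user:1"] = "v1"
	fmt.Println(readRow("user:1", time.Minute)) // v1, now cached
	writeRow("user:1", "v2")                    // write bypasses the cache
	fmt.Println(readRow("user:1", time.Minute)) // still v1: read-own-writes inconsistency
}
```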
The new implementation introduces a write-through consistency protocol along with a deduplication layer positioned between the query engine and Flux, Uber’s streaming update system. Each CacheFront node now validates data freshness with Docstore before serving responses. The storage engine layer adds tombstone markers for deleted rows and strictly monotonic timestamp allocation per MySQL session. These mechanisms allow the system to efficiently identify and read back all modified keys, including deletes, just before commit, ensuring that no stale data is served even under high load.
Improved CacheFront write paths & invalidation (Source: Uber Engineering Blog Post)
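A rough sketch of the storage-engine bookkeeping this implies, assuming a hypothetical transaction type: deletes are recorded as tombstones next to updates, commit timestamps come from a strictly monotonic allocator, and commit returns both the timestamp and the full set of affected keys. The types and method names are assumptions for illustration, not Docstore’s actual interfaces.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tsAllocator hands out strictly monotonic commit timestamps, loosely
// mirroring per-session timestamp allocation.
type tsAllocator struct {
	mu   sync.Mutex
	last int64
}

func (a *tsAllocator) next() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	ts := time.Now().UnixNano()
	if ts <= a.last {
		ts = a.last + 1 // enforce strict monotonicity even on clock ties
	}
	a.last = ts
	return ts
}

// rowChange records a modified key; Deleted marks a tombstone so that
// deletes are also visible to cache invalidation.
type rowChange struct {
	Key     string
	Deleted bool
}

// txn is a hypothetical transaction that tracks every modified key,
// including tombstones for deletes, and reads them back at commit.
type txn struct {
	changes map[string]rowChange
}

func newTxn() *txn { return &txn{changes: map[string]rowChange{}} }

func (t *txn) Write(key string)  { t.changes[key] = rowChange{Key: key} }
func (t *txn) Delete(key string) { t.changes[key] = rowChange{Key: key, Deleted: true} }

// Commit returns the commit timestamp and the full set of affected row
// keys (inserts, updates, and deletes) for downstream invalidation.
func (t *txn) Commit(alloc *tsAllocator) (int64, []rowChange) {
	out := make([]rowChange, 0, len(t.changes))
	for _, c := range t.changes {
		out = append(out, c)
	}
	return alloc.next(), out
}

func main() {
	alloc := &tsAllocator{}
	t := newTxn()
	t.Write("user:1")
	t.Delete("user:2") // recorded as a tombstone
	ts, keys := t.Commit(alloc)
	fmt.Println(ts, keys)
}
```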
When a transaction completes, the storage engine returns both the commit timestamp and the set of affected row keys. A callback registered on these responses immediately invalidates any previously cached entries in Redis by writing invalidation markers. Flux continues tailing MySQL binlogs and performing asynchronous cache fills. Together, the three cache population mechanisms (direct query engine updates, invalidation markers, and TTL expirations), combined with Flux tailing, work in concert to maintain strong consistency while supporting extremely high read throughput.
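On the cache side, the protocol could be modeled roughly as below: the commit callback writes timestamped invalidation markers, and asynchronous fills (for example, those driven by Flux) are accepted only if they are at least as fresh as the marker for that key. The in-memory cache type and the timestamp comparison are illustrative assumptions, not Uber’s Redis integration.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a cached value tagged with the commit timestamp it was read at.
type entry struct {
	value string
	ts    int64
}

// cache is an in-memory stand-in for Redis; invalidation markers are
// stored per key as the latest commit timestamp known to have changed it.
type cache struct {
	mu      sync.Mutex
	entries map[string]entry
	markers map[string]int64
}

func newCache() *cache {
	return &cache{entries: map[string]entry{}, markers: map[string]int64{}}
}

// onCommit models the callback registered on commit responses: it writes an
// invalidation marker for every affected row key so stale entries stop
// being served immediately.
func (c *cache) onCommit(commitTS int64, keys []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, k := range keys {
		if commitTS > c.markers[k] {
			c.markers[k] = commitTS
		}
		if e, ok := c.entries[k]; ok && e.ts < commitTS {
			delete(c.entries, k) // drop the now-stale value
		}
	}
}

// fill models the asynchronous path (e.g. a Flux-driven cache fill): the
// value is accepted only if it is at least as fresh as the marker.
func (c *cache) fill(key, value string, ts int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ts < c.markers[key] {
		return // older than the last known write; ignore the stale fill
	}
	c.entries[key] = entry{value: value, ts: ts}
}

func (c *cache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	return e.value, ok
}

func main() {
	c := newCache()
	c.fill("user:1", "v1", 10)
	c.onCommit(20, []string{"user:1"}) // write committed at ts=20
	c.fill("user:1", "v1", 10)         // late stale fill is rejected
	fmt.Println(c.get("user:1"))       // empty, false until a fresh fill arrives
	c.fill("user:1", "v2", 20)
	fmt.Println(c.get("user:1"))       // "v2", true
}
```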
Uber’s engineers, Preetham Narayanareddy and Eli Pozniansky, explained the motivation behind the improvement:
There was increasing demand for higher cache hit rates and stronger consistency guarantees. The eventual consistency of using TTL and CDC for cache invalidations became a limiting factor in some use cases.
Through this integration, Uber engineers were also able to deprecate and remove the dedicated API introduced earlier, reducing operational complexity and streamlining the system.
Uber engineers enhanced telemetry and observability dashboards to monitor cache health and real-time binlog tailing. Cache shards were reorganized to distribute load evenly. The Cache Inspector tool, built on the same CDC pipeline as Flux, compares binlog events to entries stored in the cache. These updates allowed TTLs for tables to be extended up to 24 hours, increasing the cache hit rate above 99.9 percent while maintaining low latency.
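As a simplified illustration of the comparison the Cache Inspector performs, the sketch below flags cached entries whose timestamps predate the corresponding binlog events. The event and row types, and the timestamp-based check, are assumptions made for this example rather than the tool’s actual implementation.

```go
package main

import "fmt"

// binlogEvent is a simplified CDC record: the row key and the commit
// timestamp of the change observed in the MySQL binlog.
type binlogEvent struct {
	Key      string
	CommitTS int64
}

// cachedRow is what an inspector might read back from the cache: the value
// and the timestamp it was populated at.
type cachedRow struct {
	Value string
	TS    int64
}

// inspect mimics the core check: for every binlog event, a cached entry
// older than the event's commit timestamp is flagged as a potential
// inconsistency.
func inspect(events []binlogEvent, cache map[string]cachedRow) []string {
	var stale []string
	for _, ev := range events {
		if row, ok := cache[ev.Key]; ok && row.TS < ev.CommitTS {
			stale = append(stale, ev.Key)
		}
	}
	return stale
}

func main() {
	cache := map[string]cachedRow{
		"user:1": {Value: "v1", TS: 10},
		"user:2": {Value: "v9", TS: 30},
	}
	events := []binlogEvent{{Key: "user:1", CommitTS: 20}, {Key: "user:2", CommitTS: 25}}
	fmt.Println(inspect(events, cache)) // [user:1]
}
```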