Uber has shared details of Ceilometer, an internal adaptive benchmarking framework designed to evaluate infrastructure performance beyond application-level metrics. The system helps Uber qualify new cloud SKUs, validate infrastructure changes, and measure efficiency initiatives using repeatable, production-like benchmarks. As infrastructure heterogeneity increases across cloud providers and hardware generations, Uber built Ceilometer to provide consistent, data-driven performance signals across environments.
At Uber scale, benchmarking infrastructure has historically been a fragmented and manual process. Engineers often relied on one-off scripts, isolated test runs, and spreadsheets to compare results, making it difficult to reproduce outcomes or correlate performance across teams. Ceilometer replaces this approach with a centralized platform that automates benchmark orchestration, execution, result ingestion, and analysis, enabling standardized comparisons across servers, workloads, and environments.
Ceilometer is architected as a distributed system that coordinates benchmark execution across dedicated machines. Tests are executed in parallel to reflect realistic workload behavior, with raw outputs stored in durable blob storage. Results are validated, normalized, and ingested into Uber’s centralized data warehouse, where they can be queried and analyzed alongside production metrics. According to Uber engineers, this design allows them to identify performance regressions, configuration inefficiencies, and hardware-level differences using a consistent data model.
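To make the validation and ingestion step concrete, the sketch below shows how a raw benchmark result could be checked and flattened into a shared schema before being loaded into a warehouse table. The record fields and the normalize function are illustrative assumptions, not Uber’s actual data model.

```python
# Hypothetical sketch of the normalization step described above: a raw result
# blob is validated and mapped onto a consistent record before ingestion.
# Field names and schema are illustrative only.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class BenchmarkRecord:
    run_id: str          # unique identifier for the benchmark run
    host: str            # machine (or SKU) the benchmark executed on
    workload: str        # e.g. "fio_randread", "netperf_tcp_stream"
    metric: str          # e.g. "iops", "throughput_mbps", "p99_latency_ms"
    value: float
    environment: str     # e.g. "onprem", "cloud-vendor-a"
    recorded_at: str     # UTC timestamp of ingestion

def normalize(raw: dict) -> BenchmarkRecord:
    """Validate a raw result blob and map it onto the shared schema."""
    for field in ("run_id", "host", "workload", "metric", "value"):
        if field not in raw:
            raise ValueError(f"missing required field: {field}")
    return BenchmarkRecord(
        run_id=str(raw["run_id"]),
        host=str(raw["host"]),
        workload=str(raw["workload"]),
        metric=str(raw["metric"]),
        value=float(raw["value"]),
        environment=str(raw.get("environment", "unknown")),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )

# A normalized record can then be appended to a warehouse table and queried
# alongside production metrics using the same dimensions (host, workload, env).
record = normalize({"run_id": "r-001", "host": "sku-a-host-1",
                    "workload": "fio_randread", "metric": "iops", "value": 182000})
print(asdict(record))
```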
Ceilometer architecture diagram (Source: Uber Blog Post)
The framework supports a wide range of workload types. Synthetic benchmarks such as SPEC CPU 2017, SPECjbb 2015, Netperf, and fio are used to characterize CPU, memory, network, and storage performance. For stateful systems, Ceilometer integrates with Uber’s Odin platform to benchmark database workloads under realistic conditions. Stateless services can be evaluated using Uber’s Ballast framework, which provides adaptive load testing to simulate production traffic patterns.
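As an illustration of how one of these synthetic benchmarks could be wrapped for automated execution, the sketch below launches a short fio random-read job and parses its JSON output. The job parameters and target path are placeholders, and this is not Uber’s actual harness.

```python
# Illustrative wrapper around the fio storage benchmark: run a short
# random-read job and capture its JSON output for later normalization.
import json
import subprocess

def run_fio_randread(target_file: str = "/tmp/ceilometer-fio.dat") -> dict:
    """Run a short fio random-read job and return the parsed JSON result."""
    cmd = [
        "fio",
        "--name=randread",
        "--rw=randread",
        "--bs=4k",
        "--size=256M",
        "--runtime=30",
        "--time_based",
        "--output-format=json",
        f"--filename={target_file}",
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)

if __name__ == "__main__":
    result = run_fio_randread()
    # fio reports per-job statistics; read IOPS is one signal a framework
    # like Ceilometer could normalize and store.
    print(result["jobs"][0]["read"]["iops"])
```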
One primary use case is server shape and cloud SKU qualification. Hardware vendors and cloud providers can run Ceilometer’s benchmark suites in their own environments and share results, allowing Uber to assess expected performance before onboarding new SKUs. Another key use case is infrastructure change validation, where targeted benchmarks help isolate regressions introduced by software upgrades, kernel changes, firmware updates, or configuration tuning. Ceilometer’s results can be compared across time, environments, and workload types, giving engineers a clearer picture of how infrastructure changes affect system behavior beyond surface-level application metrics.
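The comparison step can be thought of as a regression check over normalized metrics. The following sketch, with an assumed 5% tolerance and hypothetical metric names, flags metrics where a candidate SKU underperforms a baseline; Uber’s actual analysis runs against its data warehouse rather than in-memory dictionaries.

```python
# Hedged sketch of a baseline-vs-candidate SKU comparison with a simple
# regression tolerance. Thresholds and metric names are illustrative.
REGRESSION_TOLERANCE = 0.05  # flag >5% degradation

def compare_skus(baseline: dict[str, float], candidate: dict[str, float]) -> list[str]:
    """Return the metrics where the candidate SKU underperforms the baseline."""
    regressions = []
    for metric, base_value in baseline.items():
        cand_value = candidate.get(metric)
        if cand_value is None or base_value == 0:
            continue
        # Assumes higher is better (e.g., throughput, IOPS); latency-style
        # metrics would invert this check.
        delta = (cand_value - base_value) / base_value
        if delta < -REGRESSION_TOLERANCE:
            regressions.append(f"{metric}: {delta:+.1%} vs baseline")
    return regressions

print(compare_skus(
    baseline={"fio_randread_iops": 182000, "netperf_tcp_stream_mbps": 9400},
    candidate={"fio_randread_iops": 168000, "netperf_tcp_stream_mbps": 9350},
))
```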
As described by Nav Kankani, a Senior Engineering Manager and Platform Architect on the Uber Infrastructure team, in his LinkedIn post:
Uber’s adaptive benchmarking framework models our production workloads to guide cloud platform decisions. By representing diverse workloads across Stateless, Stateful, Batch, and AI/ML domains, Ceilometer makes HW/SW co-design exploration seamless and drives efficiency gains across our fleet.
As Uber’s infrastructure continues to evolve, Ceilometer is expanding to provide deeper, more proactive insights. Planned enhancements include AI and machine learning integration to predict regressions, identify root causes, and optimize resource sizing; broader ecosystem support for emerging technology stacks and infrastructure paradigms; advanced anomaly detection to surface unexpected performance deviations more quickly; and component-level utilization metrics for granular visibility into CPU, memory, storage, and network behavior. Uber engineers also plan to use the framework for canary continuous validation testing: automated, recurring benchmark runs that alert teams when performance thresholds are breached, supporting faster and more reliable infrastructure decisions.
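A minimal sketch of the kind of threshold-based canary check described above, assuming a rolling baseline of recent runs and a hypothetical alerting hook:

```python
# Illustrative canary-style check (not Uber's implementation): compare the
# latest benchmark result for a metric against a rolling baseline and alert
# when the degradation exceeds a threshold. Names and limits are assumptions.
from statistics import mean

ALERT_THRESHOLD = 0.10  # alert on >10% degradation vs. the rolling baseline

def check_canary(history: list[float], latest: float, metric: str) -> None:
    """Alert if the latest run regresses against the mean of recent runs."""
    if not history:
        return
    baseline = mean(history)
    degradation = (baseline - latest) / baseline  # assumes higher is better
    if degradation > ALERT_THRESHOLD:
        # A real system would page or file a ticket; here we just print.
        print(f"ALERT: {metric} down {degradation:.1%} vs rolling baseline")

check_canary(history=[9400, 9380, 9420], latest=8100, metric="netperf_tcp_stream_mbps")
```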
