Profile-guided optimization (PGO) improves application performance by feeding runtime profile data back into the compiler. Uber collaborated with Google to bring PGO to Go, and reports significant performance improvements and resource savings across its service fleet.
PGO takes advantage of actual runtime behavior to make smarter compiler decisions compared to traditional static analysis. By collecting execution profiles during representative runs, PGO identifies hot code paths and optimizes them accordingly through techniques like:
- Intelligent function inlining based on call frequency
- Improved code and data layout for better locality
- Enhanced register allocation and instruction scheduling
- Basic block reordering to optimize execution paths
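As a minimal sketch of the profile-collection step, the program below records a CPU profile with the standard `runtime/pprof` package while a workload runs. The `busyWork` and `collectProfile` functions and the `cpu.pprof` file name are illustrative assumptions, not part of Uber's tooling; a real service would more likely expose profiles from production through a continuous profiler.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for a representative workload.
func busyWork() int {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	return sum
}

// collectProfile runs work under the CPU profiler, writes the profile to
// path, and returns the size of the written file in bytes.
func collectProfile(path string, work func()) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return 0, err
	}
	work()
	pprof.StopCPUProfile() // stops sampling and flushes the profile to f
	if err := f.Close(); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	n, err := collectProfile("cpu.pprof", func() { busyWork() })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to cpu.pprof\n", n)
	// A later build can consume the profile, for example:
	//   go build -pgo=cpu.pprof
}
```

The resulting file is in the pprof format that the Go toolchain accepts through the `-pgo` build flag; how representative the profile is depends entirely on the workload that was running while it was recorded.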
Figure: Uber's profiling infrastructure for Go
The implementation of PGO at Uber has three phases: profiling, analysis, and recompilation. First, runtime profiling data is collected during representative executions of an application. The data is then analyzed to identify optimization opportunities, which the compiler applies during recompilation to produce an optimized binary. While languages like C++, Rust, Java, and Swift have long supported PGO, its integration into Go is relatively recent. Uber collaborated with Google to introduce PGO support in Go, with PGO-driven inlining introduced in Go 1.20 and devirtualization optimizations added in Go 1.21.
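To make the devirtualization idea concrete, the sketch below contrasts an ordinary interface call with a hand-written version of the transformation a profile can justify: if the profile shows one concrete type dominating a call site, the call can be guarded by a type check and dispatched directly, which in turn makes it a candidate for inlining. The `Shape`, `Square`, and `Rect` types are hypothetical, and the second function only approximates what a compiler might emit; it is not the Go compiler's actual output.

```go
package main

import "fmt"

// Shape, Square, and Rect are hypothetical types used only for illustration.
type Shape interface{ Area() float64 }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

// totalArea makes an indirect (interface) call on every iteration.
func totalArea(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

// totalAreaDevirtualized approximates by hand what profile-guided
// devirtualization aims for when Square is the hot concrete type:
// a cheap type check, a direct computation on the hot path, and the
// original interface call as a fallback.
func totalAreaDevirtualized(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		if sq, ok := s.(Square); ok {
			sum += sq.Side * sq.Side // direct path for the hot type
		} else {
			sum += s.Area() // fallback for every other type
		}
	}
	return sum
}

func main() {
	shapes := []Shape{Square{2}, Rect{2, 3}, Square{3}}
	fmt.Println(totalArea(shapes), totalAreaDevirtualized(shapes))
}
```

Both functions compute the same result; the difference is only in how the call is dispatched, which is why the transformation is safe to apply whenever the profile suggests it pays off.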
To seamlessly incorporate PGO into its continuous optimization framework, Uber established a systematic process:
- Daily Profile Collection: Continuous profiling data is gathered from multiple instances to create representative profiles.
- Service-Specific Enrollment: A configuration system enrolls specific services for PGO, ensuring targeted optimizations.
- Continuous Integration (CI) Testing: The PGO Software Development Kit (SDK) undergoes CI tests to validate changes and maintain stability.
- Deployment: Post-validation, PGO-optimized services are deployed into the production environment.
- Performance Monitoring: A performance dashboard monitors the impact of PGO on services, facilitating ongoing assessment.
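The enrollment step in the process above can be pictured as a small lookup that decides which `-pgo` value a service's build receives. The sketch below is purely illustrative: the `Enrollment` type, the `buildArgs` function, the `./cmd/<service>` package layout, and the profile path are assumptions, not Uber's configuration system. Only the `-pgo=<file>` and `-pgo=off` flag values are real `go build` options.

```go
package main

import "fmt"

// Enrollment is a hypothetical per-service PGO configuration record.
type Enrollment struct {
	Enabled     bool
	ProfilePath string // path to the merged profile for this service
}

// buildArgs returns the arguments for a `go` invocation. Services that are
// not enrolled, or that have no profile yet, build with PGO disabled.
func buildArgs(service string, cfg map[string]Enrollment) []string {
	pkg := "./cmd/" + service // assumed repository layout
	e, ok := cfg[service]
	if !ok || !e.Enabled || e.ProfilePath == "" {
		return []string{"build", "-pgo=off", pkg}
	}
	return []string{"build", "-pgo=" + e.ProfilePath, pkg}
}

func main() {
	cfg := map[string]Enrollment{
		"rides": {Enabled: true, ProfilePath: "profiles/rides.pprof"},
		"maps":  {Enabled: false},
	}
	for _, svc := range []string{"rides", "maps", "billing"} {
		fmt.Println(svc, buildArgs(svc, cfg))
	}
}
```

Keeping the decision in one function makes it easy to roll PGO out service by service and to fall back to a plain build when a profile is missing.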
A significant challenge encountered during PGO implementation was longer build times, with some services taking up to eight times as long to build. The main cause was the time the compiler spent parsing profiling data during compilation. To address this, Uber developed a profile preprocessing tool that extracts the runtime profiling data, constructs call graphs, and caches this information for use during compilation. This preprocessing significantly reduced build times, making PGO integration more practical for developers.
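The preprocessing idea can be sketched as follows: reduce the raw stack samples once to a weighted call graph, then write that graph in a compact, deterministic form that later compilations can read instead of re-parsing the full profile. The `Sample` and `Edge` types, the function names, and the text format below are assumptions made for illustration; they do not describe the format Uber's tool or the Go toolchain actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified profile sample: a call stack listed
// from the root caller to the leaf, plus how often it was observed.
type Sample struct {
	Stack  []string
	Weight int64
}

// Edge is one caller -> callee pair in the call graph.
type Edge struct{ Caller, Callee string }

// buildEdgeWeights aggregates samples into weighted call-graph edges.
func buildEdgeWeights(samples []Sample) map[Edge]int64 {
	edges := make(map[Edge]int64)
	for _, s := range samples {
		for i := 0; i+1 < len(s.Stack); i++ {
			edges[Edge{s.Stack[i], s.Stack[i+1]}] += s.Weight
		}
	}
	return edges
}

// encodeEdges serializes the edges as sorted "caller callee weight" lines,
// a stand-in for whatever cacheable format a real tool would choose.
func encodeEdges(edges map[Edge]int64) string {
	keys := make([]Edge, 0, len(edges))
	for e := range edges {
		keys = append(keys, e)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Caller != keys[j].Caller {
			return keys[i].Caller < keys[j].Caller
		}
		return keys[i].Callee < keys[j].Callee
	})
	var b strings.Builder
	for _, e := range keys {
		fmt.Fprintf(&b, "%s %s %d\n", e.Caller, e.Callee, edges[e])
	}
	return b.String()
}

func main() {
	samples := []Sample{
		{Stack: []string{"main", "handle", "parse"}, Weight: 3},
		{Stack: []string{"main", "handle", "encode"}, Weight: 2},
	}
	fmt.Print(encodeEdges(buildEdgeWeights(samples)))
}
```

Because the output is sorted, the same profile always produces the same bytes, which makes the result straightforward to cache and reuse across the many compilation steps of a single build.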
The performance impact of PGO was evaluated using synthetic benchmarks and real-world service assessments, including benchmarks of go-json, a widely used third-party JSON library for Go. The reported results include:
- 30% reduction in instruction Translation Lookaside Buffer (iTLB) misses in the go-json benchmarks.
- 4% performance improvement through optimized inlining.
- 24,000 fewer CPU cores required across Uber’s top services, leading to significant cost savings.
Several other technology companies have adopted PGO for their Go applications. Cloudflare has integrated PGO into its Go-based services, collecting CPU profiles from production environments to guide compiler optimizations, resulting in reduced CPU usage. Similarly, Datadog uses PGO to optimize Go applications, achieving up to 14% CPU savings in production by feeding profiling data into the Go compiler's optimization decisions. Grafana Labs uses PGO together with Grafana Pyroscope, an open-source continuous profiling platform, to optimize Go applications. This integration provides real-time performance analysis, helping developers identify inefficiencies and optimize code execution.
In summary, Uber’s integration of Profile-Guided Optimization into its Go programming environment has yielded substantial performance improvements. By systematically collecting profiling data, preprocessing it to reduce build times, and applying targeted compiler optimizations, Uber has enhanced the efficiency of its services. This initiative not only demonstrates the potential of PGO in optimizing resource utilization but also highlights the benefits of collaboration and innovation in software performance engineering.