Arm software engineer Peter Waller has shared some insightful benchmarks of the impact of PGO, Context Sensitive PGO (CSPGO), and BOLT optimizations across various classes of Neoverse processor designs.
Peter Waller shared a current look at the performance benefits of Profile Guided Optimization (PGO) and of BOLT, the binary layout optimizer contributed by Meta/Facebook to upstream LLVM. We often look at compiler optimization performance on AMD and Intel x86_64 hardware, given that there is much more interesting AMD/Intel hardware around here than hardware of other architectures, so this performance data makes for a rather interesting look at the Arm Neoverse world.
There are indeed some very nice speed-ups from leveraging PGO and BOLT optimizations across Neoverse N1 / N2 / V1 / V2. PGO and BOLT can be very beneficial for performance, but they rely on accurate, representative profiles/traces so that the compiler (or, in BOLT's case, the post-link binary optimizer) can make informed choices.
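To make the point about profile accuracy concrete, here is a toy Python sketch. It is not how LLVM or BOLT actually implement code layout; it is a simplified, greedy, Pettis-Hansen-style basic-block chaining pass that places blocks along the heaviest profiled edges, plus a hypothetical `fallthrough_ratio` metric. The block names and edge counts are made up for illustration. A layout trained on a profile that doesn't match the real workload leaves most executed branches jumping instead of falling through.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source block, destination block)


def chain_blocks(entry: str, edges: Dict[Edge, int]) -> List[str]:
    """Toy greedy chaining: merge chains along the heaviest edges first."""
    blocks = {b for edge in edges for b in edge} | {entry}
    chain_of = {b: [b] for b in blocks}  # block -> the chain (list) it belongs to
    # Visit edges from hottest to coldest; ties broken by name for determinism.
    for (src, dst), _w in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends its chain, dst starts another chain,
        # and dst is not the entry block (the entry must stay first).
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Emit the entry's chain first, then the remaining chains by head name.
    unique = {id(c): c for c in chain_of.values()}.values()
    ordered = [chain_of[entry]] + sorted(unique, key=lambda c: c[0])
    layout, seen = [], set()
    for chain in ordered:
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout


def fallthrough_ratio(layout: List[str], counts: Dict[Edge, int]) -> float:
    """Fraction of executed edges whose target is placed right after the source."""
    pos = {b: i for i, b in enumerate(layout)}
    total = sum(counts.values())
    fall = sum(w for (s, d), w in counts.items() if pos[d] == pos[s] + 1)
    return fall / total


if __name__ == "__main__":
    # Hypothetical diamond CFG: A branches to B or C, both rejoin at D.
    stale = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
    actual = {("A", "B"): 10, ("A", "C"): 90, ("B", "D"): 10, ("C", "D"): 90}
    stale_layout = chain_blocks("A", stale)    # ['A', 'B', 'D', 'C']
    fresh_layout = chain_blocks("A", actual)   # ['A', 'C', 'D', 'B']
    print(fallthrough_ratio(stale_layout, actual))  # 0.1
    print(fallthrough_ratio(fresh_layout, actual))  # 0.9
```

The same code, trained on a mismatched profile, ends up with only 10% of executed edges falling through versus 90% with a representative one, which is the kind of effect that makes training workloads matter for PGO and BOLT.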
More details on these Arm performance claims for PGO and BOLT via LLVM Discourse.