Merged today for the upcoming GCC 15 stable release is a new “X86_TUNE_AVX512_TWO_EPILOGUES” tuning optimization that is enabled by default for AMD Zen 4 and Zen 5 processors.
SUSE compiler engineer Richard Biener wrote the patch, which adds the tuning and enables it by default when targeting AMD Zen 4 or Zen 5 processors. Biener explains in the now-committed patch:
“The following adds X86_TUNE_AVX512_TWO_EPILOGUES tuning and directs the vectorizer to produce both a vector AVX2 and SSE epilogue for AVX512 vectorized loops when set. The tuning is enabled by default for Zen4 and Zen5 where I benchmarked it to be overall positive on SPEC CPU 2017 both in performance and overall code size. In particular it speeds up 525.x264_r which with only an AVX2 epilogue ends up in unvectorized code at the moment.”
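To make the idea of a "two epilogue" loop more concrete, here is a rough sketch in plain C of how such a loop could be laid out. It is only an illustration and not the code GCC emits. The names `add_floats` and `stage_counts` are hypothetical, the fixed-width inner loops merely stand in for 512-bit, 256-bit and 128-bit vector instructions, and the lane counts assume 32-bit float elements.

```c
#include <stddef.h>

/* Hypothetical bookkeeping: how many elements each stage processed. */
typedef struct {
    size_t main512; /* elements handled by the 16-wide main loop */
    size_t epi256;  /* elements handled by the 8-wide epilogue   */
    size_t epi128;  /* elements handled by the 4-wide epilogue   */
    size_t scalar;  /* elements handled one at a time            */
} stage_counts;

/* Adds b[i] into a[i] for n floats, structured like a 512-bit main
   loop followed by a 256-bit and a 128-bit epilogue (illustrative
   only; the inner loops stand in for single vector operations). */
stage_counts add_floats(float *a, const float *b, size_t n)
{
    stage_counts c = {0, 0, 0, 0};
    size_t i = 0;

    /* Main loop: 16 floats per iteration, one 512-bit vector. */
    for (; i + 16 <= n; i += 16) {
        for (size_t k = 0; k < 16; k++)
            a[i + k] += b[i + k];
        c.main512 += 16;
    }

    /* First epilogue: 8 floats, one 256-bit vector, runs at most once. */
    if (i + 8 <= n) {
        for (size_t k = 0; k < 8; k++)
            a[i + k] += b[i + k];
        i += 8;
        c.epi256 += 8;
    }

    /* Second epilogue: 4 floats, one 128-bit vector, runs at most once. */
    if (i + 4 <= n) {
        for (size_t k = 0; k < 4; k++)
            a[i + k] += b[i + k];
        i += 4;
        c.epi128 += 4;
    }

    /* Scalar remainder: at most 3 elements are left at this point. */
    for (; i < n; i++) {
        a[i] += b[i];
        c.scalar++;
    }
    return c;
}
```

With n = 7 in this sketch, the main loop and the 8-wide stage are both skipped, the 4-wide stage handles four elements and three are left for the scalar loop. If the 4-wide stage were removed, all seven would be processed one at a time, which may be similar in spirit to the short-loop situation Biener describes for 525.x264_r.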
No SPEC CPU 2017 figures or other benchmark numbers were shared to quantify the actual performance impact of this additional tuning for AMD Zen 4 and Zen 5.
With the patch now in Git, it will be part of the upcoming GCC 15.1 stable release, due out in the early months of 2025.