A 2021-era patch for the GNU Compiler Collection (GCC) has been revived in recent days and is under discussion again. It simplifies the memcpy and memset inlining strategies used when compiling code with the “-mtune=generic” option. Under generic tuning, the patch tries to avoid branches, and doing so yields some nice performance gains in some benchmarks.
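To give a feel for what "avoiding branches" can mean for a small copy, here is a minimal C sketch of one well-known trick: covering every length from 8 to 16 bytes with two overlapping 8-byte moves instead of testing the length. This illustrates the general idea only and is not taken from the GCC patch; the helper name `copy_8_to_16` and the size range are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the GCC patch): copy n bytes,
 * where the caller guarantees 8 <= n <= 16, without branching on n.
 *
 * The first 8 bytes and the last 8 bytes are moved separately.
 * For n < 16 the two moves overlap in the middle, which is harmless
 * because both write the same source data.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    /* Load both pieces before storing, so overlapping buffers also work. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    /* Fixed-size 8-byte memcpy calls are typically turned into single moves. */
    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

Whether GCC emits this exact pattern for a given size depends on the tuning tables and the compiler version, so treat it as a picture of the technique rather than of the patch's output.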
While it is too late for the GCC 15 compiler, which is due to be released as stable very soon (hopefully later this week), H.J. Lu of Intel’s compiler team has been working to resurrect this patch to improve the memcpy and memset behavior under the common -mtune=generic targeting with GCC.
When compiling with “-march=x86-64 -O2 -mtune=generic”, as commonly done by Linux distributions and other software vendors, there are some nice gains observed from this patch. On an Intel Ice Lake system, the EEMBC CPU benchmark saw a 13~14% improvement while the SPEC CPU 2017 numbers were flat. On an Intel Cascade Lake system, the EEMBC benchmark went up by as much as 16%.
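For anyone wanting to see how their own GCC build expands these calls under the flags above, a small test file along the following lines may help. It is only a sketch: the `struct record` type and the function names are made up for illustration, and what the compiler emits for each function will vary by GCC version and by whether the patch is applied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    char name[24];
    int  values[10];
};

/* Fixed-size clear: the size is a compile-time constant. */
void record_clear(struct record *r)
{
    memset(r, 0, sizeof *r);
}

/* Fixed-size copy: also a compile-time constant size. */
void record_copy(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Variable-size copy: the length is only known at run time. */
void buffer_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Assuming the file is saved as `copy_test.c` (a hypothetical name), running `gcc -O2 -march=x86-64 -mtune=generic -S copy_test.c` writes the generated assembly to `copy_test.s`, where the expansion chosen for each function can be read directly.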
Meanwhile, on an AMD Zen 3 system, EEMBC went up by as much as 30% in one benchmark, with a smaller impact in other tests and a few regressions.
This patch is still being discussed, but the numbers so far are looking fairly positive. Given the widespread use of “-mtune=generic”, hopefully this patch will be in shape for upstreaming soon to the GNU Compiler Collection.