Merged this week to the GNU C Library (glibc) is a change that drops its ldbl-96 FMA implementation, after developers found that doing so yields a 4x improvement in both throughput and latency on AMD Zen 3 hardware.
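The ldbl-96 code computed the double-precision fused multiply-add (FMA) using the x87 80-bit extended-precision long double format internally. Replacing it with glibc's generic dbl-64 FMA implementation, which works purely in 64-bit double arithmetic, ended up netting a nice win for this widely-used libc implementation.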
On “recent x86 hardware” the dbl-64 implementation far outpaces the ldbl-96 code that has now been removed from Glibc Git. In x86_64 benchmarks on AMD Zen 3, throughput improved by 4.06x and latency by 4.00x. In i686 mode the gain was still a hefty 2.2x to 2.3x.
The change to drop the ldbl-96 FMA implementation from Glibc’s math code happened with this commit now in Glibc Git.
The change will ship in Glibc 2.43, due out in February. Glibc 2.43 is also bringing detection for newer CPUs, the mseal function, and various performance optimizations.
