The GNU C Library’s hyperbolic tangent (tanh) function is now 4~14% faster on modern Intel and AMD CPUs that support FMA (fused multiply-add) instructions.
The FMA instruction set has been available on both Intel and AMD processors for roughly the past decade. Only this week, thanks to the work of Intel engineer Sunil K Pandey, has an FMA-optimized tanh function arrived.
Testing of the FMA-optimized tanh on an Intel Skylake CPU shows a maximum improvement of around 14%, while the minimum and mean improvements come in at around 4% over the prior code. Not bad for an instruction set that has been commonplace among Intel/AMD x86_64 CPUs for years, though it is surprising it took this long for the optimization to land. In any event, Intel continues to deserve kudos for all their open-source toolchain optimizations over the years, especially when it comes to tuning the GNU C Library (glibc) for new x86_64 instruction set capabilities.
Those interested can find the FMA-optimized tanh function via this glibc commit. The improvement will be part of the GNU C Library 2.42 release due out this summer.