Google engineer Eric Biggers is known for his crypto performance optimization patches for the Linux kernel, and his most recent patch series is yielding some very tantalizing results for AMD Zen 5 processors, whether it be the Ryzen 9000 series, Ryzen AI 300 series, or EPYC 9005 server processors.
Biggers has made numerous performance optimizations to the Linux kernel’s crypto subsystem that benefit modern Intel and AMD CPUs. This has included AVX-512/AVX10 optimized code paths as well as making use of VAES and other new x86_64 instruction set extensions to speed up different algorithms within the Linux kernel.
On top of all the recent upstream improvements, on Tuesday night he posted his latest work: a rewrite of the AES-CTR and AES-XCTR code to be better optimized for modern x86_64 CPUs.
Biggers commented on one of the patches:
“This greatly improves the performance of AES-CTR and AES-XCTR on VAES-capable CPUs, with the best case being AMD Zen 5 where an over 230% increase in throughput (i.e. over 3.3x faster) is seen on long messages. Performance on older CPUs remains about the same. There are some slight regressions (less than 10%) on some short message lengths on some CPUs; these are difficult to avoid, given how the previous code was so heavily unrolled by message length, and they are not particularly important.”
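To make the arithmetic in that quote explicit, here is a minimal sketch of how a percentage increase in throughput maps to a speedup factor. The function names are hypothetical and chosen only for illustration; the 230% and 10% figures are the ones quoted above.

```python
def speedup_from_increase(percent_increase: float) -> float:
    """Convert a percentage change in throughput into a multiplicative factor.

    A +230% increase means the new throughput is the old throughput plus
    2.3 times the old throughput, i.e. 3.3x the original.
    """
    return 1.0 + percent_increase / 100.0


def increase_from_speedup(factor: float) -> float:
    """Inverse conversion: a 3.3x factor corresponds to a +230% change."""
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    # Best case quoted for AMD Zen 5 on long messages.
    print(f"+230% -> {speedup_from_increase(230):.1f}x")
    # A regression of just under 10% keeps throughput above 0.9x the old value.
    print(f"-10%  -> {speedup_from_increase(-10):.1f}x")
```

The same conversion shows why "over 230%" and "over 3.3x faster" are two ways of stating the same measurement rather than two separate claims.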
His numbers are pretty wild for AMD Zen 5, having been obtained on a Ryzen 9 9950X desktop.
In another patch he also drops the non-AVX implementation of AES-CTR, since the vast majority of Intel/AMD CPUs with AES-NI also support AVX. The exceptions come down to Westmere server processors and Silvermont / Goldmont / Tremont cores. The non-AVX x86_64 SIMD assembly code is deemed a “major burden” and is thus being dropped given the limited set of processors where that code path is used.
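For those wondering which group their own system falls into, below is a rough sketch that reads the CPU feature flags Linux exposes in /proc/cpuinfo. It assumes the usual flag names `aes` (AES-NI), `avx`, and `vaes`; the category labels and function names are made up for illustration and do not mirror how the kernel itself selects an implementation.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. non-x86 system)


def classify(flags: set[str]) -> str:
    """Bucket a CPU by the features discussed above (illustrative labels only)."""
    if "aes" not in flags:
        return "no AES-NI"
    if "avx" not in flags:
        # The rare case: Westmere, Silvermont, Goldmont, Tremont.
        return "AES-NI without AVX"
    if "vaes" in flags:
        return "AES-NI + AVX + VAES"
    return "AES-NI + AVX"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(classify(parse_cpu_flags(cpuinfo.read_text())))
    else:
        print("/proc/cpuinfo not available on this system")
```

A result of "AES-NI without AVX" would indicate one of the few CPU families affected by the removal of the non-AVX code path.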
See this patch series for the newest code under review.