For those looking for a speedy Basic Linear Algebra Subprograms (BLAS) library, OpenBLAS 0.3.31 is now available as the latest release of this optimized open-source implementation.
OpenBLAS 0.3.31 brings BFloat16 extensions for BGEMM and BGEMV, other new BLAS extensions, problem-size thresholds for multi-threading with different kernels, improved Fortran compiler auto-detection, and a number of CMake build system fixes for platforms ranging from Windows to FreeBSD.
Like most OpenBLAS releases, version 0.3.31 also has a number of new CPU-specific performance optimizations. There are a variety of new RISC-V performance optimizations for the ZVL128B and ZVL256B targets, as well as better RISC-V RVV 1.0 detection. ARM64 has also seen a number of multi-threading performance improvements and other new optimizations. There is also now auto-detection for Apple M SoCs on Linux as well as for AmpereOne processors.
On x86_64, OpenBLAS 0.3.31 brings CPU auto-detection support for the Intel Core Ultra 200V “Lunar Lake” processors plus various fixes.
Downloads and more details on OpenBLAS 0.3.31 are available via GitHub and OpenBLAS.net.
