OpenBLAS 0.3.30 was released this morning as the newest version of this optimized BLAS (Basic Linear Algebra Subprograms) library for multiple CPU architectures.
OpenBLAS 0.3.30 ships a number of general fixes, including some that address performance regressions. There is also better detection of LLVM’s modern Flang “flang-new” Fortran compiler, plus further improvements to workload partitioning in the parallel GEMM implementations.
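For readers unfamiliar with the term, workload partitioning in a parallel GEMM means deciding which part of the output matrix each thread computes. The sketch below is purely illustrative and is not OpenBLAS's actual scheme: it assumes a simple split of the output rows into near-equal contiguous chunks, and the names `split_rows`, `gemm_block`, and `parallel_gemm` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(m, workers):
    """Split range(m) into at most `workers` contiguous, near-equal chunks."""
    base, extra = divmod(m, workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append((start, start + size))
        start += size
    return chunks


def gemm_block(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the product a @ b."""
    k, n = len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(row_start, row_end)
    ]


def parallel_gemm(a, b, workers=4):
    """Multiply a (m x k) by b (k x n), one row chunk per worker."""
    chunks = split_rows(len(a), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the blocks concatenate correctly.
        blocks = list(pool.map(lambda c: gemm_block(a, b, *c), chunks))
    return [row for block in blocks for row in block]


if __name__ == "__main__":
    print(split_rows(10, 4))
    print(parallel_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], workers=2))
```

The thread pool here only shows the structure of the split; a real BLAS library would also have to weigh cache sizes and matrix shapes when choosing how to divide the work.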
OpenBLAS 0.3.30 also ships a number of x86_64-specific fixes, CPU auto-detection for newer Intel Arrow Lake CPU models, and a fix for the MinGW build with the GCC 15 compiler.
On the ARM64 side, OpenBLAS 0.3.30 has improved CPU type detection, initial support for AmpereOne (Ampere-1A) processors, an optimized SBGEMM kernel for Arm Neoverse-V1 CPUs, and a variety of other performance improvements. Apple M4 CPUs also now have correct CPU core type and cache size detection.
OpenBLAS 0.3.30 also brings performance improvements for RISC-V processors as well as a number of fixes. LoongArch64 is also enjoying some performance work in this release.
Downloads and more details on the OpenBLAS 0.3.30 changes are available via GitHub.