OpenBLAS 0.3.29 is out today as a big update to this widely used, open-source implementation of the Basic Linear Algebra Subprograms (BLAS) and LAPACK APIs.
OpenBLAS 0.3.29 brings improved thread scaling for multi-threaded SBGEMV and TRTRI, various multi-threading fixes, improved documentation, and other general fixes.
When it comes to CPU/platform-specific work, there is initial support for detecting Apple M4 SoCs, various ARM64 performance optimizations, a number of x86_64 improvements, improved CGEMM and ZGEMM kernels for POWER10, many LoongArch 64-bit improvements, and some tuning/optimizations for RISC-V.
On the x86_64 side, OpenBLAS 0.3.29 adds CPU auto-detection for Intel Granite Rapids processors, auto-detection for AMD Zen 5 series processors, an optimized SOMATCOPY_CT for AVX-capable targets, and a variety of other fixes/optimizations.
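For readers unfamiliar with the routine, SOMATCOPY is the single-precision out-of-place scaled matrix copy, and the CT suffix is assumed here to follow OpenBLAS's usual kernel naming for column-major storage with a transpose. The pure-Python sketch below only illustrates that operation, B = alpha * A^T; the function name `omatcopy_ct_reference` and its argument order are hypothetical and are not OpenBLAS's actual kernel or API.

```python
def omatcopy_ct_reference(rows, cols, alpha, a, lda, ldb):
    """Reference sketch of an out-of-place scaled transpose.

    a   : flat list holding a rows x cols matrix in column-major order
    lda : leading dimension of a (assumed >= rows)
    ldb : leading dimension of the result (assumed >= cols)
    Returns a flat column-major list holding the cols x rows matrix alpha * A^T.
    """
    b = [0.0] * (ldb * rows)
    for j in range(cols):          # walk the columns of A
        for i in range(rows):      # walk the rows of A
            # A[i, j] lives at a[i + j*lda]; it lands at B[j, i] = b[j + i*ldb]
            b[j + i * ldb] = alpha * a[i + j * lda]
    return b


# A = [[1, 2, 3],
#      [4, 5, 6]] stored column-major
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
b = omatcopy_ct_reference(rows=2, cols=3, alpha=2.0, a=a, lda=2, ldb=3)
print(b)
```

The optimized kernel in the release computes the same result using AVX instructions; the scalar loop above is only meant to make the memory access pattern visible.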
Downloads and more details on OpenBLAS 0.3.29 are available via GitHub.