The developers of Burn, the open-source, Rust-based deep learning framework from Tracel AI, shared that their open-source matrix multiplication kernels can compete with and even outperform NVIDIA’s CUDA cuBLAS. Plus Burn isn’t limited to just NVIDIA GPUs: it can work on most hardware/drivers, including through a Vulkan back-end.
On Friday the Burn developers published a lengthy blog post going over their MATMUL kernel performance relative to NVIDIA’s CUDA cuBLAS/CUTLASS, showing some really splendid results for this cross-platform, open-source Rust DL framework.
For those wanting to get straight to the exciting part:
“On CUDA, our Simple algorithm is remarkably fast and stable, nearly always outperforming the cuBLAS/CUTLASS reference. However, the MultiRow variant truly stands out in the end; it is also the top performer across the board on Vulkan.”
Some really enticing data. Those wanting to learn more about the Burn MATMUL kernel performance can find the details in the Burn.dev blog post.
I hadn’t looked at Burn until a Phoronix reader pointed it out, but I’ll be checking out their open-source software, namely burn-bench, for possible use in future benchmarks.