A significant update to Burn, the MIT- and Apache 2.0-licensed tensor library and deep learning framework written in the Rust programming language, was released today. Burn 0.20 brings some low-level changes as the project continues striving to deliver high-performance AI across the diverse hardware ecosystem.
Burn 0.20 introduces CubeK, a set of high-performance multi-platform kernels written in CubeCL, which in turn is Tracel AI’s multi-platform compute language extension for Rust. CubeCL is focused on programming GPUs in Rust with “zero-cost abstractions” and supports execution on NVIDIA CUDA, AMD ROCm/HIP, Apple Metal, WebGPU, and Vulkan, plus CPU-based execution with SIMD support for most processors.
With CubeK built on CubeCL, the developers hope to deliver “peak performance on diverse hardware”, as they summed up in their announcement on GitHub:
“This release marks a major turning point for the ecosystem with the introduction of CubeK. Our goal was to solve a classic challenge in deep learning: achieving peak performance on diverse hardware without maintaining fragmented codebases.
By unifying CPU and GPU kernels through CubeCL, we’ve managed to squeeze maximum efficiency out of everything from NVIDIA Blackwell GPUs to standard consumer CPUs.
Beyond performance, this release makes the library more robust, flexible, and significantly easier to debug.
This release also features a complete overhaul of the ONNX import system, providing broader support for a wide range of ONNX models. In addition, various bug fixes and new tensor operations enhance stability and usability.”
Via the Burn.dev blog they shared more details on Burn 0.20 and their work on unifying CPU and GPU kernels. Included in that blog post were also some benchmark results showing much lower execution times than the likes of LibTorch and ndarray.
Interesting work. It will be worth watching how Burn and CubeK/CubeCL evolve and what sort of developer uptake there is around these Rust-based solutions.
