NVIDIA just released CUDA 13.1, which the company calls “the largest and most comprehensive update to the CUDA platform since it was invented two decades ago.” The most notable addition in CUDA 13.1 is CUDA Tile, a new tile-based programming model.
CUDA Tile brings a virtual ISA for tile-based parallel programming and works at a higher level of abstraction than Single-Instruction, Multiple-Thread (SIMT) programming.
NVIDIA describes CUDA Tile as:
“With the evolution of computational workloads, especially in AI, tensors have become a fundamental data type. NVIDIA has developed specialized hardware to operate on tensors, such as NVIDIA Tensor Cores (TC) and NVIDIA Tensor Memory Accelerators (TMA), which are now integral to every new GPU architecture.
With more complex hardware, more software is needed to help harness these capabilities. CUDA Tile abstracts away tensor cores and their programming models so that code using CUDA Tile is compatible with current and future tensor core architectures.
Tile-based programming enables you to program your algorithm by specifying chunks of data, or tiles, and then defining the computations performed on those tiles. You don’t need to set how your algorithm is executed at an element-by-element level: the compiler and runtime will handle that for you.
…
The foundation of CUDA Tile is CUDA Tile IR (intermediate representation). CUDA Tile IR introduces a virtual instruction set that enables native programming of the hardware as tile operations. Developers can write higher-level code that is efficiently executed across multiple generations of GPUs with minimal changes. While NVIDIA Parallel Thread Execution (PTX) ensures portability for SIMT programs, CUDA Tile IR extends the CUDA platform with native support for tile-based programs. Developers focus on partitioning their data-parallel programs into tiles and tile blocks, letting CUDA Tile IR handle the mapping onto hardware resources such as threads, the memory hierarchy, and tensor cores.
By raising the level of abstraction, CUDA Tile IR enables users to build higher-level hardware-specific compilers, frameworks, and domain-specific languages (DSLs) for NVIDIA hardware. CUDA Tile IR for tile programming is analogous to PTX for SIMT programming.”
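To make the distinction in NVIDIA's description a bit more concrete, here is a small conceptual sketch in plain Python. It does not use the CUDA Tile API or CUDA Tile IR in any way; the function names and the `tile` parameter are hypothetical and chosen only for illustration. The first function expresses a matrix multiply one output element at a time, roughly how a SIMT kernel assigns work per thread, while the second expresses the same computation over chunks (tiles) of the inputs, which is the kind of decomposition a tile-based model asks the developer to specify.

```python
# Conceptual illustration only: plain Python, NOT the CUDA Tile API.
# All names here (matmul_elementwise, matmul_tiled, tile) are hypothetical.

def matmul_elementwise(a, b):
    """Element-level view: one unit of work per output element."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return c


def matmul_tiled(a, b, tile=2):
    """Tile-level view: one unit of work per (output tile, input tile) pair."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The outer three loops walk over tiles of the output and of the
    # shared dimension; this is the part the programmer reasons about.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # The inner loops are the per-element work inside one tile.
                # In a tile-based model this level is left to the compiler
                # and runtime rather than written by hand.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_elementwise(a, b))
    print(matmul_tiled(a, b, tile=2))
```

Both functions produce the same result; the difference is only in how the work is described. In the tiled version the inner loops are spelled out purely so the example runs on a CPU, whereas the point of a tile-based model as NVIDIA describes it is that the mapping of each tile onto threads, memory, and tensor cores is handled by the toolchain.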
More details on CUDA Tile can be found at developer.nvidia.com.
Among the other highlights in the CUDA 13.1 announcement are Runtime API exposure of green contexts, double- and single-precision emulation within cuBLAS, and a completely rewritten CUDA programming guide.
