A late change to the AMDGPU LLVM compiler back-end, which may particularly help ROCm compute support on RDNA3 hardware, finally enables true 16-bit instructions and registers on all RDNA3 GPUs.
Merged earlier this summer was "[AMDGPU][True16] set true16 mode as default on gfx110x". As noted in that merge request, which enabled the “True16” mode on all GFX110x RDNA3 GPUs:
“There are quite a number of changes being merged to enable the true16 mode on gfx11, and a set of tests are ran including lit test, cts test and performance test. We think it’s the time now to try turning this mode on as default. Please let me know if there are still concerns to be address before merging this. Thanks!”
GFX115x RDNA 3.5 GPUs were left out of that initial enablement due to bugs. Those issues have now been addressed, and yesterday this commit was merged to enable True16 mode on all RDNA3 GPUs, including RDNA 3.5.
The “FeatureRealTrue16Insts” feature behind this True16 mode is simply described as using true 16-bit registers on these AMD GPUs. Neither of the LLVM merge requests publicly noted any performance numbers for graphics, GPU compute, or AI/ML workloads. Since the RADV Vulkan driver uses Valve’s ACO compiler rather than AMDGPU LLVM, this True16 mode will primarily benefit ROCm and other GPU compute use.
AMD RDNA4/GFX12 still needs True16 mode enabled as well, but additional AMDGPU LLVM back-end work must be carried out before this true 16-bit mode can be turned on for these latest Radeon consumer GPUs.