Earlier this month we spotted the addition of a new GFX1170 GPU target in the AMDGPU LLVM back-end. What makes this GFX1170 target interesting is that it's marked as an APU/SoC part with “RDNA 4m” while being part of the GFX11 series. The GFX11 series is for RDNA3, GFX115x is for RDNA 3.5, and GFX12 is RDNA4. More ISA changes have now been committed to the AMDGPU LLVM back-end that bring a few more instruction differences in line with RDNA4.
The GFX1170 target in the LLVM AMDGPU code began with just a few feature differences over existing GFX11 targets, most notably the addition of FP8/BF8 conversion support. Since that initial activity, more differences have appeared in the latest LLVM Git code.
Merged today are new WMMA and SWMMAC instructions for GFX1170 hardware. That code added new WMMA128bInsts for the GFX1170 and GFX12 WMMA and SWMMAC instructions, separating them from the existing WMMA256bInsts for the GFX11 WMMA instructions that don’t apply to the GFX1170 series. Bringing the Wave Matrix Multiply Accumulate (WMMA) ISA closer to RDNA4/GFX12 should help with AI/ML and other GPU compute workloads.
Also merged today is a change dropping V_DOT2ACC_F32_F16 from the GFX1170 series. V_DOT2ACC_F32_F16 computes the dot product of packed FP16 values and accumulates the result into the destination. RDNA4 already dropped V_DOT2ACC_F32_F16 and GFX1170 is now removing the instruction as well.
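For those not familiar with the instruction, here is a rough Python sketch of what a two-element FP16 dot product with accumulate amounts to. The function and helper names are made up for illustration, and the rounding and denormal handling of the real hardware are not modeled; this only shows the general shape of the operation as described above.

```python
import struct

def to_fp16(x):
    # Round a Python float to the nearest IEEE half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    # Round a Python float to the nearest IEEE single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot2acc_f32_f16(acc, a, b):
    """Hypothetical model: acc + a[0]*b[0] + a[1]*b[1].

    `a` and `b` stand in for the two packed FP16 operands and `acc`
    for the FP32 destination that is both read and written.
    Intermediate rounding here is an assumption, not hardware behavior.
    """
    a0, a1 = (to_fp16(v) for v in a)
    b0, b1 = (to_fp16(v) for v in b)
    return to_fp32(to_fp32(acc) + a0 * b0 + a1 * b1)

result = dot2acc_f32_f16(1.0, (2.0, 3.0), (4.0, 5.0))
print(result)  # 1 + 2*4 + 3*5 = 24.0
```

With the instruction gone, a compiler targeting GFX1170 would need to get the same result through other instructions, as is already the case for RDNA4.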
Another merge request, opened a few hours ago, drops the DX10_CLAMP and IEEE bits from the GFX1170 series. The “amdgpu-ieee” attribute specifies whether the function expects the IEEE field of the mode register to be set on entry. The “amdgpu-dx10-clamp” attribute specifies whether the DX10_CLAMP field of the mode register is expected to be set on entry, for matching DirectX 10 behavior around Not a Number (NaN) values in the vector ALU. RDNA4 already dropped DX10_CLAMP and GFX1170 is now shifting away from it too.
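As a rough illustration of what the DX10_CLAMP bit controls, the LLVM AMDGPU documentation describes it as clamping NaN to zero when set and passing NaN through otherwise. A minimal Python model of that behavior for a clamp to the [0, 1] range might look like the following; the function name and the choice of range are assumptions for the sake of the example, not anything taken from the LLVM patches.

```python
import math

def clamp01(x, dx10_clamp):
    """Hypothetical model of a vector ALU clamp to [0.0, 1.0].

    dx10_clamp=True  -> NaN inputs are clamped to 0.0 (DirectX 10 style)
    dx10_clamp=False -> NaN inputs pass through unchanged
    """
    if math.isnan(x):
        return 0.0 if dx10_clamp else x
    return min(max(x, 0.0), 1.0)

nan = float("nan")
print(clamp01(nan, dx10_clamp=True))   # 0.0
print(clamp01(nan, dx10_clamp=False))  # nan
print(clamp01(1.5, dx10_clamp=True))   # 1.0
```

Once the mode bit is dropped, the behavior is no longer selectable per function in this way.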
It's interesting to see more of these ISA changes coming about for GFX1170, pushing it closer to RDNA4 than RDNA3 and making “RDNA 4m” look like more than just marketing speak. We still don’t know, though, which APUs/SoCs are ultimately expected to have this RDNA 4m graphics IP.
