Intel’s open-source “ANV” Vulkan driver for Linux systems has enabled a new feature called BTP+BTI RCC keying. You may be wondering what that means or stands for, but long story short, it helps the performance of Direct3D 12 (DX12) games running on Linux by way of Valve’s Steam Play with Proton + VKD3D-Proton.
The main patch for enabling BTP+BTI RCC keying is over five years old: it was authored all the way back in November 2020 and was finally merged to Mesa 26.1-devel today. The main patch explains the feature enablement:
“We can drop RT flush and PS Scoreboard stall if state cache perf fix disabled is set to 1. If bit is set RCC uses the sum of Binding Table Pointer and Binding Table Index as tag in state cache instead of just Binding Table Index.
On DX12 this is a performance win on all workloads we’ve tested.
On DX11 there are a bunch of performance of regression. We think this is due to the fact that to avoid trashing the RCC, we need to remove all but render targets from the binding table, meaning all shader resource accesses have to go through the bindless HW heap. This leads to additional register usage due to the need to push the base offset of descriptor sets. Improvement in the compiler would likely mitigate this.
This change introduce a DRIRC key we only turn on for DX12.
Also platforms prior to DG2/LSC have a really small bindless heap that leads to additional register usage, so this optimization is completely disable there.”
The main takeaway from this patch for DG2/Alchemist GPUs and newer is: “On DX12 this is a performance win on all workloads we’ve tested.”
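To make the keying change in the commit message a bit more concrete, here is a toy Python model of the two tagging schemes. It is only a sketch: the BindingRef type, the function names, and the offset values are hypothetical illustrations and are not taken from Mesa or from Intel hardware documentation. It shows that two binding tables at different pointers can reuse the same index, that an index-only tag cannot tell them apart, and that the sum of pointer and index can. That aliasing is presumably what the render target flush and PS scoreboard stall guard against with the old keying.

```python
from typing import Callable, NamedTuple


class BindingRef(NamedTuple):
    """Hypothetical stand-in for one binding table entry reference."""
    btp: int  # Binding Table Pointer: where the table lives (made-up units)
    bti: int  # Binding Table Index: slot within that table


def tag_bti_only(ref: BindingRef) -> int:
    """Previous keying: the cache tag is the binding table index alone."""
    return ref.bti


def tag_btp_plus_bti(ref: BindingRef) -> int:
    """BTP+BTI keying: the cache tag is the sum of pointer and index."""
    return ref.btp + ref.bti


def tags_alias(a: BindingRef, b: BindingRef,
               tag: Callable[[BindingRef], int]) -> bool:
    """True when two different bindings would end up with the same tag."""
    return a != b and tag(a) == tag(b)


if __name__ == "__main__":
    # Two draws, each with its own binding table, both using index 0
    # for their render target. The offsets are arbitrary example values.
    draw0_rt = BindingRef(btp=0x1000, bti=0)
    draw1_rt = BindingRef(btp=0x2000, bti=0)

    print("BTI-only keying aliases:", tags_alias(draw0_rt, draw1_rt, tag_bti_only))
    print("BTP+BTI keying aliases:", tags_alias(draw0_rt, draw1_rt, tag_btp_plus_bti))
```

In this model the index-only scheme reports an alias for the two render targets while the summed tag does not. Real hardware behavior depends on details this sketch leaves out, such as pointer alignment and cache organization.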
Beyond that main patch, ten other more recent patches were part of this merge request with some fixes and other ANV updates around this feature enablement.
With the merged code, the DRIConf option for toggling the feature is “anv_state_cache_perf_fix” with the description of “Whether COMMON_SLICE_CHICKEN3 bit13 should be programmed to enable BTP+BTI RCC keying”.
This Intel ANV Vulkan driver code does depend upon a Xe kernel driver patch around per-queue programming of the COMMON_SLICE_CHICKEN3 bit13. It’s looking like that patch will be found in the upcoming Linux 7.1 cycle.
No specific performance numbers were provided as part of the merge request, but this should be a clear performance win for Direct3D 12 titles. It will be fun to benchmark soon.
