Here is open-source at its finest: an NVIDIA Linux kernel engineer has fixed a performance regression that affected AMD integrated and discrete graphics on the early Linux 6.15 kernel code.
The x86 fixes pull request was sent out this morning for merging into the Linux 6.15 Git tree. Headlining it was:
“Fix a performance regression on AMD iGPU and dGPU drivers, related to the unintended activation of DMA bounce buffers that regressed game performance if KASLR disturbed things just enough.”
Digging into that change, it was all the more interesting to find that an NVIDIA engineer wrote the fix for this AMD iGPU/dGPU performance regression. He wasn't acting purely out of the kindness of his heart, though: the regression had been inadvertently introduced by one of his own changes. Rather than punting it off or waiting for someone else to fix it, he did what a good open-source developer should and fixed his own code, even though the regression hit a competitor's driver.
Merged last week as part of the Intel/AMD x86_64 updates for Linux 6.15 was a change to Kernel Address Space Layout Randomization (KASLR). It reduces KASLR entropy on most x86/x86_64 systems in order to support PCI BAR space beyond the 10 TiB region. NVIDIA engineer Balbir Singh authored that change, and it is what ended up regressing the AMD drivers.
The regression is explained in the commit containing the regression fix:
“As Bert Karwatzki reported, the following recent commit causes a performance regression on AMD iGPU and dGPU systems:
7ffb791423c7 (“x86/kaslr: Reduce KASLR entropy on most x86 systems”)
It exposed a bug with nokaslr and zone device interaction.
The root cause of the bug is that, the GPU driver registers a zone device private memory region. When KASLR is disabled or the above commit is applied, the direct_map_physmem_end is set to much higher than 10 TiB typically to the 64TiB address. When zone device private memory is added to the system via add_pages(), it bumps up the max_pfn to the same value. This causes dma_addressing_limited() to return true, since the device cannot address memory all the way up to max_pfn.
This caused a regression for games played on the iGPU, as it resulted in the DMA32 zone being used for GPU allocations.
Fix this by not bumping up max_pfn on x86 systems, when pgmap is passed into add_pages(). The presence of pgmap is used to determine if device private memory is being added via add_pages().”
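The chain of cause and effect in that commit message (device private memory raising max_pfn, which in turn makes dma_addressing_limited() return true) can be modelled in a few lines. The following is a simplified Python sketch, not kernel code: the 4 KiB page size, the hypothetical 44-bit GPU DMA mask, the 32 GiB of RAM, the placement of the device private region just below 64 TiB, and the helper names model_add_pages(), required_mask(), and addressing_limited() are all illustrative assumptions.

```python
from typing import Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages
GIB = 1 << 30
TIB = 1 << 40


def bit_mask(bits: int) -> int:
    """All-ones mask of the given width, in the spirit of DMA_BIT_MASK()."""
    return (1 << bits) - 1


def required_mask(max_pfn: int) -> int:
    """Smallest all-ones mask covering the start of the last page below max_pfn."""
    last_addr = (max_pfn - 1) << PAGE_SHIFT
    return bit_mask(last_addr.bit_length())


def addressing_limited(dev_mask: int, max_pfn: int) -> bool:
    """Simplified stand-in for dma_addressing_limited(): True when the
    device cannot reach every address up to max_pfn."""
    return dev_mask < required_mask(max_pfn)


def model_add_pages(max_pfn: int, start_pfn: int, nr_pages: int,
                    pgmap: Optional[object], fixed: bool) -> int:
    """Hypothetical model of add_pages(); returns the new max_pfn."""
    if fixed and pgmap is not None:
        # After the fix: device private pages leave max_pfn untouched.
        return max_pfn
    # Before the fix: any hot-added range can raise max_pfn.
    return max(max_pfn, start_pfn + nr_pages)


def simulate(fixed: bool) -> Tuple[int, bool]:
    ram_bytes = 32 * GIB     # hypothetical: 32 GiB of system RAM
    gpu_mask = bit_mask(44)  # hypothetical: 44-bit (16 TiB) GPU DMA mask
    region = 16 * GIB        # hypothetical device private region size
    start_pfn = (64 * TIB - region) >> PAGE_SHIFT  # placed just below 64 TiB
    max_pfn = ram_bytes >> PAGE_SHIFT
    max_pfn = model_add_pages(max_pfn, start_pfn, region >> PAGE_SHIFT,
                              pgmap=object(), fixed=fixed)
    return max_pfn, addressing_limited(gpu_mask, max_pfn)


for fixed in (False, True):
    pfn, limited = simulate(fixed)
    print(f"fixed={fixed}: max_pfn={pfn:#x} limited={limited}")
```

In this model the unfixed path raises max_pfn to the 64 TiB mark, which pushes the required mask past what a 44-bit device can address, while the fixed path leaves max_pfn at the end of system RAM, so the device is no longer treated as addressing-limited.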
Bert Karwatzki reported this regression nearly one month ago, while the code was still in linux-next for testing. He bisected the issue and reported that it broke at least one Steam game:
“Using linux next-20250307 to play the game stellaris via steam I noticed that loading the game gets sluggish with the progress bar getting stuck at 100%. In this situation mouse and keyboard inputs don’t work properly anymore. Switching to a VT and killing stellaris somewhat fixes the situation though in one instance the touchpad did not work after that. I bisected this between v6.14-rc5 and next-20250307 and got this as the first bad commit.”
This AMD graphics driver performance regression will be fixed in the Linux 6.15 Git code once the pull request lands, which should happen later today. Kudos to all those involved for getting it fixed so quickly and ahead of the Linux 6.15-rc1 release due out on Sunday.
Separately, the x86 fixes pull request also carries another performance regression fix. On CPUs without FSRM (Fast Short REP MOV) or ERMS (Enhanced REP MOVSB/STOSB), the pending code now aligns writes to 8 bytes within copy_user_generic(). This addresses a two-year-old performance regression on such CPUs.
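To illustrate what aligning writes to 8 bytes means, the following Python sketch splits a copy into a short head of bytes before the first 8-byte boundary, a run of 8-byte stores that all start on aligned destination addresses, and a leftover tail. It only models the idea: the real copy_user_generic() is x86 assembly and may reach alignment differently, and plan_aligned_copy() and aligned_copy() are hypothetical names.

```python
WORD = 8  # width of each aligned store in this model


def plan_aligned_copy(dst_addr: int, length: int) -> tuple[int, int, int]:
    """Split a copy of `length` bytes to `dst_addr` into (head, words, tail)."""
    # Bytes needed to bring the destination up to the next 8-byte boundary.
    head = min((-dst_addr) % WORD, length)
    remaining = length - head
    words = remaining // WORD
    tail = remaining % WORD
    return head, words, tail


def aligned_copy(dst: bytearray, dst_addr: int, src: bytes) -> int:
    """Copy src into dst, where dst models memory starting at dst_addr.

    Returns the number of 8-byte stores performed.
    """
    head, words, tail = plan_aligned_copy(dst_addr, len(src))
    pos = 0
    # Head: the bytes before the first 8-byte boundary of the destination.
    dst[pos:pos + head] = src[pos:pos + head]
    pos += head
    # Body: every 8-byte store now begins on an aligned destination address.
    for _ in range(words):
        assert (dst_addr + pos) % WORD == 0
        dst[pos:pos + WORD] = src[pos:pos + WORD]
        pos += WORD
    # Tail: whatever is left over after the last full word.
    dst[pos:pos + tail] = src[pos:pos + tail]
    return words


src = bytes(range(30))
dst = bytearray(len(src))
print(plan_aligned_copy(0x1003, len(src)))        # (5, 3, 1)
print(aligned_copy(dst, 0x1003, src), dst == src)  # 3 True
```

With a destination at 0x1003, five leading bytes bring it to 0x1008, after which three 8-byte stores and one trailing byte complete the 30-byte copy.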