As a follow-up to the Phoronix article from a few days ago entitled A Linux 6.15 Performance Regression Hits Modern AMD CPUs, work continues on addressing the issue, whose performance impact caught the upstream kernel developers off guard. I’ve now tested a patch that resolves the regression while still carrying the desired KVM protections.
As outlined in that earlier article, after noticing the Linux 6.15 regression and bisecting it, the cause comes down to how an AMD SRSO security mitigation is applied for KVM. With Linux 6.15 the mitigation is applied whenever the KVM module is loaded, even if the system is not running any virtual machines (VMs), which is the only case where it’s needed for guest/host boundary protection. The result is a performance hit more significant than the developers anticipated, and one that serves no purpose while no VMs are active.
AMD and Google engineers have been working on the SRSO mitigation handling and on a solution to address this performance hit. Google’s Sean Christopherson commented yesterday:
“Eww. That’s quite painful, and completely disallowing enable_virt_on_load is undesirable, e.g. for use cases where the host is (almost) exclusively running VMs.
Best idea I have is to throw in the towel on getting fancy, and just maintain a dedicated count in SVM.
Alternatively, we could plumb an arch hook into kvm_create_vm() and kvm_destroy_vm() that’s called when KVM adds/deletes a VM from vm_list, and key off vm_list being empty. But that adds a lot of boilerplate just to avoid a mutex+count.
…
Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and only if KVM has at least one active VM. Leaving the bit set at all times unfortunately degrades performance by a wee bit more than expected.
Use a dedicated mutex and counter instead of hooking virtualization enablement, as changing the behavior of kvm.enable_virt_at_load based on SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could result in performance issues for flows that are sensity to VM creation latency.”
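For readers who want to picture the "dedicated mutex and counter" approach described in the quote, here is a minimal user-space sketch in Python. It is only a toy model and not the kernel patch itself: the class name `SrsoBitModel`, the methods `vm_created()` and `vm_destroyed()`, and the `bp_spec_reduce` flag are all hypothetical names chosen for illustration. The idea it shows is that the bit is set when the VM count goes from zero to one and cleared when it drops back to zero.

```python
import threading


class SrsoBitModel:
    """Toy user-space model of the 'mutex + counter' idea; not kernel code."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the dedicated mutex
        self._active_vms = 0           # stands in for the dedicated VM count
        self.bp_spec_reduce = False    # stands in for the BP_SPEC_REDUCE bit

    def vm_created(self):
        with self._lock:
            self._active_vms += 1
            if self._active_vms == 1:
                # First active VM: the mitigation bit gets set.
                self.bp_spec_reduce = True

    def vm_destroyed(self):
        with self._lock:
            if self._active_vms == 0:
                raise RuntimeError("no active VMs to destroy")
            self._active_vms -= 1
            if self._active_vms == 0:
                # Last VM is gone: the mitigation bit gets cleared.
                self.bp_spec_reduce = False


# Record the bit after each step: start, create, create, destroy, destroy.
model = SrsoBitModel()
states = [model.bp_spec_reduce]
for step in (model.vm_created, model.vm_created,
             model.vm_destroyed, model.vm_destroyed):
    step()
    states.append(model.bp_spec_reduce)

print(states)
```

In this model the flag stays off while no VMs exist, which corresponds to the bare metal case where the performance hit was observed, and it stays on for as long as at least one VM is alive.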
I’ve tested the patch on an affected AMD system and indeed the bare metal Linux 6.15 performance with no VMs running is back to where it was on Linux 6.14 and prior. Hopefully this patch or some incarnation of it will be merged into Linux 6.15 upstream in the days ahead to address this performance issue.