An unfortunate Linux kernel bug coming to light just ahead of Christmas may cause frustration for some server administrators, particularly at public cloud providers. It turns out that with Linux kernel releases since 2022, a KVM guest virtual machine making use of Intel Advanced Matrix Extensions (AMX) can cause the host to suffer a kernel panic.
Linux KVM maintainer Paolo Bonzini of Red Hat posted a set of patches this evening to address this possible host panic resulting from AMX usage within KVM guests. The issue will presumably be treated as a denial of service, but as of writing I haven't seen any CVE made public.
Advanced Matrix Extensions is one of Intel's advantages with recent generations of Xeon Scalable processors, and AMX can be quite beneficial for AI workloads when the software is AMX-ready. Unfortunately, an issue has been uncovered with KVM where AMX usage inside a VM can lead to a host panic from an unexpected #NM (device-not-available) exception.
Bonzini explained in tonight's patch series on the mailing list:
“Fix a possible host panic, due to an unexpected #NM, when a KVM guest is using AMX features.
The guest’s XFD value, which is stored in fpstate->xfd, is used for both guest execution and host XSAVE operations. However, the guest-configured XFD setting can disable features that the host needs enabled to successfully XRSTOR the guest FPU state.”
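To make the failure mode in that description more concrete, here is a toy Python model of the XFD/XRSTOR interaction. It is not kernel code and not the approach taken by Bonzini's patches. It assumes, based on Intel's documentation of Extended Feature Disable (XFD), that XRSTOR raises #NM when a state component it is asked to restore has its XFD bit set. AMX tile data is XSAVE state component 18. The function names and the simplified "requested features" mask are hypothetical illustrations.

```python
# Toy model (hypothetical names) of the XFD/XRSTOR interaction.
# Assumption: XRSTOR raises #NM if any requested state component
# also has its bit set in XFD (i.e. the feature is "disabled").
X87 = 1 << 0
SSE = 1 << 1
XTILECFG = 1 << 17   # AMX tile configuration (XSAVE component 17)
XTILEDATA = 1 << 18  # AMX tile data (XSAVE component 18)


class DeviceNotAvailable(Exception):
    """Stands in for the #NM (device-not-available) exception."""


def xrstor(requested: int, xfd: int) -> None:
    """Simplified XRSTOR: fault if a requested component is XFD-disabled."""
    armed = requested & xfd
    if armed:
        raise DeviceNotAvailable(f"#NM: components {armed:#x} disabled by XFD")


def host_restore_guest_fpu(guest_xfeatures: int, xfd_value: int) -> str:
    """Model the host restoring a guest's FPU state under a given XFD value."""
    try:
        xrstor(guest_xfeatures, xfd_value)
        return "restored"
    except DeviceNotAvailable:
        # In real kernel context, an #NM here is unexpected and can panic the host.
        return "#NM"
```

With a guest FPU state that includes AMX tile data, `host_restore_guest_fpu(X87 | SSE | XTILECFG | XTILEDATA, XTILEDATA)` models the host reusing a guest XFD value that disables tile data, and yields `"#NM"`. Passing an XFD value of `0`, which leaves tile data enabled, yields `"restored"`. That is the mismatch the quoted description points at: the host needs the feature enabled to restore state the guest has disabled for itself.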
Unfortunately, this affects all Linux kernel releases going back to January 2022 with Linux 5.17, so it covers pretty much all Intel AMX users in production. The patches are out for review on the mailing list and will presumably be sent in via a fixes pull request to Linux Git in the coming days (assuming no holiday delays), then back-ported to the various Linux kernel stable series.
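For administrators wanting a rough first pass at whether a host falls in the affected range, the following Python sketch flags hosts running Linux 5.17 or newer on a CPU that reports the `amx_tile` flag in `/proc/cpuinfo`. It is only a heuristic built on stated assumptions. It cannot tell whether a distribution kernel has already back-ported the fix, since the fixed versions were not known at the time of writing. It also does not check whether any guests actually have AMX exposed. The function names are hypothetical. The inputs are plain strings so the logic is easy to test.

```python
import re

AMX_FLAG = "amx_tile"  # /proc/cpuinfo flag Linux reports for AMX tile support


def kernel_at_least_5_17(release: str) -> bool:
    """Return True if a release string like '6.8.0-45-generic' is >= 5.17."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (5, 17)


def cpu_has_amx(cpuinfo_text: str) -> bool:
    """Check the first 'flags' line of /proc/cpuinfo text for the AMX tile flag."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return AMX_FLAG in line.split(":", 1)[1].split()
    return False


def possibly_exposed(release: str, cpuinfo_text: str) -> bool:
    """Heuristic only: kernel in the affected range AND an AMX-capable CPU."""
    return kernel_at_least_5_17(release) and cpu_has_amx(cpuinfo_text)
```

On a live host, one might call `possibly_exposed(platform.release(), open("/proc/cpuinfo").read())`. Treat a `True` result as "check your vendor's advisory", not as proof the host is vulnerable.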
