Intel CPU microcode files keep growing in size, and late-loading new microcode onto a running system can cause (brief) disruptions or downtime while the update is applied. To reduce that downtime, future Intel CPUs are introducing a microcode “staging” feature. The Linux 6.19 kernel, due in the new year, is set to support Intel microcode staging on capable processors.
Applying CPU microcode updates while systems/servers are online can lead to brief disruptions. This has long been known, and optimizations to reduce the downtime impact of microcode updates have been carried out for years. But with CPU microcode binary blobs continuing to grow in size, Intel has been preparing a microcode staging feature to reduce the impact.
The Intel microcode staging feature allows most of the microcode update to be processed in a non-critical CPU path, so the CPU cores can remain operational for the majority of the update. In turn, this reduces latency spikes since the CPU cores do not need to be stopped for the entire microcode update process. With this staging method, the CPU cores only need to be stopped during the brief final activation. If the update fails, the “legacy” microcode update process is used instead.
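To make the trade-off concrete, here is a toy Python model of the flow described above. It is only a sketch under stated assumptions, not the kernel's implementation: the function name, the abstract "work units", and the idea that the legacy path performs all of its work with cores stopped are simplifications made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadResult:
    method: str         # "staged" or "legacy"
    stopped_units: int  # work units performed while cores were halted


def late_load(patch_units: int, activation_units: int, staging_ok: bool) -> LoadResult:
    """Toy model of a late microcode load (hypothetical, not a kernel API).

    patch_units: total work needed to process the update, in abstract units.
    activation_units: the portion that must run with the cores stopped.
    staging_ok: whether the staging step succeeds.
    """
    if activation_units > patch_units:
        raise ValueError("activation cannot exceed the total update work")

    if staging_ok:
        # Staging processes everything except activation while cores keep
        # running, so only the final activation counts as stopped time.
        return LoadResult(method="staged", stopped_units=activation_units)

    # Assumed fallback: the legacy path does the whole update with cores stopped.
    return LoadResult(method="legacy", stopped_units=patch_units)


if __name__ == "__main__":
    print(late_load(100, 5, staging_ok=True))
    print(late_load(100, 5, staging_ok=False))
```

In this model the stopped time with staging depends only on the activation step, so it no longer grows with the overall size of the microcode patch, which is the motivation given for the feature.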
Intel Linux engineers have been working on the Linux kernel integration for microcode staging for a year already, and it all now looks buttoned up in time for the Linux 6.19 merge window in December.
This week the patches were queued into the x86/microcode branch of tip/tip.git. With the patches now in a TIP branch, they should be part of the upcoming Linux 6.19 kernel. The patches do not indicate which Intel CPU generation will be first to support microcode staging; they simply check a new architectural capability bit that indicates support for staged microcode updates.
“As microcode patch sizes continue to grow, late-loading latency spikes can lead to timeouts and disruptions in running workloads. This trend of increasing patch sizes is expected to continue, so a foundational solution is needed to address the issue.
To mitigate the problem, introduce a microcode staging feature. This option processes most of the microcode update (excluding activation) on a non-critical path, allowing CPUs to remain operational during the majority of the update. By offloading work from the critical path, staging can significantly reduce latency spikes.”
That is how the feature is officially described in the patch.