Intel Linux engineers have been working to enhance NVMe storage performance on today’s high core count processors. In situations where multiple CPUs end up sharing the same NVMe IRQ(s), a performance penalty can arise if the IRQ affinity and the CPUs’ cluster do not align. A pending patch addresses this situation and was reported to deliver a 15% performance improvement.
The code working its way to the Linux kernel makes lib/group_cpus.c CPU cluster-aware. Intel engineer Wangyang Guo explained the situation with the patch:
“As CPU core counts increase, the number of NVMe IRQs may be smaller than the total number of CPUs. This forces multiple CPUs to share the same IRQ. If the IRQ affinity and the CPU’s cluster do not align, a performance penalty can be observed on some platforms.
This patch improves IRQ affinity by grouping CPUs by cluster within each NUMA domain, ensuring better locality between CPUs and their assigned NVMe IRQs.”
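The actual change lives in lib/group_cpus.c inside the kernel, but the idea can be pictured with a small standalone sketch. The C program below is not the patch itself; it uses a made-up topology (16 CPUs, two NUMA nodes, four-CPU clusters) and four hypothetical NVMe IRQs, walking the CPUs in (node, cluster) order and handing each whole cluster to one IRQ so that CPUs sharing an interrupt also share a cluster:

    #include <stdio.h>

    #define NR_CPUS      16
    #define NR_IRQS       4   /* fewer IRQs than CPUs, so CPUs must share */
    #define CLUSTER_SIZE  4   /* CPUs per cluster in this made-up topology */
    #define NODE_SIZE     8   /* CPUs per NUMA node */

    int main(void)
    {
        int cpus[NR_IRQS][NR_CPUS];   /* CPUs assigned to each IRQ */
        int count[NR_IRQS] = { 0 };

        /*
         * Walk CPUs in (NUMA node, cluster) order and hand each whole
         * cluster to one IRQ, so every CPU sharing an interrupt also
         * shares a cluster (and therefore a NUMA node) with its peers.
         */
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            int cluster = cpu / CLUSTER_SIZE;   /* global cluster id */
            int irq = cluster % NR_IRQS;        /* one whole cluster per IRQ */
            cpus[irq][count[irq]++] = cpu;
        }

        for (int irq = 0; irq < NR_IRQS; irq++) {
            printf("irq %d -> cpus:", irq);
            for (int i = 0; i < count[irq]; i++)
                printf(" %d", cpus[irq][i]);
            printf("  (node %d, cluster %d)\n",
                   cpus[irq][0] / NODE_SIZE, cpus[irq][0] / CLUSTER_SIZE);
        }
        return 0;
    }

The real lib/group_cpus.c works on cpumasks and balances group sizes when clusters and IRQ counts do not divide evenly; the fixed constants here are purely for illustration.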
On an Intel Xeon E server, this patch improved the widely used FIO benchmark’s libaio random read performance by around 15%. No performance numbers for other I/O workloads or hardware configurations were provided with the patch in Git or on the mailing list besides that lone Xeon E run. It will be interesting to see the broader impact on multi-cluster CPUs once this patch is merged into the mainline Linux kernel.
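The exact FIO job behind that number was not spelled out beyond libaio random reads, but a run of that general shape, with an illustrative device path, block size, queue depth and job count rather than Intel’s actual parameters, would look something like:

    fio --name=randread --filename=/dev/nvme0n1 --direct=1 \
        --ioengine=libaio --rw=randread --bs=4k \
        --iodepth=32 --numjobs=8 --runtime=60 --time_based \
        --group_reporting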
The patch has worked its way into Andrew Morton’s “mm-everything” Git branch as part of the MM code he oversees for the kernel. We’ll see whether this 271-line CPU cluster-aware patch manages to land in next month’s Linux 6.20~7.0 kernel merge window to help NVMe performance in these situations.
