After all these years of Linux dominating the high performance computing (HPC) space and other industries, one might think that (nearly) all of the interesting performance nuggets have been uncovered, with well-thought-out, robust fallbacks in place across all important code paths. Yet as we showcase almost every cycle, there are still interesting performance bits to be uncovered within the Linux kernel. For Linux 6.17, thanks to an NVIDIA engineer, the kernel applies a better NUMA locality fallback rather than simply picking a random CPU core.
Merged today for the Linux 6.17 merge window was the smp/core pull request. Of the few SMP core patches this cycle, NVIDIA's Yury Norov authored the most, reflecting his recent focus on enhancing the Linux SMP code. In particular, one of his patches improves the locality of smp_call_function_any() so that it finds a better secondary match rather than effectively picking a random CPU core for execution.
Yury explained in the patch:
“smp_call_function_any() tries to make a local call as it’s the cheapest option, or switches to a CPU in the same node. If it’s not possible, the algorithm gives up and searches for any CPU, in a numerical order.
Instead, it can search for the best CPU based on NUMA locality, including the 2nd nearest hop (a set of equidistant nodes), and higher.
sched_numa_find_nth_cpu() does exactly that, and also helps to drop most of the housekeeping code.”
So with a few lines of code, smp_call_function_any() gains a better fallback rather than simply dropping to the first (effectively random) CPU. The smp_call_function_any() call in the Linux kernel runs a function on any one CPU from the given mask, ideally the calling CPU itself. Now it will at least choose that CPU with better locality logic.
The patches did not share any background on why NVIDIA is emphasizing improvements to this code, such as whether a particular production issue or bottleneck was observed or whether other factors were at play.