The “timers/core” pull requests that update the Linux kernel’s timer-related code don’t tend to be too interesting each kernel cycle, but for Linux 6.19 this one stands out because it addresses a problem HPE discovered on big NUMA servers.
Linux 6.19 fixes a timekeeper CPU issue that could lead to a large number of CPU cores getting stuck on very large NUMA servers. The pull request noted:
“Prevent a thundering herd problem when the timekeeper CPU is delayed and a large number of CPUs compete to acquire jiffies_lock to do the update. Limit it to one CPU with a separate “uncontended” atomic variable.”
Steve Wahl of HPE authored the patch to fix this issue, which was spotted at the company. The HPE engineer further explained in the patch:
“On large NUMA systems, while running a test program that saturates the inter-processor and inter-NUMA links, acquiring the jiffies_lock can be very expensive. If the cpu designated to do jiffies updates (tick_do_timer_cpu) gets delayed and other cpus decide to do the jiffies update themselves, a large number of them decide to do so at the same time. The inexpensive check against tick_next_period is far quicker than actually acquiring the lock, so most of these get in line to obtain the lock. If obtaining the lock is slow enough, this spirals into the vast majority of CPUs continuously being stuck waiting for this lock, just to obtain it and find out that time has already been updated by another cpu. For example, on one random entry to kdb by manually-injected NMI, I saw 2912 of 3840 cpus stuck here.
To avoid this, allow only one non-timekeeper CPU to call tick_do_update_jiffies64() at any given time, resetting ts->stalled_jiffies only if the jiffies update function is actually called.
With this change, manually interrupting the test I find at most two CPUs in the tick_do_update_jiffies64 function (the timekeeper and one other).”
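For readers who want to see the general idea, here is a minimal user-space sketch in C11 of the pattern the patch description outlines: a cheap period check, then an atomic gate that lets only one backup caller go on to take the expensive lock. This is not the kernel code. Every name here (jiffies_sketch, next_period, backup_update_busy, try_backup_update) is hypothetical, and a simple spin flag stands in for jiffies_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; this is an analogy, not the kernel patch. */
static atomic_flag jiffies_lock_sketch = ATOMIC_FLAG_INIT; /* stand-in for jiffies_lock */
static atomic_flag backup_update_busy = ATOMIC_FLAG_INIT;  /* gate: one backup updater at a time */
static _Atomic uint64_t next_period = 1;                   /* tick at which an update becomes due */
static uint64_t jiffies_sketch;                            /* only modified while holding the lock */

/* Slow path: take the (potentially expensive) lock, recheck, then advance. */
bool do_update(uint64_t now)
{
    bool advanced = false;

    while (atomic_flag_test_and_set(&jiffies_lock_sketch))
        ; /* spin until the lock is free */

    if (now >= atomic_load(&next_period)) {
        jiffies_sketch++;
        atomic_store(&next_period, now + 1);
        advanced = true;
    }

    atomic_flag_clear(&jiffies_lock_sketch);
    return advanced;
}

/*
 * Called by a non-timekeeper CPU that suspects the timekeeper is late.
 * Returns true only if this caller actually ran the update function,
 * so the caller knows whether to reset its own "stalled" bookkeeping.
 */
bool try_backup_update(uint64_t now)
{
    /* Cheap check: nothing is due yet. */
    if (now < atomic_load(&next_period))
        return false;

    /* Someone else is already doing the update: do not queue for the lock. */
    if (atomic_flag_test_and_set(&backup_update_busy))
        return false;

    do_update(now);
    atomic_flag_clear(&backup_update_busy);
    return true;
}
```

In this sketch, callers that lose the race on backup_update_busy return immediately instead of lining up behind the lock, which is the behavior the patch description aims for. The real change lives in the kernel's tick code and may differ in its details.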
This fix was merged this week for Linux 6.19.
