Huawei engineer Chen Jinghuang has posted the latest request for comments (RFC) patches that let an idle CPU steal tasks from overloaded CPUs sharing the same last level cache (LLC), with the aim of improving overall CPU utilization on today’s large core count servers.
The intent of the proposed patches is that when a given CPU core has no more tasks to run, it attempts to steal a task from an overloaded CPU within the same last level cache domain. Chen Jinghuang gave a technical overview in the patch series:
“When a CPU has no more CFS tasks to run, and idle_balance() fails to find a task, then attempt to steal a task from an overloaded CPU in the same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently identify candidates. To minimize search time, steal the first migratable task that is found when the bitmap is traversed. For fairness, search for migratable tasks on an overloaded CPU in order of next to run.
This simple stealing yields a higher CPU utilization than idle_balance() alone, because the search is cheap, so it may be called every time the CPU is about to go idle. idle_balance() does more work because it searches widely for the busiest queue, so to limit its CPU consumption, it declines to search if the system is too busy. Simple stealing does not offload the globally busiest queue, but it is much better than running nothing at all.”
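To make the quoted description more concrete, here is a minimal user-space sketch in Python of the idea: a bitmap of overloaded CPUs within one LLC, traversed to steal the first migratable task found. This is a toy model and not the kernel code from the patch series. The names `Task`, `LLCDomain`, `enqueue`, and `try_steal` are hypothetical, as are the assumptions that "overloaded" means two or more runnable tasks, that the first queue entry is the running task and cannot be stolen, and that the bitmap is walked from the lowest-numbered CPU.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Task:
    name: str
    # CPUs this task may run on; None means "any CPU" (toy stand-in for affinity).
    allowed: Optional[FrozenSet[int]] = None

    def can_run_on(self, cpu: int) -> bool:
        return self.allowed is None or cpu in self.allowed


class LLCDomain:
    """Toy model of the CPUs that share one last level cache."""

    def __init__(self, num_cpus: int) -> None:
        # Each runqueue is kept in next-to-run order; index 0 is treated
        # as the currently running task (an assumption of this sketch).
        self.runqueues: List[List[Task]] = [[] for _ in range(num_cpus)]
        # Bit i is set when CPU i has two or more runnable tasks.
        self.overloaded_mask = 0

    def _update_mask(self, cpu: int) -> None:
        if len(self.runqueues[cpu]) >= 2:
            self.overloaded_mask |= 1 << cpu
        else:
            self.overloaded_mask &= ~(1 << cpu)

    def enqueue(self, cpu: int, task: Task) -> None:
        self.runqueues[cpu].append(task)
        self._update_mask(cpu)

    def try_steal(self, dst_cpu: int) -> Optional[Task]:
        """Move the first migratable task found onto dst_cpu, if any."""
        # Never consider the destination CPU itself as a victim.
        mask = self.overloaded_mask & ~(1 << dst_cpu)
        while mask:
            src_cpu = (mask & -mask).bit_length() - 1  # lowest set bit
            mask &= mask - 1                           # clear that bit
            queue = self.runqueues[src_cpu]
            # Skip index 0 (running); search the rest in next-to-run order.
            for i in range(1, len(queue)):
                task = queue[i]
                if task.can_run_on(dst_cpu):
                    del queue[i]
                    self.runqueues[dst_cpu].append(task)
                    self._update_mask(src_cpu)
                    self._update_mask(dst_cpu)
                    return task  # stop at the first hit to keep the search cheap
        return None
```

In this model a CPU that is about to go idle would call `try_steal()` with its own number. The search only touches CPUs whose bit is set, which is why it can be cheap enough to run on every idle transition, at the cost of not necessarily relieving the busiest queue.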
The RFC patches have been shown to improve utilization when enabled via the new SCHED_STEAL feature. On a dual Intel Xeon Platinum server, elapsed time in Hackbench improved by 17%.
Where the patches go from here isn’t clear: upstream kernel developer Peter Zijlstra commented on the mailing list that the better approach may be to fix the current behavior rather than add a new way of getting tasks onto a CPU.
