A change to the Linux kernel’s extensible scheduler class “sched_ext”, which allows nifty scheduler implementations via BPF programs, will begin prioritizing idle SMT siblings for better performance.
A sched_ext change queued in its development tree ahead of the upcoming Linux 7.1 kernel cycle prioritizes idle SMT siblings, providing slightly better performance than the current behavior of just picking a CPU within the same last level cache (LLC). If there is an idle SMT sibling, sched_ext will now prefer it before checking for CPUs within the same LLC, followed by the same NUMA node or any other idle CPU on the system.
Andrea Righi of NVIDIA measured the benefit of prioritizing idle SMT siblings at roughly 2-3% for CPU-bound workloads. He explained in the queued patch making the change:
“In the default built-in idle CPU selection policy, when @prev_cpu is busy and no fully idle core is available, try to place the task on its SMT sibling if that sibling is idle, before searching any other idle CPU in the same LLC.
Migration to the sibling is cheap and keeps the task on the same core, preserving L1 cache and reducing wakeup latency.
On large SMT systems this appears to consistently boost throughput by roughly 2-3% on CPU-bound workloads (running a number of tasks equal to the number of SMT cores).”
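For those wanting to picture the fallback order described above, here is a rough sketch in plain C. It is not the kernel code: the toy_cpu structure, the scope enum, and the find_idle() and toy_pick_cpu() helpers are all hypothetical names made up for illustration, and the sketch assumes the previous CPU is already busy and that no fully idle core is available.

```c
#include <stdbool.h>

/* Hypothetical toy topology; none of these names come from the kernel. */
struct toy_cpu {
    int sibling;  /* index of the SMT sibling, or -1 if there is none */
    int llc;      /* last level cache domain id */
    int node;     /* NUMA node id */
    bool idle;    /* true if the CPU is currently idle */
};

enum scope { SCOPE_LLC, SCOPE_NODE, SCOPE_ANY };

/* Return the lowest-numbered idle CPU other than prev within the scope, or -1. */
static int find_idle(const struct toy_cpu *cpus, int nr, int prev, enum scope s)
{
    for (int i = 0; i < nr; i++) {
        if (i == prev || !cpus[i].idle)
            continue;
        if (s == SCOPE_LLC && cpus[i].llc != cpus[prev].llc)
            continue;
        if (s == SCOPE_NODE && cpus[i].node != cpus[prev].node)
            continue;
        return i;
    }
    return -1;
}

/*
 * Assumes prev is busy and no fully idle core exists.
 * Order: idle SMT sibling, then same LLC, then same NUMA node, then anywhere.
 */
int toy_pick_cpu(const struct toy_cpu *cpus, int nr, int prev)
{
    int sib = cpus[prev].sibling;
    int cpu;

    if (sib >= 0 && cpus[sib].idle)
        return sib;
    if ((cpu = find_idle(cpus, nr, prev, SCOPE_LLC)) >= 0)
        return cpu;
    if ((cpu = find_idle(cpus, nr, prev, SCOPE_NODE)) >= 0)
        return cpu;
    return find_idle(cpus, nr, prev, SCOPE_ANY);
}
```

In this toy model the sibling check simply comes first, so an idle sibling wins even when a lower-numbered idle CPU sits in the same LLC. The real sched_ext policy works on kernel cpumasks and handles additional cases not modeled here.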
With the patch in sched_ext.git’s “for-next” Git branch, the change should land during next month’s Linux 7.1 merge window.
