Queued into tip/tip.git’s “sched/urgent” Git branch today is a patch to disable the kernel scheduler’s NEXT_BUDDY functionality, which was re-implemented during the Linux 6.19 merge window. It has turned out to cause performance regressions that have yet to be otherwise addressed.
The NEXT_BUDDY feature was re-introduced during the Linux 6.19 merge window after being adapted for EEVDF. Given the regressions since reported against that code, the patch in sched/urgent disables it by default. With it hitting this urgent TIP branch, it will presumably be mainlined in the coming days as part of the scheduler fixes. The one-line patch disabling NEXT_BUDDY explains:
“NEXT_BUDDY was disabled with the introduction of EEVDF and enabled again after NEXT_BUDDY was rewritten for EEVDF by commit e837456fdca8 (“sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals”). It was not expected that this would be a universal win without a crystal ball instruction but the reported regressions are a concern even if gains were also reported. Specifically;
o mysql with client/server running on different servers regresses
o specjbb reports lower peak metrics
o daytrader regresses
The mysql is realistic and a concern. It needs to be confirmed if specjbb is simply shifting the point where peak performance is measured but still a concern. daytrader is considered to be representative of a real workload.
Access to test machines is currently problematic for verifying any fix to this problem. Disable NEXT_BUDDY for now by default until the root causes are addressed.”
Aside from the MySQL, SPECjbb, and DayTrader regressions noted, I also found and warned of scheduler regressions in my early Linux 6.19 benchmarking back in early December, in the article “Scheduler Woes: Bisecting Early Performance Regressions Found In Linux 6.19”. When bisecting a measurable Nginx HTTPS web server regression as part of that work, the e837456fdca8 NEXT_BUDDY commit ended up being one of the possible culprits.
I hadn’t bisected all of the regressions, so once this patch is merged for Linux 6.19 it will be interesting to re-test and confirm whether it addresses the issue and which other workloads may have been impacted by this particular commit. At the same time, as noted in the patch message, NEXT_BUDDY had helped the performance of some workloads, so once this patch is merged some results could drop compared to earlier in the v6.19 cycle.
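For anyone wanting to check which way NEXT_BUDDY is set on their own kernel before re-running benchmarks, the scheduler feature flags are exposed through debugfs, where disabled features are listed with a NO_ prefix. Below is a minimal Python sketch, assuming debugfs is mounted at its usual /sys/kernel/debug location and the script is run as root. The function names are made up for illustration and are not part of any kernel tooling.

```python
from pathlib import Path

# Assumed default location; requires root and a mounted debugfs.
SCHED_FEATURES = Path("/sys/kernel/debug/sched/features")


def parse_sched_features(text):
    """Map each feature name to True (enabled) or False (listed with a NO_ prefix)."""
    features = {}
    for token in text.split():
        if token.startswith("NO_"):
            features[token[3:]] = False
        else:
            features[token] = True
    return features


def next_buddy_enabled(path=SCHED_FEATURES):
    """Return True/False for NEXT_BUDDY, or None if the file is unreadable or lacks it."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return parse_sched_features(text).get("NEXT_BUDDY")


if __name__ == "__main__":
    print("NEXT_BUDDY enabled:", next_buddy_enabled())
```

Writing NEXT_BUDDY or NO_NEXT_BUDDY to the same file as root should flip the feature at runtime, which makes it possible to compare a workload both ways on a single kernel build.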
