In the works for the past several months has been cache-aware load balancing / cache-aware scheduling support for Linux. The latest iteration of those patches by Intel was posted this weekend, and the biggest uplift is showing up on AMD EPYC Genoa and newer platforms.
There have been four iterations of cache-aware scheduling as a “request for comments”, while the new round of patches drops that RFC designation. The Intel developers involved hope the code is now ready to be considered for inclusion in the mainline kernel. The focus remains on aggregating tasks that share data into the same last-level cache (LLC) domain to reduce cache misses and cache bouncing.
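For readers curious which CPUs share an LLC on their own system, here is a minimal Python sketch that groups logical CPUs by their highest-level cache using the `shared_cpu_list` files Linux exposes in sysfs. It only inspects topology and is not part of the patch series; the function name `llc_domains` and the `sysfs_root` parameter are made up for illustration.

```python
import glob
import os
from collections import defaultdict


def llc_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPUs by the highest-level cache they share.

    Hypothetical helper for illustration: it reads the standard
    cpuN/cache/indexM/{level,shared_cpu_list} files and returns a dict
    mapping each shared_cpu_list string to the sorted CPUs reporting it.
    """
    domains = defaultdict(list)
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        cpu = int(os.path.basename(cpu_dir)[3:])
        best_level, best_shared = -1, None
        for idx in glob.glob(os.path.join(cpu_dir, "cache", "index[0-9]*")):
            try:
                with open(os.path.join(idx, "level")) as f:
                    level = int(f.read().strip())
                with open(os.path.join(idx, "shared_cpu_list")) as f:
                    shared = f.read().strip()
            except (OSError, ValueError):
                continue  # offline CPU or missing cache information
            # Keep the entry with the highest cache level seen so far.
            if level > best_level:
                best_level, best_shared = level, shared
        if best_shared is not None:
            domains[best_shared].append(cpu)
    return {shared: sorted(cpus) for shared, cpus in domains.items()}


if __name__ == "__main__":
    for shared, cpus in sorted(llc_domains().items()):
        print(f"LLC shared by CPUs {shared}: {len(cpus)} CPUs found")
```

On a chiplet-based part such as EPYC this would be expected to list several separate LLC groups per socket, which is the kind of layout where keeping data-sharing tasks together can matter.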
Compared with the prior RFC v4 patches, there are only minor alterations and fixes to the code. New to the patch series are test results for AMD EPYC 9004 “Genoa”, and they are staggering: run time cut by up to 44% compared with the current mainline kernel! With the ChaCha20-xiangshan benchmark, the time on that AMD EPYC Genoa test system drops from 50,868 ms to just 28,349 ms with cache-aware scheduling.
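To make that figure concrete, the quick Python sketch below works through the arithmetic on the two ChaCha20-xiangshan times quoted above. The helper names are just for illustration; the point is that a 44% drop in run time and a speedup factor are two different ways of stating the same result.

```python
def runtime_reduction(before_ms, after_ms):
    """Fraction of the original run time that was eliminated."""
    return (before_ms - after_ms) / before_ms


def speedup(before_ms, after_ms):
    """How many times faster the run completes."""
    return before_ms / after_ms


# ChaCha20-xiangshan times on the AMD EPYC Genoa test system, as quoted.
before_ms, after_ms = 50_868, 28_349

print(f"run time reduction: {runtime_reduction(before_ms, after_ms):.1%}")  # 44.3%
print(f"speedup factor: {speedup(before_ms, after_ms):.2f}x")               # 1.79x
```

In other words, the "44%" refers to time saved; expressed as a speedup, the benchmark completes roughly 1.79 times as fast.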
On the older AMD EPYC Milan, they didn’t find any performance benefit. Meanwhile, on Intel’s own Sapphire Rapids server used for testing, Hackbench showed some benefit in select cases and ChaCha20-xiangshan improved by up to ~10%.
Kevork won’t be happy with this Intel-led patch series, though, given his recent comments about Intel’s open-source work helping competitors: the greatest gains by far are on AMD EPYC Genoa. Granted, Sapphire Rapids is aging at this point, and it would be interesting to see the Cache Aware Scheduling benefit on the likes of Xeon 6+ Clearwater Forest.
Those wanting to check out these new Cache Aware Scheduling patches can find them on the Linux kernel mailing list. Here’s hoping these patches will be ready for the mainline kernel soon, potentially as early as the v6.19 cycle in early 2026.