Over the past several months, Intel Linux kernel engineers have put a lot of work into cache-aware scheduling / load balancing to help modern CPUs that have multiple caches. With cache-aware scheduling, tasks that will likely share resources can be aggregated into the same cache domain to enjoy better cache locality. With the cache-aware scheduling patches recently updated and now moving past the "request for comments" stage, I was eager to try out these new patches. Especially with a 44% time reduction reported for one of the benchmarks, it was time to run some tests, and the first of those results are being shared today.
With the latest Linux cache-aware scheduling patches posted earlier in October, there were nice improvements reported on the likes of Intel Xeon Sapphire Rapids, AMD EPYC Milan, and AMD EPYC Genoa. So I've begun testing these patches atop the Linux 6.17 kernel on some hardware locally. First up was the flagship AMD EPYC 9005 "Turin" platform with the 192-core EPYC 9965.
With the AMD EPYC 9965 processors sporting 192 cores / 384 threads and 384MB of L3 cache, I was curious to test this dual EPYC 9965 server with a Cache Aware Scheduling enabled kernel to see the impact, especially given the patch cover letter mentioning the great results on the older Milan and Genoa EPYC platforms. So that's where I began testing Cache Aware Scheduling, and I have some pretty tantalizing results to share today.
The data today is from a stock Linux 6.17 kernel build and then the same kernel rebuilt with the 11 October Cache Aware Scheduling patches applied and enabled, for comparing the performance impact. The same AMD Volcano server with dual EPYC 9965 processors was used for this comparison, with only the kernel build changed between test runs.
Cache Aware Scheduling does expose a number of tunables via sysfs for additional configuration. But for this initial round of testing, the Cache Aware Scheduling tunables and other kernel settings were left at their defaults.
It's important to note Cache Aware Scheduling won't help all workloads. In fact, one of the patches disables cache-aware scheduling for processes with high thread counts: if the number of active threads within a process exceeds the number of physical cores sharing the LLC, CAS isn't engaged due to the risk of cache contention within the preferred LLC. But for workloads not fully saturating the CPU, such as typical heterogeneous Linux server workloads, cache-aware scheduling has some interesting potential.
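The thread-count heuristic described above can be sketched in a few lines. This is a simplified illustration of the logic, not the kernel's actual implementation; the function name and the 16-cores-per-LLC figure (one Zen 5c CCD on the EPYC 9965) are assumptions for the example.

```python
def cache_aware_eligible(active_threads: int, llc_physical_cores: int) -> bool:
    """Sketch of the CAS back-off check: if a process has more active
    threads than there are physical cores sharing the preferred LLC,
    aggregating them would risk cache contention, so cache-aware
    aggregation is skipped and the default load balancer is used."""
    return active_threads <= llc_physical_cores

# Illustrative numbers: assume 16 cores share one L3 slice (CCD).
LLC_CORES = 16
print(cache_aware_eligible(8, LLC_CORES))    # lightly threaded: aggregate
print(cache_aware_eligible(192, LLC_CORES))  # saturating job: skip CAS
```

In practice this is why fully CPU-saturating benchmarks tend to show little change under CAS, while lighter mixed workloads are where the locality win shows up.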