Just over one year ago, Intel Linux engineers began working on cache-aware load balancing for Linux, more commonly referred to as Cache Aware Scheduling. The functionality, which especially helps modern Intel Xeon and AMD EPYC processors, hasn’t yet been upstreamed to the Linux kernel, but yesterday the fourth version of these patches was posted for review.
Cache Aware Scheduling aims to enhance Linux performance on modern CPUs with multiple cache domains. The scheduler tries to ensure that tasks sharing data are colocated in the same last level cache (LLC) domain, improving cache locality and reducing cache misses/bouncing.
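To make the idea of an LLC domain concrete, here is a minimal user-space sketch in Python. It is not taken from the patch series and does not reflect how the kernel scheduler implements anything; it only reads the standard sysfs cache topology files to group CPUs by the highest-level cache they share, and shows how a process could be manually pinned to one such group. The helper names (`parse_cpu_list`, `llc_domains`, `pin_to_llc`) are made up for this illustration, and treating the highest reported cache level as the LLC is an assumption.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def llc_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group CPUs by the highest-level cache they share (assumed to be the LLC)."""
    domains = set()
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        cache_dir = cpu_dir / "cache"
        if not cache_dir.is_dir():
            continue
        best_level, best_cpus = -1, None
        for index_dir in cache_dir.glob("index[0-9]*"):
            try:
                level = int((index_dir / "level").read_text())
                shared = (index_dir / "shared_cpu_list").read_text()
            except (OSError, ValueError):
                continue  # skip entries that are missing or unreadable
            if level > best_level:
                best_level = level
                best_cpus = frozenset(parse_cpu_list(shared))
        if best_cpus:
            domains.add(best_cpus)
    # Sort by lowest CPU id so the output order is stable.
    return sorted(domains, key=min)


def pin_to_llc(pid, domain):
    """Hypothetical helper: restrict a process to the CPUs of one LLC domain."""
    os.sched_setaffinity(pid, domain)


if __name__ == "__main__":
    for i, domain in enumerate(llc_domains()):
        print(f"LLC domain {i}: CPUs {sorted(domain)}")
```

Manual pinning like this is the kind of tuning that Cache Aware Scheduling aims to make less necessary, since the scheduler itself would try to keep data-sharing tasks within one LLC domain.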
The new “v4” patches of Cache Aware Scheduling introduce new code to limit the CPU scanning depth with the preferred NUMA node for placement when NUMA balancing is enabled. There are also changes around the load imbalance at low load, improvements to the LLC ID management, and other changes. Fundamentally, though, the cache-aware load balancing logic is the same with the new v4 patch series.
Intel’s own performance benchmarks shown in the patch cover letter indicate nice performance gains for Intel Xeon and AMD EPYC. My own testing of prior versions of these patches has also been very positive: “Linux’s Proposed Cache Aware Scheduling Benchmarks Show Big Potential On AMD EPYC Turin” and “Cache Aware Scheduling Raises Performance For Intel Xeon 6 Granite Rapids.”
Those interested can find the new Cache Aware Scheduling v4 patches on the kernel mailing list. Hopefully this Cache Aware Scheduling functionality will manage to make it into the mainline Linux kernel this year.
