While we are approaching the end of the Linux 6.14 merge window, with Linux 6.14-rc1 expected on Sunday, the fun isn’t over quite yet… Among today’s other last-minute pull requests was a set of patches that improve TLB flushing scalability on modern Intel and AMD x86_64 processors.
Ingo Molnar sent out the x86/mm pull request today, commenting:
“The biggest changes are the TLB flushing scalability optimizations, to update the mm_cpumask lazily and related changes. This feature has both a track record and a continued risk of performance regressions, so it was already delayed by a cycle – but it’s all 100% perfect now™.”
Rik van Riel of Meta has been working on these TLB flushing scalability optimizations for the past few months, and they stand to benefit multi-threaded workloads with lots of context switching. In one simple Hackbench scheduler benchmark, run-time dropped from 4.5 to 4.2 seconds with these few patches.
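For a sense of scale, those Hackbench numbers work out to roughly a 7% reduction in run-time. Rik’s patch notes give each figure a spread of ±0.1 s, so the two ranges (4.4 to 4.6 s before, 4.1 to 4.3 s after) do not overlap:

```latex
\frac{4.5\,\mathrm{s} - 4.2\,\mathrm{s}}{4.5\,\mathrm{s}} = \frac{0.3}{4.5} \approx 0.067 = 6.7\,\%
```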
In one of the patches, aimed at avoiding cache line thrashing, Rik noted:
“On busy multi-threaded workloads, there can be significant contention on the mm_cpumask at context switch time.
Reduce that contention by updating mm_cpumask lazily, setting the CPU bit at context switch time (if not already set), and clearing the CPU bit at the first TLB flush sent to a CPU where the process isn’t running.
When a flurry of TLB flushes for a process happen, only the first one will be sent to CPUs where the process isn’t running. The others will be sent to CPUs where the process is currently running.
On an AMD Milan system with 36 cores, there is a noticeable difference:
$ hackbench --groups 20 --loops 10000
Before: ~4.5s +/- 0.1s
After: ~4.2s +/- 0.1s”
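To make the lazy mm_cpumask scheme described in that patch a bit more concrete, here is a small user-space C model of it. This is a sketch under simplifying assumptions, not the kernel code: `struct mm_model`, `running_mm`, `model_switch_mm()` and `model_flush_tlb()` are hypothetical names, the mask is a single 64-bit word, and a "flush" just walks the mask instead of sending IPIs. It shows the two rules from the patch description: a context switch sets the CPU’s bit only if it isn’t already set, and the first flush that reaches a CPU no longer running the process clears that CPU’s bit.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Hypothetical model of an address space (mm) and its CPU mask. */
struct mm_model {
    uint64_t cpumask; /* bit N set: CPU N may hold TLB entries for this mm */
};

/* Which mm each CPU is currently running; NULL means none yet. */
static struct mm_model *running_mm[MODEL_NR_CPUS];

static uint64_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return UINT64_C(1) << cpu;
}

/* Context switch onto `next`. The bit is tested before it is set, so the
 * common case (bit already set) only reads the shared mask. Switching
 * away from an mm deliberately does NOT clear its bit: that is the
 * "lazy" part of the scheme. */
static void model_switch_mm(int cpu, struct mm_model *next)
{
    running_mm[cpu] = next;
    if (!(next->cpumask & cpu_bit(cpu)))
        next->cpumask |= cpu_bit(cpu);
}

/* Send a TLB flush for `mm` to every CPU in its mask. A CPU that is not
 * running `mm` when the flush arrives clears its own bit, so later
 * flushes skip it. Returns how many CPUs the flush was sent to. */
static int model_flush_tlb(struct mm_model *mm)
{
    uint64_t targets = mm->cpumask; /* snapshot taken before sending */
    int sent = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!(targets & cpu_bit(cpu)))
            continue;
        sent++;
        if (running_mm[cpu] != mm)
            mm->cpumask &= ~cpu_bit(cpu); /* stale CPU drops out lazily */
    }
    return sent;
}
```

For example, if CPUs 0, 1 and 2 run a process and CPU 2 then switches to a different process, the first flush still reaches all three CPUs but removes CPU 2 from the mask; a second flush in the same burst reaches only CPUs 0 and 1. The cost of the stale bit is paid once, instead of on every context switch away from the process.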
And on the other focus of this patch series:
“On a web server workload, the cpumask_test_cpu inside the WARN_ON_ONCE in the prev == next branch takes about 17% of all the CPU time of switch_mm_irqs_off.
On a large fleet, this WARN_ON_ONCE has not fired in at least a month, possibly never.
Move this test under CONFIG_DEBUG_VM so it does not get compiled in production kernels.”
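That change follows a common pattern: keep a sanity check in debug builds and compile it out of production builds entirely. The user-space C sketch below illustrates the idea. `DEBUG_VM` stands in for the kernel’s CONFIG_DEBUG_VM, and `DEBUG_WARN_ON()`, `cpu_in_mask()` and `same_mm_switch()` are hypothetical helpers, not the code from the patch. In non-debug builds the macro expands to `0 && (cond)`, so the condition is still type-checked but never evaluated. The `mask_tests` counter shows that the mask test never runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef DEBUG_VM
/* Debug builds: evaluate the condition and report whenever it is true.
 * Unlike WARN_ON_ONCE, this simplified macro reports on every hit. */
#define DEBUG_WARN_ON(cond) \
    ((cond) ? (fprintf(stderr, "WARNING: %s\n", #cond), true) : false)
#else
/* Production builds: short-circuiting on the constant 0 means `cond` is
 * never evaluated, and the compiler can drop the whole expression. */
#define DEBUG_WARN_ON(cond) (0 && (cond))
#endif

static unsigned long mask_tests; /* how many times the mask test ran */

/* Hypothetical stand-in for a per-CPU mask lookup such as cpumask_test_cpu. */
static bool cpu_in_mask(uint64_t mask, int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    mask_tests++;
    return (mask >> cpu) & 1u;
}

/* Hypothetical stand-in for the prev == next branch of a context switch.
 * Returns true if the debug check fired. */
static bool same_mm_switch(uint64_t mm_cpumask, int cpu)
{
    return DEBUG_WARN_ON(!cpu_in_mask(mm_cpumask, cpu));
}
```

Built normally, `mask_tests` stays at zero no matter how often `same_mm_switch()` is called. Built with `-DDEBUG_VM`, every call performs the mask test and any inconsistency is reported on stderr.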
It will be interesting to see which other workloads benefit measurably from these changes, hopefully without any regressions… With Linux 6.14-rc1 due out this weekend, attention will now turn to the testing and benchmarking period of the Linux 6.14 kernel cycle.