SK has been working on a Linux kernel feature dubbed Lazy Unmap Flush (LUF), which defers TLB flushes for folios that have been unmapped and freed until those folios are eventually allocated again.
The Lazy Unmap Flush work began after significant migration overhead from TLB shootdowns was observed on tiered-memory servers that make use of CXL memory.
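To make the idea concrete, here is a minimal toy model in Python of flushing at unmap time versus deferring the flush until a freed page is handed out again. It is only a sketch of the general concept described above, not the kernel's implementation: the `ToyMachine` class, its methods, and the `simulate` helper are all hypothetical names invented for illustration, and the batching behavior (one flush round clears every deferred page) is a simplifying assumption.

```python
class ToyMachine:
    """Toy model of unmap-time vs. allocation-time TLB flushing (not kernel code)."""

    def __init__(self, num_cpus, lazy):
        self.lazy = lazy
        self.tlbs = [set() for _ in range(num_cpus)]  # pages cached in each CPU's TLB
        self.pending = set()    # freed pages whose stale TLB entries are not flushed yet
        self.free_pages = []    # simple FIFO free list
        self.shootdowns = 0     # number of cross-CPU flush rounds issued

    def touch(self, cpu, page):
        """Model a CPU accessing a page, which caches its translation."""
        self.tlbs[cpu].add(page)

    def _flush(self, pages):
        """One shootdown round: drop the given pages from every CPU's TLB."""
        for tlb in self.tlbs:
            tlb.difference_update(pages)
        self.shootdowns += 1

    def unmap_and_free(self, page):
        if self.lazy:
            # Assumption: just remember that stale entries may still exist.
            self.pending.add(page)
        else:
            # Eager behavior: flush immediately, one round per unmapped page.
            self._flush({page})
        self.free_pages.append(page)

    def allocate(self):
        page = self.free_pages.pop(0)
        if page in self.pending:
            # The page is about to be reused, so stale entries must go now.
            # Flush everything deferred so far in a single round.
            self._flush(self.pending)
            self.pending.clear()
        return page


def simulate(lazy, num_cpus=4, num_pages=100):
    """Touch, free, and reallocate every page; return the shootdown count."""
    m = ToyMachine(num_cpus, lazy)
    for page in range(num_pages):
        for cpu in range(num_cpus):
            m.touch(cpu, page)
    for page in range(num_pages):
        m.unmap_and_free(page)
    for _ in range(num_pages):
        page = m.allocate()
        # Safety invariant: a reused page must not be reachable via a stale entry.
        assert all(page not in tlb for tlb in m.tlbs)
    return m.shootdowns


if __name__ == "__main__":
    print("eager shootdown rounds:", simulate(lazy=False))
    print("lazy shootdown rounds: ", simulate(lazy=True))
```

In this toy setup the eager variant issues one flush round per freed page, while the lazy variant issues a single round when the first freed page is reused. The numbers are an artifact of the model and say nothing about the measured results of the actual patches.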
The end result is the most interesting and important part: with the LUF patches, TLB shootdown interrupts were reduced by around 97%. Furthermore, a test program running Llama.cpp with a large language model (LLM) saw around 4.5% lower runtime.
The most recent Lazy Unmap Flush patches were stress-tested for a week by running an LLM inference workload with 140GB of memory to demonstrate their stability. The huge reduction in TLB shootdown interrupts and the several-percent runtime improvement for Llama.cpp, as one example workload, are quite promising.
Those interested in these “request for comments” (RFC) patches can find the latest series on the Linux kernel mailing list.