All of the memory management “MM” related patches have now been merged for the ongoing Linux 7.0 merge window.
Andrew Morton sent out all of the “MM” patches earlier today and they have since been merged by Linus Torvalds for Linux 7.0. Below is a look at some of the patches catching my eye this cycle, which as usual mostly revolve around performance improvements to the Linux kernel.
There is a patch series introducing compressed data writeback support for Zram. Writing the data back in its compressed form, rather than using the previous sub-optimal uncompressed writeback, should help with CPU power savings and improve power efficiency, especially on laptops.
Another interesting MM patch series merged this round is clearing of contiguous page ranges for hugepages. This can provide a large improvement for demand faulting, both for 2MB pages and for larger page sizes. Oracle engineer Ankur Arora elaborates in the patch series:
“The series improves on the current discontiguous clearing approach in two ways:
– clear pages in a contiguous fashion.
– use batched clearing via clear_pages() wherever exposed.
The first is useful because it allows us to make much better use of hardware prefetchers.
The second, enables advertising the real extent to the processor. Where specific instructions support it (ex. string instructions on x86; “mops” on arm64 etc), a processor can optimize based on this because, instead of seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a larger unit being operated on.
For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to a mode where they start eliding cacheline allocation. This is helpful not just because it results in higher bandwidth, but also because now the cache is not evicting useful cachelines and replacing them with zeroes.”
The benchmark results reported for this series speak for themselves.
Separately, another optimization in the MM pull for Linux 7.0 accelerates gigantic folio allocation, greatly speeding it up by avoiding unnecessary work. As a reference point, the allocation time for 120 × 1G folios went from 3.605s down to just 0.431s, making it roughly 8.4 times faster.
More interesting optimization work unified the swapin paths and removed some old swap code that wasn't performing well. These clean-ups and improvements yielded a 20% speed-up in a Redis benchmark.
The last MM patch series catching my eye this merge window enables PT_RECLAIM for more 64-bit architectures, including Alpha, LoongArch, MIPS, PA-RISC, and UM (User-Mode Linux).
Lots of MM activity as always. The full list of merged MM patches for the Linux 7.0 cycle can be found via the pull request.
