Merged on Wednesday were some additional memory management (MM) updates for the Linux 7.0 merge window. Most interesting among these latest three dozen patches is support for batched unmapping of file-backed large folios.
The patches supporting batched reference checking and unmapping for large folios are showing very nice performance numbers when reclaiming file-backed large folios. This work was carried out by Alibaba engineer Baolin Wang. He explained in the patch series:
“Currently, folio_referenced_one() always checks the young flag for each PTE sequentially, which is inefficient for large folios. This inefficiency is especially noticeable when reclaiming clean file-backed large folios, where folio_referenced() is observed as a significant performance hotspot.
Moreover, on Arm architecture, which supports contiguous PTEs, there is already an optimization to clear the young flags for PTEs within a contiguous range. However, this is not sufficient. We can extend this to perform batched operations for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).”
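To illustrate the idea in that description, here is a toy userspace model in Python. It is not kernel code: `Pte`, `referenced_sequential`, `referenced_batched`, and the 16-entry folio size are all made up for illustration. The model only counts how many separate test-and-clear operations each approach issues for one large folio.

```python
from dataclasses import dataclass

PTES_PER_FOLIO = 16  # assumed: a 64 KiB folio mapped by sixteen 4 KiB PTEs


@dataclass
class Pte:
    young: bool  # stand-in for the hardware "accessed" bit


def referenced_sequential(ptes):
    """Test and clear the young flag one PTE at a time."""
    referenced, operations = False, 0
    for pte in ptes:
        operations += 1  # one test-and-clear operation per PTE
        if pte.young:
            referenced = True
            pte.young = False
    return referenced, operations


def referenced_batched(ptes):
    """Test and clear the young flags of the whole range in one operation."""
    referenced = any(pte.young for pte in ptes)
    for pte in ptes:
        pte.young = False
    return referenced, 1  # one range operation for the entire folio


def make_folio():
    # Every fourth PTE was recently accessed in this made-up example.
    return [Pte(young=(i % 4 == 0)) for i in range(PTES_PER_FOLIO)]


seq_result = referenced_sequential(make_folio())
batch_result = referenced_batched(make_folio())
print(seq_result, batch_result)  # (True, 16) (True, 1)
```

The batched version still visits every entry, so the model says nothing about real timings. It only shows that the number of separate operations drops from one per PTE to one per folio, which is where per-call overhead would be saved.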
The numbers come at the end of the patch series, with the patch adding batched unmapping for file-backed large folios, and they are quite enticing:
“Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface. I can observe 75% performance improvement on my Arm64 32-core server (and 50%+ improvement on my X86 machine) with this patch.”
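A test along those lines could be sketched roughly as follows. This is not the author's benchmark: the helper names are hypothetical, and the sketch assumes the process already runs inside a cgroup v2 memory cgroup, has write access to its `memory.reclaim` file, and maps a pre-created data file on a filesystem that uses large folios for the page cache.

```python
import mmap
import os
import time


def map_and_touch(path):
    """Map a file read-only and read one byte per page so the pages are
    faulted into the page cache and mapped by this process."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ)
    for offset in range(0, size, mmap.PAGESIZE):
        mapping[offset]  # read access only, so the pages stay clean
    return mapping  # caller keeps it open so the pages remain mapped


def request_reclaim(cgroup_dir, amount):
    """Write a reclaim request such as "8G" to the cgroup's memory.reclaim."""
    with open(os.path.join(cgroup_dir, "memory.reclaim"), "w") as f:
        f.write(amount)


def timed_reclaim(file_path, cgroup_dir, amount):
    """Return the seconds spent in the reclaim request while the file is mapped."""
    mapping = map_and_touch(file_path)
    try:
        start = time.perf_counter()
        request_reclaim(cgroup_dir, amount)
        return time.perf_counter() - start
    finally:
        mapping.close()
```

With a 10G data file, a call such as `timed_reclaim("/mnt/test/data.bin", "/sys/fs/cgroup/test", "8G")` would approximate the quoted setup; both paths are placeholders. Comparing the elapsed time on kernels with and without the patches is what would show the difference. The kernel may return an error if it reclaims less than requested, which would surface here as an `OSError`.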
These are some nice gains, and they matter all the more with the increasing use of folios throughout the Linux kernel.
Those interested in these latest patches, now merged for Linux 7.0, can see this MM pull request.
