Andrew Morton this week sent in some additional memory management (MM) changes for Linux 6.17 to complement last week’s many MM patches, which ranged from new optimizations to more DAMON features. Most notable in this secondary set of patches are khugepaged optimizations that especially help ARM64 Linux systems.
Khugepaged is part of the Transparent Hugepage support in the Linux kernel and is seeing some exciting optimization work in Linux 6.17 for AArch64 hardware. The optimizations improve khugepaged throughput by batching PTE operations for large folios.
Dev Jain of Arm explained in the patch series for the work:
“If the underlying folio mapped by the ptes is large, we can process those ptes in a batch using folio_pte_batch().
For arm64 specifically, this results in a 16x reduction in the number of ptep_get() calls, since on a contig block, ptep_get() on arm64 will iterate through all 16 entries to collect a/d bits. Next, ptep_clear() will cause a TLBI for every contig block in the range via contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at the first and last contig block of the range.
For split folios, there will be no pte batching; the batch size returned by folio_pte_batch() will be 1. For pagetable split folios, the ptes will still point to the same large folio; for arm64, this results in the optimization described above, and for other arches, a minor improvement is expected due to a reduction in the number of function calls and batching atomic operations.”
The ptep_get() function seeing the 16x reduction in calls is a helper for safely reading page table entries (PTEs). ARM64 systems also see fewer TLB invalidations (TLBI) as a result of these khugepaged optimizations, since clear_ptes() only issues them at the first and last contiguous block of a range rather than at every block.
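To make the arithmetic behind those figures concrete, here is a small Python counting model. It is only a sketch and not kernel code: the function names `walk_per_pte` and `walk_batched` are hypothetical, the 512-PTE range is an assumed example size, and the batched path is assumed to need one read per 16-entry contiguous block, which is the simplification that matches the quoted 16x figure.

```python
CONT_PTES = 16  # entries per contiguous block on arm64, per the quoted text


def walk_per_pte(nr_ptes):
    """Toy model: handle every PTE individually."""
    get_calls = entries_scanned = tlbi = 0
    for idx in range(nr_ptes):
        get_calls += 1                 # one ptep_get()-style read per PTE
        entries_scanned += CONT_PTES   # each read scans its whole contig block
        if idx % CONT_PTES == 0:
            tlbi += 1                  # each contig block is invalidated once
    return get_calls, entries_scanned, tlbi


def walk_batched(nr_ptes):
    """Toy model: handle one contiguous block per step (assumed batching)."""
    nr_blocks = nr_ptes // CONT_PTES
    get_calls = entries_scanned = 0
    for _ in range(nr_blocks):
        get_calls += 1                 # assumption: one read covers the block
        entries_scanned += CONT_PTES
    tlbi = min(nr_blocks, 2)           # only the first and last block
    return get_calls, entries_scanned, tlbi


if __name__ == "__main__":
    nr = 512  # assumed range size, a multiple of CONT_PTES
    print("per-PTE:", walk_per_pte(nr))
    print("batched:", walk_batched(nr))
```

Under these assumptions, a 512-PTE range goes from 512 reads and 32 invalidations to 32 reads and 2 invalidations. The real savings depend on how the kernel actually walks the range, so treat the numbers as illustrative only.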
This additional MM pull request for Linux 6.17 also enables EXECMEM_ROX_CACHE support for ftrace and kprobes. The merged code brings performance improvements for the mTHP swap-in code as well.