The GNU C Library’s malloc implementation now enables 2MB Transparent Huge Pages (THP) by default on AArch64 Linux. The change is aimed at better performance: the patch reports a 6.25% improvement on SPEC.
Arm engineer Dev Jain explained the change in the Glibc commit that enables 2MB THP by default on AArch64:
“malloc: Enable 2MB THP by default on Aarch64
Linux supports multi-sized Transparent Huge Pages (mTHP). For the purpose of this patch description, we call the block size mapped by a non-last level pagetable level, the traditional THP size (2M for 4K basepage, 512M for 64K basepage). Linux now also supports intermediate THP sizes mapped by the last level pagetable – we call that the mTHP size.
The support for mTHP in Linux has grown to be better and stable over time – applications can benefit from reduced page faults and reduced kernel memory management overhead, albeit at the cost of internal fragmentation. We have observed consistent performance boosts with mTHP with little variance.
As a result, enable 2M THP by default on Aarch64. This enables THP even if user hasn’t passed glibc.malloc.hugetlb=1. If user has passed it, we avoid making the system call to check the hugepage size from sysfs, and override it with the hardcoded 2MB.
There are two additional benefits of this patch, if the transparent hugepage sysctl is set to madvise or always:
1) The THP size is now hardcoded to 2MB for Aarch64. This avoids a syscall for fetching the THP size from sysfs.
2) On 64K basepage size systems, the traditional THP size is 512M, which is unusable and impractical. We can instead benefit from the mTHP size of 2M. Apart from the usual benefit of THPs/mTHPs as described above, Aarch64 systems benefit from reduced TLB pressure on this mTHP size, commonly known as the “contpte” size. If the application takes a pagefault, and either the THP sysctl settings is “always”, or the virtual memory area has been madvise(MADV_HUGEPAGE)’d along with sysctl being “madvise”, then Linux will fault in a 2M mTHP, mapping contiguous pages into the pagetable, and painting the pagetable entries with the cont-bit. This bit is a hint to the hardware that the concerned pagetable entry maps a page which is part of a set of contiguous pages – the TLB then only remembers a single entry for this set of 2M/64K = 32 pages, because the physical address of any other page in this contiguous set is computable by the TLB cached physical address via a linear offset. Hence, what was only possible with the traditional THP size, is now possible with the mTHP size.
We see a 6.25% performance improvement on SPEC.
If the sysctl is set to never, no transparent hugepages will be created by the kernel. But, this patch still sets thp_pagesize = 2MB. The benefit is that on MORECORE() invocation, we extend the heap by 2MB instead of 4KB, potentially reducing the frequency of this syscall’s invocation by 512x. Note that, there is no difference in cost between an sbrk(2M) and sbrk(4K); the kernel only does a virtual reservation and does not touch user physical memory.”
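The figures quoted in the commit message (32 contiguous pages per 2M block on a 64K base page, and up to 512x fewer heap extensions) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `pages_per_block` and `heap_extensions` are made up for this example, and the model assumes the heap grows in fixed-size steps, which ignores padding, thresholds and real allocation patterns in Glibc.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# The THP size the commit message says is now hardcoded on AArch64.
THP_SIZE = 2 * MIB


def pages_per_block(block_size: int, base_page_size: int) -> int:
    """Number of base pages covered by one block of block_size bytes."""
    if block_size % base_page_size != 0:
        raise ValueError("block size must be a multiple of the base page size")
    return block_size // base_page_size


def heap_extensions(total_growth: int, step: int) -> int:
    """Heap-extension calls needed to grow by total_growth bytes in
    fixed-size steps, rounded up (hypothetical simplified model)."""
    return -(-total_growth // step)


if __name__ == "__main__":
    # 64K base pages: one 2M block spans 32 pages (the "contpte" set).
    print(pages_per_block(THP_SIZE, 64 * KIB))  # 32
    # 4K base pages: one 2M block spans 512 pages.
    print(pages_per_block(THP_SIZE, 4 * KIB))   # 512
    # Growing a heap by 1 GiB in 4K steps versus 2M steps.
    small_steps = heap_extensions(GIB, 4 * KIB)
    large_steps = heap_extensions(GIB, THP_SIZE)
    print(small_steps, large_steps, small_steps // large_steps)  # 262144 512 512
```

The 512x figure is an upper bound under this model: it applies only when every heap extension would otherwise have been a single 4K step.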
The 2MB THP default on AArch64 will ship in Glibc 2.43, which is expected in February.
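Which of the cases described in the commit message applies on a given machine depends on the system-wide THP setting, which Linux exposes in `/sys/kernel/mm/transparent_hugepage/enabled` as a line such as `always [madvise] never`, with the active mode in brackets. The following is a minimal sketch for reading it; the function names `parse_thp_mode` and `describe` are hypothetical, and the descriptions only paraphrase the commit message rather than inspect what Glibc actually does.

```python
from pathlib import Path

# Standard sysfs location of the system-wide THP setting on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active (bracketed) mode from e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def describe(mode: str) -> str:
    """Paraphrase of the commit message's description of each mode."""
    descriptions = {
        "always": "kernel may fault in 2M huge pages for eligible mappings",
        "madvise": "huge pages only for regions marked madvise(MADV_HUGEPAGE)",
        "never": "no huge pages; malloc still extends the heap in 2MB steps",
    }
    return descriptions.get(mode, "unrecognized mode")


if __name__ == "__main__":
    if THP_ENABLED.exists():
        mode = parse_thp_mode(THP_ENABLED.read_text())
        print(f"{mode}: {describe(mode)}")
    else:
        print("THP sysfs interface not available on this system")
```

On kernels built without THP support the sysfs file is absent, so the script reports that instead of failing.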
