A Huawei engineer has sent out patches proposing HQspinlock, a Hierarchical Queued NUMA-aware spinlock for the Linux kernel. HQspinlock aims to address inefficiencies in the Linux kernel's spinlock on modern NUMA systems that stem from frequent and costly cross-NUMA cache-line transfers.
The HQspinlock is described in Thursday’s patch proposal as:
“In a high contention case, existing Linux kernel spinlock implementations can become inefficient on modern NUMA-systems due to frequent and expensive cross-NUMA cache-line transfers.
This might happen due to following reasons:
– on “contender enqueue” each lock contender updates a shared lock structure
– on “MCS handoff” cross-NUMA cache-line transfer occurs when two contenders are from different NUMA nodes.
We introduce Hierarchical Queued Spinlock (HQ spinlock), aiming to reduce cross-NUMA cache line traffic and thus improving lock/unlock throughput for high-contention cases.
This idea might be considered as a combination of cohort-locking by Dave Dice and Linux kernel queued spinlock.”
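For readers who want to see where those two costs come from, below is a minimal user-space sketch of a classic MCS queue lock using C11 atomics. It is not the kernel's queued spinlock code and not the HQspinlock patches; the names mcs_lock, mcs_node, mcs_acquire() and mcs_release() are hypothetical and chosen only for illustration. It simply marks the "contender enqueue" and "MCS handoff" steps that the patch description refers to.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-contender queue node; each waiter spins on its own node. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

/* Hypothetical lock: a single shared tail pointer. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* "Contender enqueue": every contender writes the shared tail, so its
     * cache line moves between CPUs (and NUMA nodes) under contention. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired immediately */

    /* Link behind the previous waiter, then spin on our own flag. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* busy-wait until the predecessor hands the lock over */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        /* No visible successor: try to reset the tail to "empty". */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;

        /* A contender swapped the tail but has not linked itself yet. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /* "MCS handoff": writing the successor's flag is a cross-NUMA
     * cache-line transfer whenever the successor runs on another node. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

In this plain MCS scheme the queue order ignores which NUMA node each waiter is on, so consecutive handoffs can bounce between nodes. Going by the patch description, HQ spinlock's goal is to cut down on that cross-node traffic; how the patches actually organize the queues is not shown here.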
With the proposed patches, adopting HQspinlock can be as easy as switching initialization calls from spin_lock_init() to spin_lock_init_hq(). For those less concerned with kernel internals and more interested in the benefits, the patches report very nice throughput gains and lower latency with the likes of Memcached and the Nginx HTTPS web server.
Most of the shared performance benchmarks were run on an AMD EPYC 9654 server, along with some from a Kunpeng 920 ARM64 server. The HQspinlock results are very promising.
Those interested in learning more can do so via this patch series.
