Joshua Hahn has posted the latest “request for comments” (RFC) patches implementing weighted interleave auto-tuning for the Linux kernel, with the aim of improving performance, primarily on Linux servers with multiple memory nodes.
Weighted interleave was originally introduced as a new memory policy to help with heterogeneous memory environments, such as those with CXL memory, where an even round-robin distribution is a poor fit. The policy interleaves memory across nodes according to specified weights to better cope with heterogeneous hardware. Until now those weights have needed to be set manually by the Linux server administrator or by other policy, but the RFC patches bring auto-tuning support.
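To make the difference from an even round-robin concrete, here is a minimal sketch of weighted interleaving as a toy model. It is not the kernel's implementation; the function name, the node IDs, and the 3:1 weights are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import islice


def weighted_interleave(weights):
    """Yield node IDs forever, placing `weight` consecutive pages on each node per round.

    `weights` maps a node ID to a positive integer weight. With all weights
    equal to 1 this degenerates to plain round-robin interleaving.
    """
    if not weights or any(w < 1 for w in weights.values()):
        raise ValueError("every node needs a weight of at least 1")
    while True:
        for node, weight in weights.items():
            for _ in range(weight):
                yield node


# Hypothetical setup: node 0 is local DRAM, node 1 is a slower CXL-attached node.
weights = {0: 3, 1: 1}
placement = list(islice(weighted_interleave(weights), 8))
counts = Counter(placement)

print(placement)  # [0, 0, 0, 1, 0, 0, 0, 1]
print(counts[0], counts[1])  # 6 2
```

In this model the faster node receives three of every four pages, which is the kind of split an administrator would otherwise have to work out and set by hand.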
Hahn explains in the RFC patch for weighted interleave auto-tuning:
“On machines with multiple memory nodes, interleaving page allocations across nodes allows for better utilization of each node’s bandwidth. Previous work by Gregory Price introduced weighted interleave, which allowed for pages to be allocated across NUMA nodes according to user-set ratios.
Ideally, these weights should be proportional to their bandwidth, so that under bandwidth pressure, each node uses its maximal efficient bandwidth and prevents latency from increasing exponentially.
At the same time, we want these weights to be as small as possible. Having ratios that involve large co-prime numbers like 7639:1345:7 leads to awkward and inefficient allocations, since the node with weight 7 will remain mostly unused (and despite being proportional to bandwidth, will not aid in relieving the pressure present in the other two nodes).
This patch introduces an auto-configuration for the interleave weights that aims to balance the two goals of setting node weights to be proportional to their bandwidths and keeping the weight values low. This balance is controlled by a value “weightiness”, which defines the interleaving aggression. Higher values lead to less interleaving (255:1), while lower values lead to more interleaving (1:1).
Large weightiness values generally lead to increased weight-bandwidth proportionality, but can lead to underutilized nodes (think worst-case scenario, which is 1:max_node_weight). Lower weightiness reduces the effects of underutilized nodes, but may lead to improperly loaded distributions.
This knob is exposed as a sysfs interface with a default value of 32. Weights are re-calculated once at boottime and then every time the knob is changed by the user, or when the ACPI table is updated.”
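The trade-off described in the quote can be illustrated with a small sketch. This is one plausible way to turn bandwidths into small weights and is an assumption made for illustration, not the code from the RFC patch: each node's share of the total bandwidth is scaled by the weightiness, rounded, clamped to at least 1, and the result is divided by the greatest common divisor. The function name is hypothetical, and the bandwidth figures simply reuse the 7639:1345:7 ratio from the quote.

```python
from functools import reduce
from math import gcd


def reduce_weights(bandwidths, weightiness=32):
    """Map per-node bandwidths to small integer interleave weights (illustrative only).

    Each weight is the node's share of total bandwidth scaled by `weightiness`,
    rounded to the nearest integer and never allowed to drop below 1.
    """
    total = sum(bandwidths)
    if total <= 0 or weightiness < 1:
        raise ValueError("need positive total bandwidth and weightiness >= 1")
    # Integer arithmetic with round-half-up, avoiding floating point.
    scaled = [max(1, (weightiness * bw + total // 2) // total) for bw in bandwidths]
    # A ratio such as 20:10 is better expressed as 2:1.
    divisor = reduce(gcd, scaled)
    return [w // divisor for w in scaled]


bandwidths = [7639, 1345, 7]  # arbitrary units, ratio taken from the quote above

print(reduce_weights(bandwidths, weightiness=32))  # [27, 5, 1]
print(reduce_weights(bandwidths, weightiness=4))   # [3, 1, 1]
print(reduce_weights([200, 100], weightiness=30))  # [2, 1]
```

Under these assumptions a higher weightiness tracks the bandwidth ratio more closely, while a lower one pushes the weights toward 1:1 and keeps the slowest node in regular use.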
With the updated patches, the “weightiness” value that controls the interleave aggression can now be configured via the /sys/kernel/mm/mempolicy/weighted_interleave/weightiness sysfs interface.
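As a rough sketch of how that knob could be inspected or changed from a script: the path below is the one given above, but the helper names are hypothetical, and the assumption that the file holds a single decimal integer is mine rather than something the patch description states. The file will only exist on a kernel built with the RFC patches, and writing to it would normally require root.

```python
from pathlib import Path

# Path as given in the article; present only on kernels carrying the RFC patches.
WEIGHTINESS_PATH = Path("/sys/kernel/mm/mempolicy/weighted_interleave/weightiness")


def read_weightiness(path=WEIGHTINESS_PATH):
    """Return the current weightiness, assuming the file holds one decimal integer."""
    return int(Path(path).read_text().strip())


def write_weightiness(value, path=WEIGHTINESS_PATH):
    """Write a new weightiness value; on a real system this typically needs root."""
    Path(path).write_text(f"{int(value)}\n")


if WEIGHTINESS_PATH.exists():
    print("current weightiness:", read_weightiness())
else:
    print("weightiness knob not present on this kernel")
```

Both helpers take the path as a parameter, so they can be pointed at an ordinary file for testing without touching sysfs.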
The patches are currently under review but will likely make it into the mainline kernel in some form, given the increasing number of Linux servers with CXL memory on the horizon.