A patch on its way to the Linux kernel, possibly in time for the 6.20~7.0 kernel, addresses inaccurate out-of-memory (OOM) killer behavior on systems with large core counts.
The patch, by Linux developer Mathieu Desnoyers, made it into Andrew Morton’s “mm-everything” queue this week.
In early 2025 it was reported that the OOM killer could be inaccurate on today’s high core count systems, at least in the range of 250 or more cores/threads:
“Recently, several internal services had an RSS usage regression as part of a kernel upgrade. Previously, they were on a pre-6.2 kernel and were able to read RSS statistics in a backup watchdog process to monitor and decide if they’d overrun their memory budget. Now, however, a representative service with five threads, expected to use about a hundred MB of memory, on a 250-cpu machine had memory usage tens of megabytes different from the expected amount — this constituted a significant percentage of inaccuracy, causing the watchdog to act.
…
This is a really tremendous inaccuracy for any few-threaded program on a large machine and impedes monitoring significantly. These stat counters are also used to make OOM killing decisions, so this additional inaccuracy could make a big difference in OOM situations — either resulting in the wrong process being killed, or in less memory being returned from an OOM-kill than expected.
Finally, while the change to percpu_counter does significantly improve the accuracy over the previous per-thread error for many-threaded services, it does also have performance implications – up to 12% slower for short-lived processes and 9% increased system time in make test workloads.”
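For readers wondering why a per-CPU counter would drift more as the CPU count grows, here is a rough toy model in Python. It is not the kernel's code and not the method used in the patch: the class name, the batch size of 32, and the 4 KiB page size are all assumptions chosen for illustration. The idea it sketches is that each CPU keeps a small local delta and only folds it into the shared total once it reaches a batch threshold, so a cheap read of the shared total can miss up to nearly one batch per CPU.

```python
class ToyPerCpuCounter:
    """Toy model of a batched per-CPU counter (hypothetical, not kernel code)."""

    def __init__(self, n_cpus, batch):
        self.batch = batch
        self.shared = 0            # the value a cheap, approximate read sees
        self.local = [0] * n_cpus  # per-CPU deltas not yet folded into shared

    def add(self, cpu, amount):
        self.local[cpu] += amount
        # Fold the local delta into the shared total once it reaches the batch size.
        if abs(self.local[cpu]) >= self.batch:
            self.shared += self.local[cpu]
            self.local[cpu] = 0

    def read_fast(self):
        # Approximate read: ignores whatever is still parked on each CPU.
        return self.shared

    def read_exact(self):
        # Exact read: walks every CPU, so its cost grows with the CPU count.
        return self.shared + sum(self.local)


def worst_case_drift(n_cpus, batch):
    """Leave (batch - 1) unflushed units on every CPU and report the gap."""
    counter = ToyPerCpuCounter(n_cpus, batch)
    for cpu in range(n_cpus):
        for _ in range(batch - 1):
            counter.add(cpu, 1)
    return counter.read_exact() - counter.read_fast()


if __name__ == "__main__":
    PAGE_SIZE = 4096  # assumed page size in bytes
    pages = worst_case_drift(n_cpus=250, batch=32)  # batch=32 is an assumed value
    print(f"{pages} pages, about {pages * PAGE_SIZE / 2**20:.1f} MiB unaccounted")
```

With these assumed values the toy counter's fast read is off by 7,750 pages, about 30.3 MiB. The numbers are illustrative only; the point is that the possible gap scales with the number of CPUs, which matters most for a small process on a very large machine.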
This patch, now working its way toward the mainline kernel and hopefully landing in the upcoming Linux 6.20~7.0 cycle, should address those inaccuracies.
