The core timer changes for the Linux 7.0 kernel deliver a rather nice performance improvement in a UDP receive network stress test by manually inlining a function that compiler optimizations haven’t been able to handle on their own.
The timer changes that have been merged for Linux 7.0 include manually inlining the timecounter_cyc2time() code used in a networking hot path. This ends up delivering a 12% improvement on a UDP receive stress test on a 100 Gbit NIC. The inlining is being done manually because compiler feedback-directed optimization (FDO), link-time optimization (LTO), and profile-guided optimization (PGO) haven’t been able to address this case, presumably because network drivers are typically shipped as kernel modules rather than built in.
Eric Dumazet of Google explained in the patch delivering this optimization:
“New network transport protocols want NIC drivers to get hardware timestamps of all incoming packets, and possibly all outgoing packets.
One example is the upcoming ‘Swift congestion control’ which is used by TCP transport and is the primary need for timecounter_cyc2time(). This means timecounter_cyc2time() can be called more than 100 million times per second on a busy server.
Inlining timecounter_cyc2time() brings a 12% improvement on a UDP receive stress test on a 100Gbit NIC.
Note that FDO, LTO, PGO are unable to magically help for this case, presumably because NIC drivers are almost exclusively shipped as modules.”
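To illustrate the kind of change described in the patch message, here is a minimal userspace sketch of a cycles-to-nanoseconds conversion written as a static inline function, as it might appear in a header. It is not the kernel code: the struct, field names, and demo_ functions are hypothetical, and the sketch does not track fractional nanoseconds. The point is that a static inline definition in a header is compiled into each caller, so a driver built as a module needs no cross-module function call per packet.

```c
#include <stdint.h>

/* Hypothetical, simplified snapshot of a free-running cycle counter. */
struct demo_counter {
    uint64_t cycle_last; /* raw counter value at the last snapshot */
    uint64_t nsec;       /* nanosecond time corresponding to cycle_last */
    uint64_t mask;       /* bit mask covering the counter width */
    uint32_t mult;       /* cycles-to-ns multiplier */
    uint32_t shift;      /* cycles-to-ns right shift */
};

/* ns = (cycles * mult) >> shift; fractional remainder is dropped here. */
static inline uint64_t demo_cycles_to_ns(const struct demo_counter *c,
                                         uint64_t cycles)
{
    return (cycles * c->mult) >> c->shift;
}

/*
 * Convert a raw hardware timestamp to nanoseconds relative to the snapshot.
 * A delta larger than half the counter range is treated as a timestamp
 * taken before the snapshot rather than far in the future.
 */
static inline uint64_t demo_cyc2time(const struct demo_counter *c,
                                     uint64_t cycle_tstamp)
{
    uint64_t delta = (cycle_tstamp - c->cycle_last) & c->mask;

    if (delta > c->mask / 2) {
        /* Timestamp is older than the snapshot: subtract instead. */
        delta = (c->cycle_last - cycle_tstamp) & c->mask;
        return c->nsec - demo_cycles_to_ns(c, delta);
    }
    return c->nsec + demo_cycles_to_ns(c, delta);
}
```

With an assumed 32-bit counter ticking every 4 ns (mult = 1024, shift = 8) and a snapshot of cycle_last = 1000 at nsec = 5000, a timestamp of 1010 converts to 5040 ns and a timestamp of 990 converts to 4960 ns.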
The code is merged as part of the timers/core changes along with a separate optimization. That other optimization targets the tick dependency check when the tracepoint is disabled, which helps a hot path in the tick management code when transitioning in and out of idle.
Great seeing all the improvements flowing in for the coincidentally timed Linux 7.0 kernel.
