While a lot of interesting new features and changes have been merged already for the Linux 7.1 merge window, two pull requests stand out so far for being rejected by Linus Torvalds and complete with his to-the-point commentary.
One of the rejected pulls that initially interested me was for addressing a performance regression on today’s high core count workstation/server processors when engaging the audit subsystem. The audit framework can run into a performance bottleneck with higher core count CPUs that a pull request for Linux 7.1 sought to address.
Christian Brauner described the patches in the pull request by he and Red Hat’s Waiman Long as:
“Avoid excessive dput/dget in audit_context setup and reset paths When the audit subsystem is enabled, it can do a lot of get_fs_pwd() calls to get references to fs->pwd and then releasing those references back with path_put() later. That may cause a lot of spinlock contention on a single pwd’s dentry lock because of the constant changes to the reference count when there are many processes on the same working directory actively doing open/close system calls. This can cause noticeable performance regresssion when compared with the case where the audit subsystem is turned off especially on systems with a lot of CPUs which is becoming more common these days.
Avoid this type of performance regression caused by audit by adding a new set of fs_struct helpers to reduce unncessary path_get() and path_put() calls and the audit code is modified to use these new helpers.”
While important given today’s increasingly high CPU core counts and next-gen CPUs going even higher, Linus Torvalds is not content with the design. Torvalds responded:
“I really don’t like this one, and I’m going to at least delay pulling it.
Because this really feels wrong to me. It feels like a hack. Just looking at some of the paths that get modified, we have that copy_fs_struct, which does
path_get(&fs->root);
to increment the root, and then the very next line ends up being
get_fs_pwd_pool_locked(old, &fs->pwd);
that copies the pwd very differently.
And the only reason ‘pwd’ is different is because the audit code treats pwd and root differently.
And that, in turn, is because the audit code is just historical and broken, and can’t deal with chroot environments.
The pwd and root paths are two sides of the same coin, and any code that treats them this differently is *wrong*.
So this basically takes a audit mistake, and makes that mistake visible to non-audit users.
I absolutely hate it.
….
[After thinking through options]Yes, yes, the above is very very handwavy and I didn’t write the code, and possibly the rantings of an insane person.
I didn’t think a *lot* about this. I just looked at this patch series and threw up in my mouth a little, and tried to think of “maybe we could do it differently” and the above is a rough draft of something that might work.”
Brauner in turn responded in agreement with Torvalds’ assessment that he doesn’t like that code either and feeling like a hack and dropping it from the pull request.
Separately, another early Linux 7.1 pull request that rubbed Linus the wrong way was the RCU code. A BOOTPARAM_RCU_STALL_PANIC Kconfig option was proposed that would allow triggering a kernel panic on RCU stalls by default. At compile-time, BOOTPARAM_RCU_STALL_PANIC could be enabled if desiring a panic/reboot on RCU CPU stall as the default value of the kernel.panic_on_rcu_stall sysctl. The intent was for high availability systems that require automatic recovery when a CPU stall is detected without needing user-space to configure that sysctl configurable value at boot time. Torvalds responded to that with:
“No.
Dammit, stop doing these horrible things.
I’ve said this a million times by now: the kernel config phase is probably one of the biggest pain points for random new people trying to build their own kernels, and we DO NOT ASK PEOPLE STUIPID THINGS.
People who want this can damn well use the sysctl or maybe some kernel command line option instead, and we DO NOT MAKE LIFE WORSE FOR EVERYBODY ELSE.
I have removed that disgusting kernel kconfig option, and I want to make it very clear to people that the kconfig files aren’t your personal playgrounds: they are the most visible interface to NORMAL USERS that we want to encourage to build their own kernels so that they can participate in kernel development.
So people: next time you feel the urge to add a Kconfig option, take a deep breath, and look at that Kconfig file, and go “Maybe I can instead remove this *other* useless option instead”.
Let’s *improve* on the kernel instead of making things worse.”
In the ensuing discussion it was also raised that the same default behavior can already be achieved with a boot parameter of sysctl.kernel.panic_on_rcu_stall=true with being able to easily set sysctl defaults at kernel boot time in that form.
