Christian Brauner sent in a dozen VFS pull requests for the Linux 7.0 kernel, and all of them were merged today. The most notable changes among them are the introduction of the NULLFS and OPEN_TREE_NAMESPACE features.
NULLFS is merged as a “completely catatonic minimal pseudo filesystem”, which Brauner describes further:
“Add a completely catatonic minimal pseudo filesystem called “nullfs” and make pivot_root() work in the initramfs.
Currently pivot_root() does not work on the real rootfs because it cannot be unmounted. Userspace has to recursively delete initramfs contents manually before continuing boot, using the fragile switch_root sequence (overmount + chroot).
Add nullfs, a minimal immutable filesystem that serves as the true root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is mounted on top of it. This allows userspace to simply:
    chdir(new_root);
    pivot_root(".", ".");
    umount2(".", MNT_DETACH);

without the traditional switch_root workarounds. systemd already handles this correctly. It tries pivot_root() first and falls back to MS_MOVE only when that fails.
This also means rootfs mounts in unprivileged namespaces no longer need MNT_LOCKED, since the immutable nullfs guarantees nothing can be revealed by unmounting the covering mount.
nullfs is a single-instance filesystem (get_tree_single()) marked SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root directory. This means sooner or later it can be used to overmount other directories to hide their contents without any additional protection needed.
We enable it unconditionally. If we see any real regression we’ll hide it behind a boot option.
nullfs has extensions beyond this in the future. It will serve as a concept to support the creation of completely empty mount namespaces – which is work coming up in the next cycle.”
NULLFS amounts to just dozens of lines of code and is now merged for Linux 7.0.
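For readers who want to see the three-call sequence from the quoted message in compilable form, here is a minimal C sketch. The helper name `pivot_into` is hypothetical and not taken from the pull request. Because glibc does not provide a `pivot_root()` wrapper, the call goes through `syscall()`. The sketch assumes the caller has CAP_SYS_ADMIN in its mount namespace and that `new_root` is already a mount point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the root of the current mount
 * namespace, then detach the old root stacked underneath it.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pivot_into(const char *new_root)
{
    assert(new_root != NULL);

    /* Step 1: enter the directory that should become the new root. */
    if (chdir(new_root) < 0)
        return -1;

    /* Step 2: pivot_root(".", ".") puts the old root at the same
     * location as the new one, so no separate put_old directory
     * is needed. */
    if (syscall(SYS_pivot_root, ".", ".") < 0)
        return -1;

    /* Step 3: lazily detach the old root. */
    if (umount2(".", MNT_DETACH) < 0)
        return -1;

    return 0;
}
```

An init program would call something like `pivot_into("/sysroot")` (the path is only an example) and then exec the real init. Per the quoted message, it is the nullfs change that lets this sequence work when starting from the initramfs.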
Also among the VFS pulls merged for Linux 7.0 is OPEN_TREE_NAMESPACE, presented as a security and performance win for containers. The VFS namespace merge allows statmount to accept a file descriptor as a parameter, drops the old mount API code, and adds OPEN_TREE_NAMESPACE. The pull request sums up the OPEN_TREE_NAMESPACE flag as:
“Container runtimes currently use CLONE_NEWNS to copy the caller’s entire mount namespace — only to then pivot_root() and recursively unmount everything they just copied. With large mount tables and thousands of parallel container launches this creates significant contention on the namespace semaphore.
OPEN_TREE_NAMESPACE copies only the specified mount tree (like OPEN_TREE_CLONE) but returns a mount namespace fd instead of a detached mount fd. The new namespace contains the copied tree mounted on top of a clone of the real rootfs.
This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER) followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by the new user namespace. Mount namespace file mounts are excluded from the copy to prevent cycles. Includes ~1000 lines of selftests.”
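As a rough illustration of how a container runtime might use the flag described above, the C sketch below calls `open_tree()` with `OPEN_TREE_NAMESPACE` and then joins the returned namespace with `setns()`. Several things here are assumptions rather than details from the pull request: the helper name `enter_tree_namespace` is hypothetical, the flag combination is a guess at the simplest form, and joining via `setns()` is only one plausible way to use a mount namespace fd. The flag's numeric value is deliberately not hard-coded. If the installed kernel headers do not define it, the helper reports ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mount.h>

/*
 * Hypothetical helper: create a mount namespace containing only the
 * mount tree at `tree` and move the calling thread into it.
 * Returns 0 on success, -1 with errno set on failure.
 */
int enter_tree_namespace(const char *tree)
{
    assert(tree != NULL);

#if defined(SYS_open_tree) && defined(OPEN_TREE_NAMESPACE)
    /* Ask the kernel for a namespace fd instead of a detached mount fd. */
    int nsfd = (int)syscall(SYS_open_tree, AT_FDCWD, tree,
                            OPEN_TREE_NAMESPACE | OPEN_TREE_CLOEXEC);
    if (nsfd < 0)
        return -1;

    /* Join the new mount namespace (assumed usage of the returned fd). */
    int ret = setns(nsfd, CLONE_NEWNS);

    /* Close the fd without clobbering errno from setns(). */
    int saved_errno = errno;
    close(nsfd);
    errno = saved_errno;
    return ret;
#else
    /* Headers predate OPEN_TREE_NAMESPACE: report "not supported". */
    (void)tree;
    errno = ENOSYS;
    return -1;
#endif
}
```

Compared with the unshare(CLONE_NEWNS) approach described in the quote, the intent is that only the requested tree is copied, so there is no full mount table to duplicate and then tear down.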
Plenty of notable changes are landing as the Linux 7.0 cycle kicks off. Besides the symbolic version bump, Linux 7.0 matters all the more because it is the kernel version that will power Ubuntu 26.04 LTS.
