Christian Brauner, a Linux engineer at Microsoft, has sent out his set of 12 pull requests touching the VFS portion of the Linux kernel. Among these changes for the Linux 6.18 kernel is one pull request that touches the writeback code to address lockups reported by users when systemd units read lots of files.
The lockups manifest when a systemd unit reads lots of files from a file-system mounted with the “lazytime” mount option and then exits. Lazytime is the option for initially updating the access/modify/change times only on the in-memory version of the file inode, which helps performance and reduces writes to disk. The on-disk timestamps are then updated during fsync and similar operations or when the inode is evicted from memory, among other possibilities.
Linux developers found that when systemd units read many files on a file-system mounted with lazytime, “hundreds of thousands or millions” of dirty inodes may need to be switched to the parent cgroup when the cgroup exits. In turn, the system can be stuck for hours with workers consuming 100% of the CPU.
The pull request elaborates on the problem:
“This contains work addressing lockups reported by users when a systemd unit reading lots of files from a filesystem mounted with the lazytime mount option exits.
With the lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when systemd slice of a large cron job exits can easily reach hundreds of thousands or millions.
The logic in inode_do_switch_wbs() which sorts the inode into appropriate place in b_dirty list of the target wb however has linear complexity in the number of dirty inodes thus overall time complexity of switching all the inodes is quadratic leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb.”
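To make the quadratic cost concrete, here is a minimal Python sketch of the general pattern the pull request describes: placing each item into a sorted list with a linear scan. It is a toy model rather than the kernel code. The names `sorted_insert_steps` and `total_steps` are hypothetical, a plain Python list stands in for the b_dirty list, integers stand in for inode dirtied times, and the worst case (every new item belongs at the far end) is assumed.

```python
def sorted_insert_steps(sorted_list, value):
    """Insert value into an ascending list by scanning from the head.

    Returns the number of existing entries examined, which models the
    linear cost of placing a single item.
    """
    steps = 0
    for i, existing in enumerate(sorted_list):
        steps += 1
        if existing > value:
            sorted_list.insert(i, value)
            return steps
    # No larger entry was found, so the value belongs at the tail.
    sorted_list.append(value)
    return steps


def total_steps(n):
    """Move n items one by one into an initially empty sorted list.

    Assumed worst case: each new item sorts after everything already
    present, so every insertion scans the whole list.
    """
    target = []
    total = 0
    for timestamp in range(n):
        total += sorted_insert_steps(target, timestamp)
    return total


if __name__ == "__main__":
    print(total_steps(1000))  # 499500
    print(total_steps(2000))  # 1999000
```

Doubling the number of items roughly quadruples the work in this model, which is why counts in the hundreds of thousands or millions can translate into hours of CPU time.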
The pull request also has a small sample script for demonstrating the issue on existing Linux kernel releases.
This issue should be addressed once these patches are merged during the upcoming Linux 6.18 merge window.