This week’s “sched/urgent” pull request with scheduler updates for the ongoing Linux 7.0 cycle was sent out today. Notable this week are fixes for some hangs as well as for a possible performance regression on large systems.
The scheduler fixes sent out today for Linux 7.0 address issues that have been present in the scheduler code since last November, following the big mm/cid rewrite. Some regression fixes for that code were already merged previously, and additional fallout was recently spotted.
Last month a mailing list thread between kernel developers noted kernel stalls when starting a VSOCK listening socket, with the possibility of soft lockups, RCU stalls, and a timeout.
Thomas Gleixner sorted through the issue, which led to several patches fixing problems in the mm/cid code. Those patches are now in tow with today’s sched/urgent pull request:
“More MM-CID fixes, mostly fixing hangs/races:
– Fix CID hangs due to a race between concurrent forks
– Fix vfork()/CLONE_VM MMCID bug causing hangs
– Remove pointless preemption guard
– Fix CID task list walk performance regression on large systems by removing the known-flaky and slow counting logic using for_each_process_thread() in mm_cid_*fixup_tasks_to_cpus(), and implementing a simple sched_mm_cid::node list instead”
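To illustrate why the last item matters on large systems, here is a minimal userspace sketch in C. It is not the kernel code: the types `struct task` and `struct mm` and the helpers `task_attach()`, `count_users_global()` and `count_users_list()` are hypothetical stand-ins. It only contrasts scanning every task in the system and filtering by address space against walking a list that holds just the users of one address space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model for illustration; not the kernel's structures. */
struct mm;

struct task {
    int id;
    struct mm *mm;             /* address space this task uses */
    struct task *next_global;  /* link in the system-wide task list */
    struct task *next_in_mm;   /* link in the per-mm user list */
};

struct mm {
    struct task *users;        /* head of the per-mm user list */
};

static struct task *all_tasks; /* head of the system-wide task list */

/* Register a task in both the system-wide list and its mm's list. */
static void task_attach(struct task *t, struct mm *mm)
{
    assert(t != NULL && mm != NULL);
    t->mm = mm;
    t->next_global = all_tasks;
    all_tasks = t;
    t->next_in_mm = mm->users;
    mm->users = t;
}

/* Scan every task and filter by mm: cost grows with total task count. */
static size_t count_users_global(const struct mm *mm, size_t *visited)
{
    size_t n = 0;

    *visited = 0;
    for (const struct task *t = all_tasks; t; t = t->next_global) {
        (*visited)++;
        if (t->mm == mm)
            n++;
    }
    return n;
}

/* Walk only the per-mm list: cost grows with the users of this mm. */
static size_t count_users_list(const struct mm *mm, size_t *visited)
{
    size_t n = 0;

    *visited = 0;
    for (const struct task *t = mm->users; t; t = t->next_in_mm) {
        (*visited)++;
        n++;
    }
    return n;
}
```

In this model both functions return the same count, but the per-mm walk never touches tasks belonging to other address spaces, which is the property that matters when a machine runs a very large number of threads.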
Those patches should be merged today ahead of the Linux 7.0-rc4 release.
Completely separate, another hang fix for Linux 7.0-rc4 comes by way of today’s x86/urgent fixes pull. It addresses a suspend-to-RAM bug where the firmware unexpectedly re-enables the x2apic hardware after the kernel had disabled it before suspending. To avoid hangs, the kernel will now disable x2apic on resume if it expects x2apic to be disabled. The patch explains:
“When resuming from s2ram, firmware may re-enable x2apic mode, which may have
been disabled by the kernel during boot either because it doesn’t support IRQ
remapping or for other reasons. This causes the kernel to continue using the
xapic interface, while the hardware is in x2apic mode, which causes hangs.
This happens on defconfig + bare metal + s2ram.

Fix this in lapic_resume() by disabling x2apic if the kernel expects it to be
disabled, i.e. when x2apic_mode = 0.”
Look for Linux 7.0-rc4 later today.
