The Linux kernel’s workqueue subsystem, which handles asynchronous tasks in dedicated kernel threads, is seeing some useful improvements with Linux 7.0.
Most notable for Linux 7.0 is work on the workqueue rescuer, which is used to prevent deadlocks in the workqueue when the system is under memory pressure. Lai Jiangshan of the Ant Group improved the rescuer to avoid a situation where a single long-blocking work item could stall all work items behind it, causing high latency for the rest of the queue. The merge request explained:
“Rework the rescuer to process work items one-by-one instead of slurping all pending work items in a single pass. As there is only one rescuer per workqueue, a single long-blocking work item could cause high latency for all tasks queued behind it, even after memory pressure is relieved and regular kworkers become available to service them.”
Lai Jiangshan elaborated in the patch series for improving the rescuer:
“Previously, the rescuer scanned for all matching work items at once and processed them within a single rescuer thread, which could cause one blocking work item to stall all others.
Make the rescuer process work items one-by-one instead of slurping all matches in a single pass.
Break the rescuer loop after finding and processing the first matching work item, then restart the search to pick up the next. This gives normal worker threads a chance to process other items which gives them the [opportunity] to be processed instead of waiting on the rescuer’s queue and prevents a blocking work item from stalling the rest once memory pressure is relieved.”
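To make the latency difference concrete, here is a toy Python model of the two strategies. It is only an illustrative sketch, not kernel code: the `simulate` function, the work item durations, the single regular kworker, and the `relief_time` at which that kworker becomes available are all assumptions invented for this example.

```python
def simulate(durations, one_by_one, relief_time):
    """Return the completion time of each pending work item.

    durations:   how long each work item runs (hypothetical time units)
    one_by_one:  False = rescuer grabs every pending item in one pass (old),
                 True  = rescuer takes a single item per pass (new)
    relief_time: when one regular kworker becomes available again
                 (toy assumption standing in for "memory pressure relieved")
    """
    if not one_by_one:
        # Old behavior: all items sit on the rescuer's private list and
        # run strictly in sequence, even if a kworker shows up later.
        done, now = [], 0
        for d in durations:
            now += d
            done.append(now)
        return done

    # New behavior: items stay on the shared list until someone takes one,
    # so whichever thread is free first picks up the next item.
    free_at = {"rescuer": 0, "kworker": relief_time}
    done = []
    for d in durations:
        who = min(free_at, key=free_at.get)  # ties go to the rescuer
        free_at[who] += d
        done.append(free_at[who])
    return done


# One long-blocking item followed by three short ones.
items = [10, 1, 1, 1]
print(simulate(items, one_by_one=False, relief_time=2))  # [10, 11, 12, 13]
print(simulate(items, one_by_one=True, relief_time=2))   # [10, 3, 4, 5]
```

In this model the short items finish at times 3, 4 and 5 instead of 11, 12 and 13, because they are no longer held behind the blocking item on the rescuer's own list.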
The workqueue pull request also adds a CONFIG_BOOTPARAM_WQ_STALL_PANIC Kconfig build-time option and a workqueue.panic_on_stall_time parameter for time-based stall panics, giving more control over workqueue stall handling. CONFIG_BOOTPARAM_WQ_STALL_PANIC provides build-time control over the number of workqueue stalls allowed before a kernel panic is triggered. This is mainly useful for high-availability systems that need consistent panic-on-stall behavior, where uptime guarantees require stalls to be handled promptly.
These workqueue improvements have been merged for Linux 7.0.
