The wonderful IO_uring interface of the Linux kernel for high-performance async I/O has picked up a new capability with Linux 7.0: BPF filtering.
Linux I/O expert Jens Axboe implemented support for loading BPF programs with IO_uring to offer fine-grained filtering of SQE (submission queue entry) operations. Unlike the existing filtering facilities, this BPF filtering can inspect request attributes and make dynamic filtering decisions. Filters can allow or deny requests, multiple filters can be stacked per opcode, and the filters are classic BPF (cBPF) programs rather than eBPF programs so that they remain usable in containers.
“This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters.
seccomp and io_uring don’t play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring.
As a result, things like containers and systemd that apply seccomp filters, can’t filter io_uring operations.
That leaves them with just one choice if filtering is critical – filter the actual io_uring_setup(2) system call to simply disallow io_uring. That’s rather unfortunate, and has limited us because of it.
io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal – an opcode is either enabled or it’s not. Once a filter set is registered, the ring can be enabled. This is very restrictive, and it’s not useful at all to systemd or containers which really want both broader and more specific control.
This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well.”
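To make the IORING_OP_SOCKET case more concrete, here is a minimal sketch of what a classic BPF filter on the socket domain could look like. It uses only the real `struct sock_filter` type and the `BPF_STMT`/`BPF_JUMP` macros from `<linux/filter.h>`. Everything prefixed with `demo_`/`DEMO_` is hypothetical: the context layout, the allow/deny return values, and the tiny user-space evaluator are assumptions made for illustration and are not the kernel's actual io_uring filter ABI or registration interface.

```c
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>     /* AF_INET, AF_UNIX, SOCK_STREAM */

/* HYPOTHETICAL context a filter might see for IORING_OP_SOCKET.
 * The real kernel layout may differ; this is an assumption. */
struct demo_socket_ctx {
    uint32_t domain;
    uint32_t type;
    uint32_t protocol;
};

/* HYPOTHETICAL return values; the real allow/deny encoding may differ. */
#define DEMO_ALLOW 1u
#define DEMO_DENY  0u

/* cBPF program: allow the request only if domain == AF_INET. */
static const struct sock_filter demo_filter[] = {
    /* A = ctx->domain */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             (uint32_t)offsetof(struct demo_socket_ctx, domain)),
    /* if (A == AF_INET) fall through to allow, else skip one insn */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, DEMO_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, DEMO_DENY),
};

/* Tiny user-space evaluator for just the three instruction forms used
 * above, loading words in native byte order (as seccomp filters do).
 * It exists only so the sketch can be exercised without a kernel. */
static uint32_t demo_run(const struct sock_filter *prog, size_t len,
                         const struct demo_socket_ctx *ctx)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *insn = &prog[pc];

        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if ((size_t)insn->k + sizeof(acc) > sizeof(*ctx))
                return DEMO_DENY;           /* out-of-bounds load */
            memcpy(&acc, (const unsigned char *)ctx + insn->k, sizeof(acc));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            return DEMO_DENY;               /* unsupported instruction */
        }
    }
    return DEMO_DENY;                        /* fell off the end */
}

/* Convenience wrapper: returns 1 if the demo filter allows the socket. */
static int demo_socket_allowed(uint32_t domain, uint32_t type,
                               uint32_t protocol)
{
    struct demo_socket_ctx ctx = { domain, type, protocol };
    size_t len = sizeof(demo_filter) / sizeof(demo_filter[0]);

    return demo_run(demo_filter, len, &ctx) == DEMO_ALLOW;
}
```

Under these assumptions, `demo_socket_allowed(AF_INET, SOCK_STREAM, 0)` is allowed while `demo_socket_allowed(AF_UNIX, SOCK_STREAM, 0)` is denied. A real filter would be handed to the kernel through the new io_uring registration interface rather than evaluated in user space.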
The IO_uring BPF filtering capabilities landed in Linux 7.0 with a merge yesterday.
