Last year some ideas were raised about using the Linux kernel’s IO_uring functionality in graphics drivers to help with performance and synchronization. Now Qualcomm engineers have been exploring IO_uring for the DRM accelerator ("accel") drivers, with very promising results on their Cloud AI hardware: speed-ups of around 50% in ioctl execution time in the best case.
Last week Qualcomm engineer Zack McKevitt posted a “request for comments” (RFC) patch adding support for IO_uring’s uring_cmd in DRM/accel drivers. Zack explains in the proposed patch:
“When issuing a batch of ioctl commands to a device, many context switches are performed. To minimize this overhead, we propose using io_uring to submit large batches of ioctl commands to a device all at once. Instead of calling ioctls directly, io_uring provides a uring_cmd callback that may be specified within any file or device’s file_operations structure that may be invoked by the ring.
For DRM devices that may need to issue large amounts of ioctls, we believe performance can be improved by placing uring_cmds to issue these ioctls in the ring and submitting them all at once.
This patch does not update the file_operations to include the uring_cmd callback function for all DRM devices. However, this may be easily done in the future without requiring modifications to existing drivers. Furthermore, this design could be extended to define new op codes within the drm_uring_cmd() callback which would allow for more customized handling, assuming individual driver support.”
The patch is showing very promising performance results:
“Initial benchmarks on our Qualcomm Cloud AI 100 device show speedups of 50% in ioctl execution time in the best case for large batches of ioctls (128) issued together via drm_uring_cmd() compared to issuing these ioctls directly.”
We’ll see where this patch leads and what further performance and efficiency gains may come from leveraging IO_uring within Direct Rendering Manager (DRM) graphics and accelerator drivers.