As part of Intel’s ongoing Project Battlematrix efforts, which include SR-IOV support for Arc Pro cards and multi-device (multi-GPU) support allowing up to eight Intel Arc Pro graphics cards in a single system, Intel engineers today posted preliminary Linux driver patches for pinned device memory, functionality that is important for multi-GPU usage.
Intel engineers continue working on their multi-GPU support for Linux after some prep patches were merged for Linux 6.17. Out today in "request for comments" (RFC) form is functionality for pinned device memory within the context of cgroups. Pinned device memory is very important to multi-GPU usage for performance reasons.
Intel engineer Maarten Lankhorst explained with today’s patches:
“When exporting dma-bufs to other devices, even when it is allowed to use move_notify in some drivers, performance will degrade severely when eviction happens.
A [particular] example where this can happen is in a multi-card setup, where PCI-E peer-to-peer is used to [avoid] access to system memory.
If the buffer is evicted to system memory, not only the evicting GPU [where] the buffer resided is affected, but it will also stall the GPU that is waiting on the buffer.
It also makes sense for long running jobs not to be preempted by having [their] buffers evicted, so it will make sense to have the ability to pin from system memory too.
This is dependent on patches by Dave Airlie, so it’s not part of this series yet. But I’m planning on extending pinning to the memory cgroup controller in the future to handle this case.”
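For context, the multi-GPU situation Lankhorst describes is the standard dma-buf/PRIME sharing path between two DRM devices. Below is a minimal userspace sketch, assuming two render nodes at hypothetical paths and an already-allocated GEM handle on the first GPU, purely to illustrate where an eviction to system memory would end up stalling both devices:

```c
/*
 * Sketch: sharing a buffer between two GPUs via dma-buf/PRIME.
 * The render node paths and the placeholder GEM handle are assumptions;
 * drmPrimeHandleToFD()/drmPrimeFDToHandle() are standard libdrm calls.
 * Error handling is trimmed for brevity.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <xf86drm.h>

int main(void)
{
    int fd_a = open("/dev/dri/renderD128", O_RDWR); /* exporting GPU */
    int fd_b = open("/dev/dri/renderD129", O_RDWR); /* importing GPU */
    if (fd_a < 0 || fd_b < 0)
        return 1;

    uint32_t handle_a = 0; /* placeholder: GEM handle of a buffer allocated on GPU A */

    /* Export GPU A's buffer as a dma-buf file descriptor. */
    int dmabuf_fd;
    if (drmPrimeHandleToFD(fd_a, handle_a, DRM_CLOEXEC | DRM_RDWR, &dmabuf_fd))
        return 1;

    /* Import it on GPU B. With PCI-E peer-to-peer, GPU B reads GPU A's
     * VRAM directly; if the buffer later gets evicted to system memory,
     * both GPUs pay the penalty -- which is what pinning is meant to avoid. */
    uint32_t handle_b;
    if (drmPrimeFDToHandle(fd_b, dmabuf_fd, &handle_b))
        return 1;

    printf("imported dma-buf as GEM handle %u on second GPU\n", handle_b);
    return 0;
}
```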
As part of the proposed patches, the Intel Xe kernel graphics driver introduces a new “DRM_XE_GEM_CREATE_FLAG_PINNED” flag to indicate that a buffer object should be pinned upon allocation for its entire lifetime.
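To give a rough idea of what that could look like from userspace, here is a sketch against the existing DRM_IOCTL_XE_GEM_CREATE ioctl. Keep in mind the flag is only an RFC proposal: its numeric value below is a placeholder, and the VRAM placement bit and render node path are assumptions; real code would query the device’s memory regions first.

```c
/*
 * Sketch (assumption-heavy): passing the proposed pinned-on-allocation
 * flag through the existing Xe GEM create ioctl. The flag name comes from
 * the RFC; the #define value below is NOT from the patches, only a
 * placeholder so the sketch compiles.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

#ifndef DRM_XE_GEM_CREATE_FLAG_PINNED
#define DRM_XE_GEM_CREATE_FLAG_PINNED (1 << 3) /* placeholder value */
#endif

int main(void)
{
    int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed Xe render node */
    if (fd < 0)
        return 1;

    struct drm_xe_gem_create create;
    memset(&create, 0, sizeof(create));
    create.size = 64ULL << 20;                 /* 64 MiB buffer */
    create.placement = 1 << 1;                 /* assumed VRAM region instance;
                                                  query memory regions in real code */
    create.cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
    create.flags = DRM_XE_GEM_CREATE_FLAG_PINNED; /* keep resident for the BO's lifetime */

    if (ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
        return 1;

    /* create.handle now refers to a buffer the driver would not evict. */
    return 0;
}
```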