AMD’s VP of AI Software, Anush Elangovan, has used Claude Code to help build a pure-Python AMD GPU user-space driver. The driver is being developed to exercise other ROCm code and to aid debugging by bypassing the ROCm/HIP user-space stack.
Inspired by Tinygrad’s user-space AMD GPU driver implementation, Anush used Claude to create a user-space driver for stress testing SDMA and debugging compute/communications overlap. Anush posted on X: “I didn’t open the editor once. [AI] Agents are the great equalizer in software. And Speed is the moat.”
With further work, the user-space driver has also gained support for compute-bound kernels.
This user-space driver is currently being developed via this GitHub branch. The initial commit message describes the first features in place:
“Add pure-Python AMD GPU userspace driver
A standalone Python driver that talks directly to /dev/kfd and /dev/dri/renderD* via ctypes ioctls, bypassing the ROCm/HIP userspace stack. Supports KFD backend with pluggable architecture for future bare-metal PCI (AM) backend.
Features:
– KFD ioctl bindings (queue, memory, events)
– GPU family registry (RDNA2/3/4, CDNA2/3)
– SDMA copy engine with linear copy and fence packets
– PM4 compute packet builder (dispatch, release_mem, etc.)
– Timeline semaphore for GPU-CPU synchronization
– Topology parser for /sys/devices/virtual/kfd/kfd
– ELF code object parser for kernel loading
– 130 tests passing (unit + integration on MI300X/gfx942)
Co-Authored-By: Claude (claude-opus-4-6)”
Over the past two days this pure-Python AMD user-space driver has been extended to include multi-GPU support, compute-bound kernels, and other functionality. It will be interesting to see where work on this Python-based AMD GPU user-space driver leads.
