Deploying vLLM for LLM inference and serving on NVIDIA hardware can be as easy as pip3 install vllm. It's beautifully simple, just as many AI/LLM Python libraries deploy straight away and typically "just work" on NVIDIA. Running vLLM atop AMD Radeon/Instinct hardware, though, has traditionally meant either compiling vLLM from source yourself or following AMD's recommended approach of using Docker containers with pre-built versions of vLLM. Now there is finally a blessed Python wheel that makes it easier to install vLLM with ROCm and without Docker.
It's not yet as straightforward as installing the upstream vLLM with a plain pip install vllm, but it's pretty close:
pip install vllm==0.14.0+rocm700 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
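
For those wanting a quick sanity check after installing the ROCm wheel, a short Python snippet along these lines should exercise the build using vLLM's standard offline inference API. The model name is just an example; any small Hugging Face model will do:

from vllm import LLM, SamplingParams

# Load a small model to verify the ROCm-enabled vLLM install (model choice is only an example)
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# Run a quick generation to confirm inference works on the AMD GPU
outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)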
Anush Elangovan, the VP of AI Software at AMD, shared the good news today on X. Hopefully it won't be too much longer before ROCm support is as easy as installing the official vLLM package from PyPI. In any event, this is a step in the right direction.
