Yesterday I was invited along with a small group of others to try out the AMD Instinct MI355X accelerator down in Austin, Texas. The AMD Instinct MI355X is fully supported with the newly-released AMD ROCm 7.0.
Yesterday’s AMD Instinct MI355X “hands on” to celebrate ROCm 7.0 and the MI350X/MI355X hardware ended up being just a guided Jupyter Notebook for an AI demo… And one that wasn’t even performance-related or unique to the AMD Instinct MI350 series capabilities. It wasn’t quite the hands-on time I expected, having originally hoped there would be enough time to tap some MI355X accelerators unconstrained and run some AI/LLM benchmarks, at least with Llama.cpp and vLLM. Nevertheless, the Jupyter Notebook’s terminal allowed for poking at the MI355X on ROCm 7.0 during this demo session.
The Instinct MI355X hardware for the “hands on” demo was running at TensorWave. TensorWave is the first public cloud provider so far with Instinct MI355X availability. I’m hoping to get remote access to the Instinct MI355X for more than a brief period of time, so I can load up some large benchmarks without being constrained by a shared Jupyter Notebook instance.
The MI355X was working on the just-minted ROCm 7.0.
This was my first time trying out an instance on TensorWave and it worked smoothly, with all the necessary setup already in place.
The host processors were the extremely capable AMD EPYC 9575F. If you missed it recently, see AMD EPYC 9575F CPUs For GPU/AI Servers Show Leading Performance In Benchmarks.
This Jupyter Notebook ended up being just one Instinct MI355X shared among a few folks, which paired with the limited time didn’t allow for any meaningful benchmarks. The time allotted for this hands-on demo could easily have been consumed just by downloading some LLMs.
For those curious, OpenCL does work on the AMD Instinct MI355X… That may be interesting for the likes of FluidX3D CFD. Simply put, OpenCL 2.1 was indeed present and working on the MI355X when running some quick OpenCL micro-benchmarks.
The RADV Vulkan driver was also enumerating the AMD Instinct MI355X, though RADV has known limitations with Instinct/CDNA. Since there is no graphics engine, RADV would only be useful for compute-only Vulkan workloads, such as Llama.cpp and other software with a Vulkan back-end. Anyhow, I was just curious when briefly poking at this AMD Instinct MI355X accelerator. (Coincidentally, later today I have some interesting Vulkan back-end Llama.cpp benchmarks coming out on other hardware.)
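For anyone wanting to repeat this sort of quick poking from a notebook terminal, here is a minimal Python sketch of checking what ROCm, OpenCL, and Vulkan each enumerate. It is not what was run during the demo: it assumes the stock `rocminfo`, `clinfo`, and `vulkaninfo` utilities may be installed on the host, and the `PROBES` table and `probe_devices` helper are hypothetical names for illustration only.

```python
import shutil
import subprocess

# Assumption: these standard utilities may or may not be on the host.
# The article does not say which tools were actually used.
PROBES = {
    "rocm": ["rocminfo"],
    "opencl": ["clinfo", "-l"],             # short platform/device listing
    "vulkan": ["vulkaninfo", "--summary"],  # condensed device summary
}


def probe_devices(timeout=30):
    """Run each enumeration tool that is installed and collect its stdout.

    Returns a dict mapping API name to the tool's output, or None when the
    tool is missing, fails to launch, or exceeds the timeout.
    """
    results = {}
    for name, cmd in PROBES.items():
        if shutil.which(cmd[0]) is None:
            results[name] = None  # tool not installed
            continue
        try:
            done = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            results[name] = done.stdout
        except (subprocess.SubprocessError, OSError):
            results[name] = None  # launch failure or timeout
    return results


if __name__ == "__main__":
    for name, output in probe_devices().items():
        print(f"== {name} ==")
        print(output if output else "(tool not found or failed)")
```

Missing tools simply report as unavailable, so the same script can be run unchanged on a machine without ROCm installed.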
That’s all for now given the constraints. Thanks to AMD for having me down in Austin and thanks to TensorWave for hosting the Instinct MI355X access. Hopefully we’ll soon have proper access to the AMD Instinct MI355X for performance benchmarking, and to the MI300X/MI325X again for relevant comparison testing.