While recently carrying out the Windows 11 25H2 vs. Ubuntu Linux benchmarks, I also ran some Llama.cpp AI benchmarks for a first look at AI inference performance between Windows and Linux for both CPU-based and GPU-accelerated deployments. Here are those results comparing Llama.cpp performance between Windows and Linux with different large language models.
With creator workloads and other areas commonly explored in our Windows vs. Linux benchmarking, Linux typically leads even when using Ubuntu at its defaults, with an even greater uplift from the likes of CachyOS. But to date I hadn’t explored Llama.cpp AI performance, since I rarely have Microsoft Windows installations running in the lab. With time allowing and given the continuing maturity of the Llama.cpp software, I decided to carry out some benchmarks.
This testing looks at the performance of the native Llama.cpp Windows and Linux builds, both for CPU execution and for GPU acceleration using Vulkan. The AMD Ryzen 9 9950X3D 16-core 3D V-Cache processor was used for this testing along with an AMD Radeon RX 9070 XT graphics card.
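For readers wanting to reproduce this kind of CPU vs. Vulkan comparison, the llama-bench tool that ships with llama.cpp controls GPU offload via the `-ngl` flag. Below is a minimal sketch that prints the two invocations rather than executing them, since they require a local llama.cpp build and model file; the model filename and prompt/generation lengths are placeholders, not the exact settings used in this article.

```shell
#!/bin/sh
# Sketch of the two llama-bench modes compared here (CPU-only vs. full
# Vulkan GPU offload). The GPU run assumes llama.cpp was built with the
# Vulkan backend enabled (-DGGML_VULKAN=ON at CMake configure time).
MODEL=./model.gguf   # placeholder path to a GGUF model file

# -ngl 0  = keep all layers on the CPU
# -ngl 99 = offload all layers to the GPU
for NGL in 0 99; do
  echo "llama-bench -m $MODEL -ngl $NGL -p 512 -n 128"
done
```

The `-p` and `-n` options set the prompt-processing and token-generation lengths benchmarked; llama-bench reports tokens per second for each.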
The same hardware was used throughout this cross-platform Llama.cpp AI benchmarking. Microsoft Windows 11 25H2 via the preview channel was in use with all available updates as of 6 September, plus the latest Radeon Software 25.8.1 driver release for RDNA4 GPU support. On the Linux side, tests were done on Ubuntu 24.04.3 LTS with its HWE stack of Linux 6.14 and Mesa 25.0. The Ubuntu tests were then repeated after upgrading to the Linux 6.17 Git kernel and the Mesa 25.3-devel drivers for the very latest open-source driver support.
This testing is quite straightforward, so let’s get right to it.