Since the launch of the AMD EPYC 9005 series nearly one year ago, I have performed hundreds of different benchmarks on these EPYC “Turin” processors across a wide range of workloads/disciplines, finding really terrific performance, power efficiency, and value. The AMD EPYC 9005 series performs exceptionally well compared to the competition from Intel and ARM CPU vendors. One area I hadn’t explored to this point, though, was how well the AMD EPYC 9005 series performs as the host CPU for GPU/AI servers. That changed as I recently wrapped up some benchmarks exploring that area using the AMD EPYC 9575F, and it accelerated past the available competition in proving to be the superior host processor for AI servers.
Back at the AMD Advancing AI Day event in June, while talking about the consistently strong AMD EPYC 9005 “Turin” performance I’ve seen at Phoronix with my near-constant benchmarking across different areas, one uncovered area came up: the performance of the AMD EPYC 9005 series as host CPUs for GPU/AI servers, and in particular the high-frequency Turin CPUs for AI servers. As explained in those conversations, my number one limitation besides time usually comes down to simply having the hardware for interesting benchmarks and in turn for providing Phoronix content. In this case, that meant the appropriate GPU server platforms. AMD offered to provide me gratis access to some servers in one of their labs if I wanted to explore such a scenario, so I agreed.
AMD provided remote access to two similarly configured servers: one with two AMD EPYC 9575F processors and the other with two Intel Xeon Platinum 8592+ processors. Both servers were equipped with eight NVIDIA H100 80GB GPUs and both used Supermicro production server platforms (the Turin server based on the Supermicro AS-8125GS-TNHR H13DSG-O-CPU-D and the Intel server based on the Supermicro SYS-821GE-TNHR X13DEG-OAD). Both servers were running at their maximum rated memory channel counts and speeds and with similar RAID storage arrays. Ubuntu 22.04 LTS was running on each server with Linux 5.15, and both were using the “performance” CPU frequency scaling governor.
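For those wanting to verify a similar configuration on their own hardware, the active scaling governor can be confirmed per-core through the standard Linux cpufreq sysfs paths. Here is a minimal sketch; the `read_governors` helper is hypothetical and not part of any benchmarking tool used in this testing:

```python
# Minimal sketch: report the cpufreq scaling governor for each CPU core
# via the standard Linux sysfs locations. read_governors() is a
# hypothetical helper for illustration only.
from pathlib import Path


def read_governors():
    """Return a mapping of cpuN -> active scaling governor string."""
    govs = {}
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        gov_file = cpu / "cpufreq" / "scaling_governor"
        if gov_file.exists():
            govs[cpu.name] = gov_file.read_text().strip()
    return govs


if __name__ == "__main__":
    for cpu, gov in read_governors().items():
        print(f"{cpu}: {gov}")
```

Setting all cores to “performance” would typically be done with `cpupower frequency-set -g performance` or by writing to those same sysfs files as root.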
Though there is one important caveat to point out: the Intel Xeon Platinum 8592+ is an Emerald Rapids part rather than a current-generation Xeon 6 “Granite Rapids” processor. But that’s because AMD simply hasn’t been able to source enough Granite Rapids hardware yet. Indeed, it’s still difficult to obtain Granite Rapids processors / servers in normal retail channels. I’m told that next month they hope to have a similar Granite Rapids server in their lab and that I am welcome to repeat my tests on that current Intel Xeon 6 platform then, but for now, in terms of what is readily available in the marketplace, it’s still predominantly Emerald Rapids.
The AMD EPYC 9575F, as a reminder, features 64 Zen 5 cores (128 threads) with a 3.3GHz base clock, 4.5GHz all-core boost speed, and 5.0GHz maximum boost clock. The EPYC 9575F has a 256MB L3 cache and a default TDP of 400 Watts, while the cTDP allows configuring from 320 to 400 Watts. The AMD EPYC 9575F carries a list price of $11,791. The Intel Xeon Platinum 8592+, as the closest readily-available competition for GPU/AI servers, is also 64 cores / 128 threads while having a 1.9GHz base frequency and 3.9GHz maximum turbo frequency. The Xeon Platinum 8592+ does have a larger 320MB cache and is rated for a 350 Watt TDP while carrying a similar list price of $11,600 USD. Besides the AMD EPYC 9575F having an advantage with its higher base and boost frequencies, the EPYC 9575F also supports 12 channels of DDR5-6000/DDR5-6400 memory whereas the Xeon Platinum 8592+ only allows eight channels of DDR5-5600 memory.
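That memory configuration gap is significant on paper. As a rough back-of-the-envelope comparison, assuming the commonly quoted theoretical peak of channels × transfer rate × 8 bytes per transfer and using the DDR5-6000 figure for the EPYC side:

```python
# Rough theoretical peak memory bandwidth per socket:
# channels * transfer rate (MT/s) * 8 bytes per transfer, in GB/s.
def peak_bw_gbs(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000


epyc_9575f = peak_bw_gbs(12, 6000)  # 12 channels of DDR5-6000
xeon_8592p = peak_bw_gbs(8, 5600)   # 8 channels of DDR5-5600

print(f"EPYC 9575F: {epyc_9575f} GB/s")  # 576.0 GB/s
print(f"Xeon 8592+: {xeon_8592p} GB/s")  # 358.4 GB/s
```

That works out to roughly a 60% theoretical memory bandwidth advantage per socket for the EPYC 9575F, which matters when the host CPUs are feeding eight H100 GPUs with data.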
From these two competing Intel and AMD servers I was able to remotely carry out my own GPU/AI benchmarks to see the impact of the different host processors. Yes, they were both running within AMD’s labs and outside of my control, but that is all I had access to for this round of testing. They were similarly configured, there didn’t appear to be any thermal/power handicaps or anything else showing up in my remote monitoring, nor anything else to point to any questionable behavior on AMD’s part. Plus I was free to configure the server software as I wished and they imposed no other limitations on me. As usual, this isn’t a sponsored article or anything along those lines; there is no paid content on Phoronix. AMD offered up free/gratis access to some hardware remotely without imposing limitations and I agreed. While it’s always best having hands-on hardware in the lab, especially for follow-up testing and other related content, remote testing works when needed.
So then it was off to the benchmark races to see how the AMD EPYC 9575F and Intel Xeon Platinum 8592+ compare for GPU-accelerated AI servers.