Nvidia’s upcoming Blackwell GPUs for AI computing may face further delays because they’re prone to overheating when connected to each other on server racks, according to a new report from The Information.
The server rack Nvidia designed for Blackwell, which can connect up to 72 GPUs at a time, is reportedly the source of the overheating. Nvidia has redesigned the racks several times, which could delay GPU server shipments and prevent new Google, Microsoft, or Meta data centers from opening on schedule.
Back in August, an earlier report suggested that a “design flaw” had delayed the Blackwell GPUs’ launch by months. It’s unclear whether that flaw is the same server rack design issue, though it’s possible. Nvidia announced Blackwell in March and initially said the GPUs could ship as soon as Q2 2024, before it ran into challenges.
Nvidia indirectly addressed the server rack problem in a statement to Reuters. “Nvidia is working with leading cloud service providers as an integral part of our engineering team and process. The engineering iterations are normal and expected,” a company spokesperson said, suggesting a new server design could be on the horizon.
Overheating is a leading cause of performance problems in GPUs, which can consume a lot of energy. Like AI, the crypto mining industry uses a ton of energy, produces a lot of heat, and relies on large numbers of interconnected GPUs or mining rigs. Some crypto miners use immersion cooling, in which the rigs are submerged in liquid, to prevent overheating.
And the more powerful a GPU, the more heat it can produce. Although technological advances sometimes bring efficiency gains, those gains typically aren’t enough to offset the overall increase in energy demand. According to Nvidia, the Blackwell AI chips can be 30 times faster than previous GPUs.
Training and running generative AI models at scale also requires a ton of energy, as well as water to cool these systems. This has led some experts to predict that AI data centers could face power shortages as soon as next year, because AI firms can’t add new power sources to the grid as quickly as they can build data centers, and they aren’t necessarily willing to wait.
Meta, Microsoft, and Google have recently turned to nuclear power to meet their rising energy needs. However, these “power purchase agreements” don’t necessarily solve AI’s energy problems.
Nvidia has seen its stock soar over 180% in the past year due to the AI surge and resulting spike in chip demand, while rival AMD recently began mass layoffs.