Gartner: Considerations when using GPUs in the datacentre | Computer Weekly

News Room
Published 19 May 2025

CIOs expect extensive value from their artificial intelligence (AI) investments, including increased productivity, enhanced customer experience (CX) and digital transformation. As a result, Gartner client interest in deploying AI infrastructure – including graphics processing units (GPUs) and AI servers – has grown substantially. 

Specifically, client enquiries regarding GPUs and AI infrastructure increased nearly fourfold annually from October 2022 through October 2024. Clients are exploring hosted, cloud and on-premises options for GPU deployment. In some cases, enterprises will select a “full-stack” AI offering that bundles GPU, compute, storage and networking. In others, they will select, deploy and integrate the components individually. The requirements of AI workloads differ from those of most existing datacentre workloads.

Multiple interconnect technologies are available to support GPU connectivity. A common question from Gartner clients is: “Should I use Ethernet, InfiniBand or NVLink to connect to GPU clusters?” All three approaches can be valid, depending on the scenario.

These technologies are not mutually exclusive. Enterprises can deploy them in conjunction with one another – for example, NVLink within a rack, with InfiniBand or Ethernet to scale out beyond it. A common misconception is that only InfiniBand or a supplier-proprietary interconnect technology (such as NVLink) can deliver appropriate performance and reliability.

However, Gartner recommends that enterprises deploy Ethernet rather than alternative technologies, such as InfiniBand, for clusters of up to several thousand GPUs. Ethernet-based infrastructure can provide the necessary reliability and performance, there is widespread enterprise experience with the technology, and a broad ecosystem of suppliers supports it.
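As a rough illustration of this scale-based guidance (a hypothetical sketch, not Gartner's tooling – the cluster-size thresholds here are illustrative assumptions), the recommendation could be expressed as:

```python
def recommend_interconnect(gpu_count: int) -> str:
    """Hypothetical helper reflecting the guidance above: NVLink for
    scale-up within a rack, Ethernet for clusters of up to several
    thousand GPUs. Thresholds are assumed, not Gartner figures."""
    SCALE_UP_LIMIT = 8      # assumed NVLink domain within one chassis/rack
    ETHERNET_LIMIT = 4000   # "up to several thousand" GPUs

    if gpu_count <= SCALE_UP_LIMIT:
        return "NVLink (scale-up within a rack)"
    if gpu_count <= ETHERNET_LIMIT:
        return "Ethernet (dedicated AI fabric)"
    return "Ethernet or InfiniBand (evaluate both at this scale)"
```

A usage sketch: `recommend_interconnect(512)` would suggest a dedicated Ethernet fabric, while very large clusters fall back to a case-by-case evaluation.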

Optimise network deployments for GPU traffic 

The current state of practice for central processing unit (CPU)-based, general-purpose computing workloads is a leaf-spine network topology.

However, leaf-spine topologies are not always optimal for AI workloads. In addition, colocating AI workloads on existing datacentre networks can create noisy-neighbour effects that degrade performance for both AI and existing workloads. This can delay processing and job completion times for AI workloads, which is highly inefficient.

In a buildout of AI infrastructure, networking switches typically represent 15% or less of the cost. As a result, saving money by reusing existing switches often leads to suboptimal overall price/performance for the AI workload investment. Gartner therefore makes several recommendations.

Due to the unique traffic requirements and GPU costs, Gartner suggests building out dedicated physical switches for GPU connectivity. Furthermore, rather than defaulting to a leaf-spine topology, Gartner suggests using a minimal number of physical switches to reduce physical “hops”. Depending on scale, this could result in a leaf-spine topology, or in alternatives such as single-switch, two-switch, full-mesh, cube-mesh or dragonfly topologies.

Avoid using the same switches for other generalised datacentre computing needs. For clusters below 500 GPUs, one or two physical switches are ideal. For organisations with more than 500 GPUs, Gartner advises IT decision-makers to build out a dedicated AI Ethernet fabric. This is likely to require a deviation from the standard, state-of-practice, top-of-rack topologies towards middle-of-row and/or modular switching implementations.
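The sizing guidance above can be sketched as a simple planning helper (a hypothetical illustration of the 500-GPU threshold, not a Gartner tool – the returned descriptions simply restate the article's recommendations):

```python
def plan_gpu_fabric(gpu_count: int) -> dict:
    """Hypothetical sizing sketch for a dedicated GPU fabric,
    following the ~500-GPU threshold described above."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")

    if gpu_count < 500:
        # Small clusters: minimise hops with one or two switches.
        return {
            "switches": "one or two dedicated physical switches",
            "topology": "single-switch or two-switch",
            "shared_with_general_compute": False,
        }
    # Larger clusters: dedicated AI Ethernet fabric, likely
    # middle-of-row or modular rather than top-of-rack.
    return {
        "switches": "dedicated AI Ethernet fabric",
        "topology": "middle-of-row or modular switching",
        "shared_with_general_compute": False,
    }
```

In both branches the fabric is kept separate from general-purpose datacentre switching, mirroring the recommendation to avoid shared switches.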

Enhance Ethernet buildouts

Gartner recommends using dedicated switches for GPU connectivity. When deploying Ethernet (as opposed to InfiniBand or shelf/rack/row-optimised interconnects), choose switches that support:

  • High-speed interfaces for GPUs, including 400Gbps access ports and above.
  • Lossless Ethernet, including advanced congestion-handling mechanisms – for example, datacentre quantised congestion notification (DCQCN).
  • Advanced traffic-balancing capabilities, including congestion-aware load balancing.
  • Remote Direct Memory Access (RDMA)-aware load balancing and packet spraying.
  • Static pinning of flows.
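The requirements above can be captured as a checklist for evaluating candidate switches. The sketch below is a hypothetical illustration – the feature names are invented for this example and do not correspond to any real switch API:

```python
# Hypothetical capability checklist derived from the requirements above;
# field names are illustrative assumptions, not a vendor schema.
REQUIRED_FEATURES = {
    "access_port_gbps": 400,     # 400Gbps access ports and above
    "lossless_ethernet": True,   # e.g. DCQCN congestion handling
    "congestion_aware_lb": True, # congestion-aware load balancing
    "rdma_aware_lb": True,       # RDMA-aware load balancing / packet spraying
    "static_flow_pinning": True,
}

def switch_meets_requirements(spec: dict) -> bool:
    """Return True only if a candidate switch spec satisfies every
    requirement in the checklist."""
    if spec.get("access_port_gbps", 0) < REQUIRED_FEATURES["access_port_gbps"]:
        return False
    # Check every boolean capability (skip the numeric port-speed entry).
    return all(
        spec.get(feature, False)
        for feature, required in REQUIRED_FEATURES.items()
        if isinstance(required, bool)
    )
```

A spec missing any one capability – say, no RDMA-aware load balancing – would fail the check, flagging the switch as unsuitable for the GPU fabric.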

Furthermore, the software to manage AI networking fabrics must be enhanced as well. This requires functionality at the management layer to alert, diagnose and remediate issues quickly. In particular, management software that provides advanced granular telemetry (including sub-second and sub-100 millisecond intervals) is ideal for troubleshooting and visibility. In addition, the ability to monitor and alert (in real time) and provide historical reporting for bandwidth utilisation, packet loss, jitter, latency and availability at the sub-second level is required. 
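A minimal sketch of the kind of real-time alerting described above might look as follows. This is a hypothetical illustration: the metric names and thresholds are assumptions for the example, not vendor defaults or Gartner figures:

```python
# Hypothetical per-sample thresholds for a sub-second telemetry feed;
# values are illustrative assumptions only.
THRESHOLDS = {
    "packet_loss_pct": 0.01,  # percent
    "latency_ms": 5.0,
    "jitter_ms": 1.0,
}

def check_sample(sample: dict) -> list[str]:
    """Return alert messages for any metric in a single telemetry
    sample (e.g. taken at sub-100ms intervals) that breaches its
    threshold. Missing metrics are skipped rather than alerted."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts
```

In practice such checks would feed both real-time alerting and the historical reporting on bandwidth utilisation, packet loss, jitter, latency and availability that the article calls for.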

Ultra Ethernet (and accelerator) support

When building fabrics, Gartner advises IT leaders to consider hardware providers that pledge to support the Ultra Ethernet Consortium (UEC) and Ultra Accelerator Link (UAL) specifications.

The UEC is developing an industry standard to support high-performance workloads on Ethernet. As of February 2025, there is no proposed standard available, but Gartner expects a proposal before the end of 2025. The need for a standard stems from the fact that suppliers currently use proprietary mechanisms to provide the high-performance Ethernet necessary for AI connectivity. 

Long term, this reduces interoperability for customers as it locks them into a single supplier’s implementation. The benefit of suppliers confirming a consistent UEC standard is the ability to interoperate.

There is also a separate, but related, standards effort for shelf/rack/row-optimised accelerator link called the UAL. The goal of UAL is to standardise a high-speed, scale-up accelerator interconnect technology aimed at addressing scale-up network bandwidth needs that are beyond what Ethernet and InfiniBand are currently capable of. 

Reduce risk with co-certified implementations

Finally, because of the stringent performance requirements for AI workloads, connectivity between GPU and network switches needs to be optimised and error-free from a hardware and software perspective. This can be increasingly challenging, given the rapid pace of change associated with both networking and GPU technology.

To mitigate the potential for implementation challenges, Gartner recommends following validated implementation guides that are co-certified by the networking and GPU suppliers. The value of following a co-certified design is that both suppliers should stand by deployments done to that specification, ultimately reducing the likelihood of issues and decreasing mean time to repair (MTTR) when issues do occur.


This article is based on an excerpt of the Gartner report, Key networking practices to support AI workloads in the data center. Andrew Lerner is a distinguished vice-president analyst at Gartner.
