Artificial intelligence and high-performance computing are blending into a single ecosystem where integrated, co-designed infrastructure and smarter memory handling are essential. As systems scale, the focus is shifting toward architectures that reduce latency and make more efficient use of accelerated compute.
Graphics processing unit utilization has emerged as a central focus in large-scale AI deployments, tightly linked to how efficiently systems can offload and manage memory. This points toward software improvements as a critical way to push performance even further, according to David Noy (pictured), vice president of product management at Dell Technologies Inc.
“When you’re working in inferencing and you’re keeping track of the memory of a conversation … let’s say [with] a conversational chatbot, there’s all this context that’s been built up over time,” Noy told theCUBE. “You don’t want to have to rebuild that context every time. There is some amount of that that can be kept in memory. You don’t want to reuse the GPU cycles just to recalculate context.”
Noy spoke to theCUBE’s John Furrier and Jackie McGuire at SC25, during an exclusive broadcast on theCUBE, News Media’s livestreaming studio. They discussed how accelerated computing is reinventing system architecture and how the growing demand for smarter memory handling is influencing the future of AI workloads. (* Disclosure below.)
Smarter memory handling seen as key
Keeping part of the accumulated context in memory prevents the GPU from wasting cycles on already-processed information, Noy explained. If that memory can be extended onto storage with direct access, the system can maintain far more context without slowing down the GPU.
“We’ve announced integration with vLLM and LMCache using the NIXL transport protocol,” Noy said. “This is a protocol that allows the GPU to speak directly to the storage to save all the previous context of your conversation so you can have longer memory and not have to constantly do recalculation. That accelerates your time to first token by 19x.”
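What Noy describes maps to a familiar caching pattern: keep recent conversation context (the model's key-value cache) in a fast memory tier and spill older context to storage instead of recomputing it on the GPU. The sketch below is a minimal, hypothetical illustration of that idea in Python; the class name, file layout and eviction policy are invented for clarity and are not the actual vLLM, LMCache or NIXL interfaces referenced in the interview.

```python
import os
import pickle
import tempfile

# Hypothetical two-tier context cache: hot entries stay in host memory,
# older entries spill to a storage path, so previously computed context
# never has to be rebuilt by the GPU.
class ContextCache:
    def __init__(self, spill_dir: str, max_hot_entries: int = 2):
        self.hot = {}                      # conversation_id -> accumulated context
        self.spill_dir = spill_dir         # stand-in for a direct-access storage tier
        self.max_hot_entries = max_hot_entries

    def _spill_path(self, conversation_id: str) -> str:
        return os.path.join(self.spill_dir, f"{conversation_id}.ctx")

    def put(self, conversation_id: str, context) -> None:
        self.hot[conversation_id] = context
        # Once "memory" is full, evict the oldest hot entry to storage.
        while len(self.hot) > self.max_hot_entries:
            victim_id, victim_ctx = next(iter(self.hot.items()))
            with open(self._spill_path(victim_id), "wb") as f:
                pickle.dump(victim_ctx, f)
            del self.hot[victim_id]

    def get(self, conversation_id: str):
        if conversation_id in self.hot:
            return self.hot[conversation_id]   # memory hit: nothing to recompute
        path = self._spill_path(conversation_id)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)          # storage hit: reload instead of recompute
        return None                            # miss: only here would context be rebuilt


def answer(cache: ContextCache, conversation_id: str, new_message: str) -> str:
    context = cache.get(conversation_id)
    if context is None:
        context = []                           # cache miss: context rebuilt from scratch
    context.append(new_message)
    cache.put(conversation_id, context)
    return f"({len(context)} messages of context reused)"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        cache = ContextCache(spill_dir, max_hot_entries=2)
        for turn in ["hello", "tell me more", "summarize that"]:
            print(answer(cache, "chat-42", turn))
```

In the production integration Noy describes, the storage tier is reached by the GPU directly over the NIXL transport rather than through the host filesystem used in this toy example, which is what keeps GPU cycles from being spent recalculating context.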
As AI factories scale toward exabyte-class deployments, power and space constraints are becoming just as important as raw performance. That’s driving Dell’s emphasis on co-designed systems that squeeze more useful work out of every rack unit and watt consumed, according to Noy.
“We’re hyper-focused on collaboration across our teams to make sure we’re building the most energy-efficient and space-efficient solutions for AI infrastructure,” he said. “If we can deliver double the performance per watt or per rack versus a competitor, we’ve effectively taken a 5% power budget and turned it into 10%. That’s a big deal.”
Here’s the complete video interview, part of News’s and theCUBE’s coverage of SC25:
(* Disclosure: Dell and Nvidia Corp. sponsored this segment of theCUBE. Neither Dell, Nvidia nor any other sponsor has editorial control over content on theCUBE or News.)
Photo: News
Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.
- 15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
- 11.4k+ theCUBE alumni: connect with more than 11,400 tech and business leaders shaping the future through a unique trust-based network.
About News Media
Founded by tech visionaries John Furrier and Dave Vellante, News Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.
