Nvidia Corp. today previewed an upcoming chip, the Rubin CPX, that will power artificial intelligence appliances with 8 exaflops of performance.
AI inference involves two main steps. First, the model ingests and analyzes the user’s prompt and any data it will draw on for its answer, a step known as the context, or prefill, phase. Once that analysis is complete, the model generates its response one token at a time in the generation, or decode, phase. Today, the two tasks are usually performed on the same hardware.
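As a rough illustration, here’s what the two phases look like with a Hugging Face Transformers-style decoder model (gpt2 below is only a stand-in, not anything Nvidia ships): the whole prompt is processed in one compute-heavy prefill pass, then tokens are generated one at a time from the cached keys and values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tok("Summarize this document:", return_tensors="pt").input_ids

# Phase 1 (context/prefill): one forward pass over the full prompt.
# This is the compute-bound step the Rubin CPX targets.
with torch.no_grad():
    out = model(input_ids, use_cache=True)
past = out.past_key_values
next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

# Phase 2 (generation/decode): one token per step, reusing the cache.
# This step is bound by memory bandwidth rather than raw compute.
generated = [next_id]
with torch.no_grad():
    for _ in range(32):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```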
Nvidia plans to take a different approach with its future AI systems. Instead of performing both steps of the inference workflow using the same graphics card, it plans to assign each step to a different chip. The company calls this approach disaggregated inference.
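A toy sketch of that split, using entirely hypothetical class names rather than Nvidia’s actual software interfaces: one worker handles the compute-heavy context phase and hands its key/value cache to a second worker that streams out tokens.

```python
# Hypothetical names throughout -- an illustration of the concept only.
class ContextWorker:                      # stands in for a Rubin CPX-class chip
    def prefill(self, prompt_tokens):
        # Compute-bound pass over the entire prompt; yields the KV cache.
        return {"kv": list(prompt_tokens)}

class DecodeWorker:                       # stands in for a Rubin GPU
    def decode(self, kv_cache, max_new_tokens=4):
        # Bandwidth-bound loop: one token per step, reusing the cache.
        return [len(kv_cache["kv"]) + i for i in range(max_new_tokens)]

def run_inference(prompt_tokens):
    kv = ContextWorker().prefill(prompt_tokens)   # step 1 on one chip
    return DecodeWorker().decode(kv)              # step 2 on another

print(run_inference([101, 2023, 2003, 102]))
```

The appeal of the split is that each pool of chips can be provisioned for the work it actually does, rather than one GPU design having to be good at both.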
Nvidia’s upcoming Rubin CPX chip is optimized for the initial, so-called context phase of the two-step inference workflow. The company will use it to power a rack-scale system called the Vera Rubin NVL144 CPX (pictured). Each appliance will combine 144 Rubin CPX chips with 144 Rubin GPUs, upcoming processors geared toward the second, generation phase of the workflow. The accelerators will be supported by 36 central processing units.
The company says the upcoming system will provide 8 exaflops of computing capacity. One exaflop corresponds to a quintillion computing operations per second. That’s more than seven times the performance of the top-end GB300 NVL72 appliances currently sold by Nvidia.
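A quick back-of-envelope check of that comparison (the 7.5x ratio below is an assumption consistent with “more than seven times,” not a figure from the announcement):

```python
EXA = 10**18                 # one exaflop = a quintillion ops per second
rubin_cpx_rack = 8 * EXA     # Nvidia's stated figure for the new system
ratio = 7.5                  # assumed; the claim is "more than seven times"
print(f"Implied GB300 NVL72: {rubin_cpx_rack / ratio / EXA:.2f} exaflops")
```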
Under the hood, the Rubin CPX is based on a monolithic die design with 128 gigabytes of integrated GDDR7 memory. Nvidia also included components optimized to run the attention mechanism of large language models.
An LLM’s attention mechanism enables it to identify and prioritize the most important parts of the text snippet it’s processing. According to Nvidia, the Rubin CPX can perform the task three times faster than its current-generation silicon. “We’ve tripled down on the attention processing,” said Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing.
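For reference, the core of that mechanism is scaled dot-product attention; a minimal PyTorch version is below (how the Rubin CPX implements it in silicon is not public).

```python
import math
import torch

def attention(q, k, v):
    # Score every query against every key, scaled for numerical stability.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Softmax turns the scores into weights that emphasize relevant tokens.
    weights = torch.softmax(scores, dim=-1)
    # Each output is a weighted blend of the value vectors.
    return weights @ v

# Shapes: (batch, heads, sequence length, head dimension).
q = k = v = torch.randn(1, 8, 1024, 64)
out = attention(q, k, v)   # the score matrix grows with the square of sequence length
```

That quadratic growth in the score matrix is why attention dominates the cost of long prompts, and why silicon tuned for it pays off in the context phase.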
Buck said video processing workloads will receive a speed boost as well. The Rubin CPX includes hardware-level support for video encoding and decoding: encoding compresses a clip to save bandwidth before it’s transmitted over the network, while decoding restores the original file on arrival.
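On today’s Nvidia GPUs that kind of codec offload is typically reached through ffmpeg’s NVDEC/NVENC integration; a hedged example of the round trip, with placeholder file names (this illustrates GPU-accelerated encode/decode generally, not Rubin CPX software):

```python
import subprocess

# Decode on the GPU (NVDEC via -hwaccel cuda), then re-encode on the
# GPU (NVENC via h264_nvenc). Requires an ffmpeg build with Nvidia support.
subprocess.run([
    "ffmpeg",
    "-hwaccel", "cuda",
    "-i", "input.mp4",        # placeholder input clip
    "-c:v", "h264_nvenc",
    "output.mp4",             # placeholder re-encoded output
], check=True)
```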
According to Nvidia, the Rubin CPX will enable AI models to process prompts with one million tokens’ worth of data. That corresponds to tens of thousands of lines of code or one hour of video. In many cases, increasing the amount of data an AI model can consider while generating a prompt response boosts its output quality.
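The rough arithmetic behind those equivalences, using common rules of thumb rather than Nvidia figures:

```python
CONTEXT_TOKENS = 1_000_000

tokens_per_line = 30                # assumed: code plus surrounding context
print(CONTEXT_TOKENS // tokens_per_line)                 # ~33,000 lines of code

video_tokens_per_second = 280       # assumed video-tokenizer rate
print(CONTEXT_TOKENS // video_tokens_per_second // 60)   # ~59 minutes of video
```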
Nvidia plans to start shipping the Rubin CPX at the end of 2026.
Image: Nvidia