To improve chatbot performance, Nvidia plans to sell a new kind of processor, an LPU, optimized to run large language models (LLMs).
The “Nvidia Groq 3 LPU” chip was among seven upcoming chips Nvidia touted at the company’s annual GTC event, where it pitched the AI industry on why Nvidia’s chips continue to lead.
The LPU, or Language Processing Unit, comes from Nvidia’s deal this past December to license technology from a California AI company called Groq (not to be confused with the AI chatbot Grok from xAI). Founded in 2016, Groq released earlier generations of LPU chips designed specifically for LLMs, promising faster speeds and better energy efficiency. The aim: to create an alternative to Nvidia’s enterprise GPUs, which can handle a wider range of AI workloads.
Nvidia now wants to pair the newly revealed Groq 3 LPU with the rest of the company’s next-generation AI chips, dubbed the “Vera Rubin” platform, which includes the upcoming Rubin GPU and Vera CPU tech for data centers.
Groq’s LPU chips use SRAM (static RAM), which is faster than the HBM (high-bandwidth memory) typically found on Nvidia’s GPUs. The downside: Groq’s LPUs offer only “hundreds of megabytes” of SRAM, whereas HBM can span a hundred gigabytes or more per chip.
That’s why a single Groq 3 LPU only contains 500MB of SRAM, while Nvidia’s upcoming Rubin GPU will feature 288GB of HBM4 memory. To compensate for the lower memory capacity, Nvidia is preparing to sell large batches of LPUs to work alongside the rest of its data center chips, giving AI companies a way to squeeze out even more performance.
Nvidia noted “the LPX rack with 256 LPU processors features 128GB of on-chip SRAM and 640TB/s of scale-up bandwidth. Deployed with Vera Rubin NVL72 (server unit), Rubin GPUs and LPUs boost decode by jointly computing every layer of the AI model for every output Token.”
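Nvidia’s per-chip and per-rack figures are consistent with each other: 256 LPUs at 500MB of SRAM apiece works out to the quoted 128GB of on-chip memory for the rack. A quick back-of-the-envelope check (numbers taken from this article, not from an Nvidia spec sheet):

```python
# Sanity-check the LPX rack figures cited above.
lpus_per_rack = 256
sram_per_lpu_mb = 500  # per-chip SRAM, as stated for the Groq 3 LPU

total_sram_mb = lpus_per_rack * sram_per_lpu_mb
total_sram_gb = total_sram_mb / 1000  # decimal GB, as marketing specs typically use

print(f"Rack SRAM: {total_sram_gb:.0f} GB")  # matches Nvidia's quoted 128GB
```

Even at rack scale, that 128GB of SRAM is still less than half the 288GB of HBM4 on a single Rubin GPU, which is why Nvidia positions the LPUs as companions to its GPUs rather than replacements.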
A data center could thus harness both the LPUs and Nvidia’s GPUs, dividing AI workloads between them to increase efficiency. Nvidia’s CEO, Jensen Huang, said the combined approach excels at helping AI companies boost performance with longer prompts.
Combined, the LPUs and Rubin GPUs also promise to deliver up to a 35x increase in throughput when running a large language model with 1 trillion parameters, according to Nvidia’s benchmarks.
“We’re in production with the Groq chip,” Huang said, adding that it’ll likely ship in Q3. Nvidia has contracted Samsung to manufacture the LPU. One analyst already expects Nvidia to ship out 4 to 5 million LPUs through 2026 and 2027.
The new LPU and Vera Rubin systems will likely cost tens of thousands of dollars per chip, putting them far out of reach of consumers. Instead, expect the biggest AI companies, including OpenAI, Anthropic, and Meta, to adopt these technologies, which could power your chatbot queries or image-generation requests in the near future.
At GTC, Nvidia also talked up Vera Rubin, a platform the company has detailed before, including at January’s CES, where it said the Rubin chips were in “full production.” Nvidia plans to ship the Vera Rubin chips, including the new LPU, in the second half of this year.
About Our Expert
Michael Kan
Senior Reporter
Experience
I’ve been a journalist for over 15 years. I got my start as a schools and cities reporter in Kansas City and joined PCMag in 2017, where I cover satellite internet services, cybersecurity, PC hardware, and more. I’m currently based in San Francisco, but previously spent over five years in China, covering the country’s technology sector.
Since 2020, I’ve covered the launch and explosive growth of SpaceX’s Starlink satellite internet service, writing 600+ stories on availability and feature launches, but also the regulatory battles over the expansion of satellite constellations, fights with rival providers like AST SpaceMobile and Amazon, and the effort to expand into satellite-based mobile service. I’ve combed through FCC filings for the latest news and driven to remote corners of California to test Starlink’s cellular service.
I also cover cyber threats, from ransomware gangs to the emergence of AI-based malware. Earlier this year, the FTC forced Avast to pay consumers $16.5 million for secretly harvesting and selling their personal information to third-party clients, as revealed in my joint investigation with Motherboard.
I also cover the PC graphics card market. Pandemic-era shortages led me to camp out in front of a Best Buy to get an RTX 3000. I’m now following how President Trump’s tariffs will affect the industry. I’m always eager to learn more, so please jump in the comments with feedback and send me tips.