At GTC 2026, NVIDIA showed it is firmly committed to agentic AI, launching several components and hardware platforms designed for deploying and running agents. Among them are the NVIDIA Vera CPU and the Vera Rubin platform.
Vera is the company's first processor designed specifically for agentic AI and reinforcement learning. According to NVIDIA's estimates, it delivers results 50% faster than those of traditional rack-scale CPUs.
It combines high-performance, energy-efficient CPU cores, a high-bandwidth memory subsystem, and the second generation of NVIDIA Scalable Coherency Fabric. This combination streamlines agentic responses under the extreme usage conditions common in agentic AI and reinforcement learning.
Vera features 88 custom-designed Olympus cores, delivering high performance for compilers, runtime engines, analysis pipelines, agent tools, and orchestration services. Each core can run two tasks, using NVIDIA Spatial Multithreading to deliver the consistent, predictable performance needed by multitenant AI factories running many tasks simultaneously.
Vera also incorporates the second generation of the company's low-power memory subsystem, now based on LPDDR5X memory, which offers up to 1.2 TB/s of bandwidth and improves energy efficiency.
The NVIDIA Vera CPU allows companies and organizations of any size and sector to create AI factories for agentic AI. Among the main hyperscalers collaborating with NVIDIA on the rollout of Vera are Alibaba, CoreWeave, Meta, and Oracle Cloud Infrastructure, as well as system manufacturers such as Dell, HPE, Lenovo, and Supermicro.
With Vera as a base, NVIDIA has announced a new Vera CPU rack that integrates 256 liquid-cooled Vera CPUs to support more than 22,500 simultaneous CPU environments, each capable of operating independently at full capacity. AI factories can thus be deployed and scaled to tens of thousands of simultaneous instances and autonomous tools in a single rack.
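A quick back-of-the-envelope check connects the figures quoted above, under the assumption (not stated explicitly in the announcement) that each isolated CPU environment maps to one physical Olympus core:

```python
# Sanity check of the Vera CPU rack figures from the article.
# Assumption: one isolated CPU environment per physical core.
cpus_per_rack = 256        # liquid-cooled Vera CPUs per rack
cores_per_cpu = 88         # custom Olympus cores per Vera CPU
threads_per_core = 2       # NVIDIA Spatial Multithreading

environments = cpus_per_rack * cores_per_cpu
hardware_threads = environments * threads_per_core

print(environments)        # 22528 — consistent with "more than 22,500"
print(hardware_threads)    # 45056
```

The 256 × 88 = 22,528 product lines up with the "more than 22,500 simultaneous CPU environments" claim, which suggests that reading is correct.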
The Vera rack has been developed with the NVIDIA MGX modular reference architecture. Vera CPUs pair with NVIDIA GPUs through NVIDIA NVLink-C2C interconnect technology, whose 1.8 TB/s of coherent bandwidth allows high-speed data exchange between CPU and GPU.
NVIDIA has also introduced several reference designs that use Vera as a host CPU for NVIDIA HGX Rubin NVL8 systems, coordinating data movement and system control for GPU-accelerated workloads. The single- and dual-CPU server configurations already offered by Vera system partners integrate NVIDIA ConnectX SuperNIC cards and NVIDIA BlueField-4 DPUs to accelerate networking, storage, and security, which is critical for agentic AI.
NVIDIA Vera Rubin, a platform with seven chips for AI factories
Another of NVIDIA’s main announcements at GTC 2026 is the Vera Rubin platform, with seven chips already in production, which aims to expand AI factories and give them more power. The platform combines the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch, as well as the NVIDIA Groq3 LPU.
These chips, designed to work together, drive every phase of AI, from large-scale pre-training to post-training and test-time scaling to real-time agentic inference.
The NVIDIA Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected via NVLink 6. With the ConnectX-9 SuperNIC and BlueField-4 DPU, it enables training large mixture-of-experts models with a quarter of the GPUs required by the NVIDIA Blackwell platform. It also delivers up to 10x higher inference performance per watt, at a tenth of the cost per token.
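The NVL72 figures above imply a fixed GPU-to-CPU ratio, and the efficiency claims can be restated as simple relative factors versus Blackwell. A minimal sketch, using only the numbers quoted in the article:

```python
# Ratios and relative claims for Vera Rubin NVL72, per the article.
rubin_gpus = 72
vera_cpus = 36
gpu_per_cpu = rubin_gpus // vera_cpus       # 2 Rubin GPUs per Vera CPU

# Relative factors vs. the Blackwell platform (article's claims, not measurements):
gpus_needed_factor = 1 / 4                  # same MoE training with a quarter of the GPUs
inference_perf_per_watt_factor = 10         # up to 10x inference performance per watt
cost_per_token_factor = 1 / 10              # a tenth of the cost per token

print(gpu_per_cpu)                          # 2
```

These are vendor-stated figures; the sketch only makes their arithmetic relationships explicit.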
It is designed for hyperscale AI factories and can be combined with NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet to maintain high utilization in massive GPU clusters while reducing training time and total cost of ownership.
The NVIDIA Vera CPU rack, meanwhile, offers a dense, liquid-cooled infrastructure based on NVIDIA MGX, integrating 256 Vera CPUs to deliver scalable, energy-efficient capacity with world-class single-threaded performance, enabling large-scale agentic AI development.
Vera CPU racks, integrated with Spectrum-X Ethernet networking, keep CPU environments synchronized throughout the AI factory. Combined with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning.
The NVIDIA Groq 3 LPX rack, for its part, is designed for the low-latency, long-context demands of agentic systems. LPX and Vera Rubin combine the performance of both processors to deliver up to 35x higher inference performance per megawatt. The LPX rack, with 256 LPU processors, has 128 GB of integrated SRAM and 640 TB/s of scalable bandwidth. Deployed alongside Vera Rubin NVL72, Rubin GPUs and LPUs power decoding by jointly computing each layer of the AI model for each output token.
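The aggregate LPX numbers break down per chip with a simple division, assuming (as the phrasing suggests) that the SRAM and bandwidth figures are rack-wide totals across the 256 LPUs:

```python
# Per-LPU breakdown of the Groq 3 LPX rack totals quoted above.
# Assumption: 128 GB SRAM and 640 TB/s are totals for the whole rack.
lpus_per_rack = 256
total_sram_gb = 128
total_bandwidth_tbs = 640

sram_per_lpu_mb = total_sram_gb * 1024 / lpus_per_rack   # 512 MB of SRAM per LPU
bandwidth_per_lpu_tbs = total_bandwidth_tbs / lpus_per_rack  # 2.5 TB/s per LPU

print(sram_per_lpu_mb)        # 512.0
print(bandwidth_per_lpu_tbs)  # 2.5
```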
The LPX architecture is optimized for billion-parameter models and million-token contexts, and is combined with Vera Rubin to increase power, memory, and compute efficiency. LPX, fully liquid-cooled and built on the MGX infrastructure, easily integrates into next-generation Vera Rubin AI factories, which will be available in the second half of 2026.
The rack-scale NVIDIA BlueField-4 STX system is an AI-native storage infrastructure that extends GPU memory across the pod. Powered by BlueField-4, which combines the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC, STX provides a shared, high-bandwidth layer for storing and retrieving hot KV-cache data generated by LLM and agentic AI workflows.
NVIDIA DOCA Memos, a new DOCA framework powering BlueField-4 storage, enables a dedicated KV-cache storage process that increases inference performance fivefold while improving the energy efficiency of general-purpose storage architectures. The result is pod-level context that delivers faster multi-turn interactions with AI agents, more scalable AI services, and higher overall infrastructure utilization.
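Why does storing KV cache off-GPU speed up multi-turn agent interactions? A minimal schematic sketch of the general idea, not NVIDIA's DOCA API, with hypothetical names throughout: attention key/value blocks computed for a conversation prefix in one turn can be fetched from a shared store in later turns instead of being recomputed from scratch.

```python
# Schematic illustration of a shared KV-cache layer for LLM inference.
# NOT the DOCA API; class and key names are hypothetical.
class SharedKVCache:
    def __init__(self):
        self._store = {}  # conversation prefix -> precomputed KV blocks

    def put(self, prefix: str, kv_blocks: list) -> None:
        self._store[prefix] = kv_blocks

    def get(self, prefix: str):
        # A hit means the GPUs skip re-running prefill for this prefix.
        return self._store.get(prefix)


cache = SharedKVCache()
# Turn 1: prefill runs once and its KV blocks are stored off-GPU.
cache.put("system-prompt + turn-1", ["kv-block-0", "kv-block-1"])

# Turn 2: the cached prefix is fetched instead of recomputed.
hit = cache.get("system-prompt + turn-1")
print(hit is not None)  # True
```

The real system adds the hard parts this sketch omits (eviction, high-bandwidth transport, coherence across the pod), but the speedup mechanism is the same: trade a fetch for a recompute.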
The NVIDIA Spectrum-6 SPX Ethernet rack is designed to speed up east-west traffic in AI factories. Configurable with Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it offers large-scale rack connectivity with low latency and high performance.
NVIDIA has also launched the DSX platform for Vera Rubin with more than 200 data center infrastructure partners. This includes DSX Max, which enables dynamic power provisioning across the AI factory. Meanwhile, the new DSX Flex software allows AI factories to act as flexible assets on the electrical grid, freeing up 100 gigawatts of grid power.
Also released today is the Vera Rubin DSX AI Factory Reference Design, a co-designed AI infrastructure model that maximizes tokens per watt and overall performance. It tightly integrates compute, networking, storage, power, and cooling, improving energy efficiency and enabling AI factories to reliably scale high-intensity, continuous workloads with maximum uptime.
Vera Rubin-based products will be available from partners in the second half of 2026. Among them are the main cloud providers (AWS, Google Cloud, Azure, and Oracle Cloud Infrastructure) and several NVIDIA Cloud partners: CoreWeave, Crusoe, Lambda, Nebius, Nscale, and Together AI. Cisco, Dell, HPE, Lenovo, and Supermicro are also expected to offer servers based on Vera Rubin.
