Google has unveiled its seventh-generation Tensor Processing Unit (TPU), Ironwood, at Google Cloud Next ’25. Ironwood is Google’s most performant and scalable custom AI accelerator to date, and the first TPU designed specifically for inference workloads.
Google emphasizes that Ironwood is designed to power what they call the “age of inference,” marking a shift from responsive AI models to proactive models that generate insights and interpretations. The company states that AI agents will use Ironwood to retrieve and generate data, delivering insights and answers.
A respondent in a Reddit thread on the announcement said:
Google has a huge advantage over OpenAI because it already has the infrastructure to do things like making its own chips. Currently, it looks like Google is running away with the game.
Ironwood scales up to 9,216 liquid-cooled chips connected via Inter-Chip Interconnect (ICI) networking, and is a key component of Google Cloud’s AI Hypercomputer architecture. Developers can use Google’s own Pathways software stack to harness the combined computing power of tens of thousands of Ironwood TPUs.
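The announcement does not include Ironwood- or Pathways-specific sample code, but as a rough sketch of the programming model, the plain-JAX snippet below spreads a computation over whatever accelerator devices are visible to the runtime. The mesh axis name, array shapes, and the assumption that the batch divides evenly across devices are illustrative placeholders, not details from Google.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange every visible accelerator into a one-dimensional mesh along a
# "data" axis. On a TPU slice this covers all attached chips; on a laptop
# it degenerates to a single device, so the sketch still runs locally.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Split a batch of activations across the mesh; the compiler inserts any
# cross-chip (ICI) communication the computation needs.
sharding = NamedSharding(mesh, P("data", None))
x = jax.device_put(jnp.ones((4096, 1024)), sharding)  # batch assumed to divide evenly
w = jnp.ones((1024, 1024))

@jax.jit
def layer(x, w):
    # Each device multiplies only its shard of the batch; the result keeps
    # the same batch sharding, so no gather is required here.
    return jnp.dot(x, w)

y = layer(x, w)
print(y.shape, y.sharding)
```

The same program simply sees a larger device mesh on a bigger slice; per Google’s description, the Pathways runtime and the compiler, rather than application code, are what scale this pattern out across many chips.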
The company states, “Ironwood is our most powerful, capable, and energy-efficient TPU yet. And it’s purpose-built to power thinking, inferential AI models at scale.”
Furthermore, the company highlights that Ironwood is designed to manage the computation and communication demands of large language models (LLMs), mixture-of-experts (MoE) models, and advanced reasoning tasks. Ironwood minimizes on-chip data movement and latency, and uses a low-latency, high-bandwidth ICI network for coordinated communication at scale.
Ironwood will be available to Google Cloud customers in 256-chip and 9,216-chip configurations. The company claims that a 9,216-chip Ironwood pod delivers more than 24x the compute power of the El Capitan supercomputer: 42.5 exaflops per pod, compared with El Capitan’s 1.7 exaflops. Each Ironwood chip offers a peak compute of 4,614 TFLOPS.
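As a quick check, the pod-level figure follows from the per-chip number: 9,216 chips × 4,614 TFLOPS per chip ≈ 42.5 million TFLOPS, or roughly 42.5 exaflops of peak compute per pod.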
Ironwood also features an enhanced SparseCore, a specialized accelerator for processing ultra-large embeddings, expanding its applicability beyond traditional AI domains to finance and science.
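Google has not published a programmer-facing SparseCore interface alongside the announcement, so the snippet below is only a schematic of the workload such a unit targets: scattered gathers into a very large embedding table followed by pooling. It is written in plain JAX, and the table shape, ID batch, and `lookup` helper are illustrative assumptions rather than anything from the Ironwood documentation.

```python
import jax
import jax.numpy as jnp

# Hypothetical table shape; production recommender tables are far larger
# and are sharded across the HBM of many chips.
VOCAB, DIM = 100_000, 128
table = jnp.zeros((VOCAB, DIM))

@jax.jit
def lookup(table, ids):
    # Gather one row per sparse feature ID, then mean-pool per example.
    rows = jnp.take(table, ids, axis=0)   # (batch, ids_per_example, DIM)
    return rows.mean(axis=1)              # (batch, DIM)

ids = jnp.array([[3, 17, 42], [7, 7, 99]])  # two examples, three IDs each
pooled = lookup(table, ids)
print(pooled.shape)  # (2, 128)
```

At production scale, these irregular gathers, rather than dense matrix math, dominate the cost of ranking and recommendation models, which is the access pattern SparseCore is described as accelerating.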
Other key features of Ironwood include:
- 2x improvement in power efficiency compared to the previous generation, Trillium.
- 192 GB of high-bandwidth memory (HBM) per chip, 6x that of Trillium.
- 1.2 TB/s bidirectional ICI bandwidth, 1.5x that of Trillium.
- 7.37 TB/s of HBM bandwidth per chip, 4.5x that of Trillium.
(Source: Google blog post)
Regarding the last feature, a respondent on another Reddit thread commented:
Tera? Terabytes? 7.4 Terabytes? And I’m over here praying that AMD gives us a Strix variant with at least 500GB of bandwidth in the next year or two…
While NVIDIA remains a dominant player in the AI accelerator market, a respondent in another Reddit thread commented:
I don’t think it will affect Nvidia much, but Google is going to be able to serve their AI at much lower cost than the competition because they are more vertically integrated, and that is pretty much already happening.
In addition, in yet another Reddit thread, a respondent commented:
The specs are pretty absurd. Shame Google won’t sell these chips, a lot of large companies need their own hardware, but Google only offers cloud services with the hardware. Feels like this is the future, though, when somebody starts cranking out these kinds of chips for sale.
And finally, Davit tweeted:
Google just revealed Ironwood TPU v7 at Cloud Next, and nobody’s talking about the massive potential here: If Google wanted, they could spin out TPUs as a separate business and become NVIDIA’s biggest competitor overnight.
These chips are that good. The arms race in AI silicon is intensifying, but few recognize how powerful Google’s position actually is. While everyone focuses on NVIDIA’s dominance, Google has quietly built chip infrastructure that could reshape the entire AI hardware market if it decides to go all-in.
Google states that Ironwood provides increased computational power and memory capacity, along with ICI networking advancements and improved reliability. These advancements, combined with better power efficiency, will enable customers to handle demanding training and serving workloads with high performance and low latency. Google also notes that leading models like Gemini 2.5 and AlphaFold run on TPUs.
The announcement also highlighted that Google DeepMind has been using AI to aid in the design process for TPUs. An AI method called AlphaChip has been used to accelerate and optimize chip design, resulting in what Google describes as “superhuman chip layouts” used in the last three generations of Google’s TPUs.
Earlier, Google reported that AlphaChip had also been used to design other chips across Alphabet, such as Google Axion Processors, and had been adopted by companies like MediaTek to accelerate their chip development. Google believes that AlphaChip has the potential to optimize every stage of the chip design cycle and transform chip design for custom hardware.