Can Google’s Ironwood TPU Eclipse Hyperscaler AI Rivals?
Google’s Ironwood TPU seeks to dominate AI inference, but can it surpass other custom silicon entrants in the hyperscaler race?
Key Highlights:
- Ironwood TPU v7p delivers 42.5 exaflops in a 9,216-chip pod, architected for AI inference.
- Enhanced SparseCore and 192 GB HBM3 per chip target reasoning and recommendation tasks.
- The company claims 2x performance-per-watt over Trillium, yet competitors loom large.
- Microsoft’s Maia 100 and AWS’s Trainium challenge on scalability and cost.
- Pathways software aims to orchestrate vast TPU clusters, but integration is unproven.
The News
Ironwood, Google's seventh-generation Tensor Processing Unit (TPU) introduced at Google Cloud Next 25, is, according to the company, its most powerful and energy-efficient TPU, designed specifically for inference to support advanced AI models at scale. It features significant performance improvements, including enhanced memory and bandwidth and a low-latency Inter-Chip Interconnect (ICI) network, enabling efficient handling of complex AI workloads such as Large Language Models and Mixture of Experts. Google claims nearly 30x better power efficiency than its first Cloud TPU; paired with the Pathways software stack, Ironwood supports the "age of inference," powering proactive AI agents and advanced computational tasks for Google Cloud customers. Find out more here.
Analyst Take
Custom silicon designed for AI, such as Google's Tensor Processing Unit (TPU), AWS's Inferentia, and Trainium, is becoming increasingly prevalent as companies seek to optimize performance and efficiency for AI workloads, and crucially look for alternatives to costly GPUs. Google's TPUs, like the Ironwood model, are tailored for both training and inference, delivering massive computational power and energy efficiency for large-scale AI models. AWS's Inferentia chips are designed to accelerate inference tasks, offering cost-effective, high-throughput processing for deployed machine learning models, while Trainium focuses on high-performance training for complex neural networks. This trend reflects a broader industry shift toward specialized hardware that addresses the unique demands of AI, enabling faster innovation and scalability for cloud providers and their customers.
Against this backdrop, Google’s seventh-generation Tensor Processing Unit, codenamed ‘Ironwood,’ was unveiled at Cloud Next 2025. The chip is architected to anchor what the company calls the “age of inference.” The claim of 42.5 exaflops at FP8 from a 9,216-chip pod is audacious. Google claims Ironwood’s raw compute exceeds El Capitan’s ~1.7 exaflops, but differing precisions (FP8 for AI inference vs. FP64 for scientific computing) and workloads make such direct comparisons largely irrelevant.
Ironwood’s prospects against other hyperscalers’ (Microsoft, AWS) custom silicon demand scrutiny. My analysis reveals a chip with formidable strengths but notable challenges in a crowded hyperscaler arena.
Ironwood is a leap forward. Designed to excel in inference, the chip powers “thinking models” like large language models and Mixture of Experts. Each TPU v7p delivers 4.6 petaflops at FP8 precision, or 2.3 petaflops at FP16, with a pod achieving 42.5 exaflops at FP8 and 21.26 exaflops at FP16.
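The pod-level figures follow directly from the per-chip numbers. A quick back-of-the-envelope check, using only the chip count and per-chip rates quoted above:

```python
# Sanity check: pod-level throughput from the per-chip figures quoted above.
CHIPS_PER_POD = 9_216
FP8_PFLOPS_PER_CHIP = 4.6    # per TPU v7p at FP8
FP16_PFLOPS_PER_CHIP = 2.3   # at FP16 (half the FP8 rate)

fp8_exaflops = CHIPS_PER_POD * FP8_PFLOPS_PER_CHIP / 1_000    # petaflops -> exaflops
fp16_exaflops = CHIPS_PER_POD * FP16_PFLOPS_PER_CHIP / 1_000

print(f"Pod FP8:  {fp8_exaflops:.1f} exaflops")   # ~42.4, in line with the quoted 42.5
print(f"Pod FP16: {fp16_exaflops:.2f} exaflops")  # ~21.20, in line with the quoted 21.26
```

The small gap between the computed ~42.4 and the quoted 42.5 exaflops is consistent with rounding in the per-chip figure.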
Memory is a standout: 192 GB of HBM3 per chip, six times that of Trillium, with 7.2-7.4 terabits per second of bandwidth - a range likely driven by rounding differences between company publications - a 4.5x increase. The pod’s 1.69 petabytes of HBM and low-latency Inter-Chip Interconnect (ICI) network aim to minimize data bottlenecks, vital for recommendation systems and advanced reasoning. Still, while the ICI’s 1.2 Tbps bandwidth is impressive, bottlenecks can persist due to software, workload characteristics, or network topology.
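As an aside, the 1.69-petabyte pod figure is consistent with a binary (1024-based) conversion of 192 GB x 9,216 chips; the same inputs under decimal (SI) prefixes give roughly 1.77 PB. A short sketch, using only the figures quoted above:

```python
# Pod HBM capacity under decimal (SI) vs binary (1024-based) prefixes.
total_gb = 9_216 * 192            # 1,769,472 GB of HBM across the pod

pb_decimal = total_gb / 1_000**2  # SI conversion: ~1.77 PB
pb_binary = total_gb / 1_024**2   # 1024-based conversion: ~1.69 (i.e., PiB)

print(f"Decimal: {pb_decimal:.2f} PB")
print(f"Binary:  {pb_binary:.2f} PiB")
```

The discrepancy between vendor figures often comes down to which prefix convention a publication used, not a difference in hardware.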
The SparseCore accelerator, optimized for processing ultra-large embeddings, supports recommendation systems and extends Ironwood’s capabilities to financial applications, such as fraud detection, and scientific tasks, like molecular simulations. While Google highlights SparseCore’s improvements over Trillium, specific performance gains remain undisclosed.
Power efficiency is central. Google claims twice the performance-per-watt of Trillium, achieving 29.3 teraflops/watt—a ~30x improvement over its 2018 TPU. Liquid cooling manages an estimated 700–1,000 W per-chip TDP, addressing AI data centers’ energy demands.
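The efficiency ratios also imply baselines for the earlier generations. A quick sketch - these implied values are derived from the quoted ratios, not independently reported figures:

```python
# Baselines implied by the quoted efficiency ratios (derived, not reported).
IRONWOOD_TF_PER_WATT = 29.3

trillium_implied = IRONWOOD_TF_PER_WATT / 2    # 2x claim -> Trillium ~14.65 TF/W
first_tpu_implied = IRONWOOD_TF_PER_WATT / 30  # ~30x claim -> first Cloud TPU ~1 TF/W

print(f"Implied Trillium efficiency:  {trillium_implied:.2f} TF/W")
print(f"Implied first-TPU efficiency: {first_tpu_implied:.2f} TF/W")
```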
Scalability is Ironwood’s ambition. Pathways software, developed by DeepMind, is designed to orchestrate thousands of TPUs, potentially up to 400,000, for massive machine learning tasks. However, Pathways’ real-world performance is undocumented even at smaller scales, so the 400,000-chip ambition remains unproven in public deployments.
The company’s comparison to El Capitan invites skepticism. El Capitan’s FP64 precision serves high-precision scientific computing, while Ironwood’s 42.5 exaflops at FP8 targets AI inference. These are distinct realms. Industry consensus suggests El Capitan’s FP64 focus makes it unsuitable for direct comparison, rendering the company’s performance claims misleading. Ironwood excels in AI tasks but lacks FP64 support, limiting versatility against hybrid CPU-GPU systems. Pathways offers a counterpoint, promising unmatched scalability, but real-world execution is the hurdle.
Competitors are formidable. Microsoft’s Maia 100 (estimated at 1–2 petaflops per chip, based on industry reports) and recently announced Maia 2 integrate tightly with Azure’s software stack, optimizing for LLMs like Copilot. AWS’s Trainium2 (20.8 petaflops per instance, likely multi-chip) and Inferentia2 chips target cost-efficient training and inference, claiming up to 50% lower training costs than EC2 instances via SageMaker’s ecosystem. NVIDIA’s Blackwell GPUs (e.g., B200 at ~8 petaflops FP8) set a high bar for single-chip performance, though Ironwood’s pod-scale efficiency targets different workloads. Each rival thus plays to its strengths: Microsoft’s software cohesion, AWS’s cost focus, and NVIDIA’s single-chip performance. While Ironwood’s compute prowess is undeniable, the company’s proprietary AI Hypercomputer ecosystem raises vendor lock-in concerns. Enterprises that favor multi-cloud strategies may hesitate to commit fully to Google’s stack.
Cost is a critical lens. Building an Ironwood pod is estimated at $445 million, with three-year rental costs exceeding $1.1 billion, or $52 per teraflop. AWS’s Trainium, with its claimed up to 50% lower training costs than EC2 instances, may undercut Ironwood for cost-conscious customers, though direct pricing comparisons are unavailable. Microsoft’s Maia 100 leverages Azure’s scale, which could translate into price advantages. Google’s discounts soften the blow, yet total ownership costs remain steep.
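The $52-per-teraflop figure appears to be derived from the three-year rental estimate divided by the pod’s FP16 throughput; a sketch treating both inputs as the estimates quoted above:

```python
# Rough derivation of the cost-per-teraflop figure quoted above.
# Inputs are the estimates from the text, not measured prices.
rental_cost_usd = 1.1e9            # three-year pod rental estimate
pod_fp16_teraflops = 21.26 * 1e6   # 21.26 exaflops at FP16, in teraflops

usd_per_tflop_fp16 = rental_cost_usd / pod_fp16_teraflops
usd_per_tflop_fp8 = rental_cost_usd / (42.5 * 1e6)  # FP8 basis for comparison

print(f"FP16 basis: ${usd_per_tflop_fp16:.0f}/teraflop")  # ~$52, matching the text
print(f"FP8 basis:  ${usd_per_tflop_fp8:.0f}/teraflop")   # ~$26
```

Note that the precision basis roughly halves or doubles the headline cost figure, which matters when comparing against rivals who quote different precisions.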
Software is Ironwood’s double-edged sword. Pathways aims to deliver seamless orchestration across vast TPU clusters, a potential differentiator. However, while Pathways has been used by Google internally (e.g., Gemini 2.5), integration at scale is untested, and the company’s TensorFlow framework, while robust, faces headwinds. The AI community increasingly favors PyTorch, supported by AWS and Microsoft. Google’s adoption of JAX and PyTorch on TPUs is a strategic nod, but TensorFlow’s Google-centric ecosystem may alienate developers. Scaling custom silicon often involves software complexity, a challenge Ironwood must address to achieve widespread enterprise adoption.
The market is unforgiving. AI infrastructure spending is rapidly growing, and while significant investments are projected through 2028, most projections will need revisiting as geopolitical and market conditions evolve. Hyperscalers are doubling data center capacity to meet AI’s computational appetite. The company’s $75 billion 2025 capital expenditure signals commitment, but Microsoft and AWS match this resolve. Ironwood must prove itself in enterprise settings, where efficiency and flexibility reign.
Looking Ahead
Ironwood positions Google to cement its role as a titan in AI inference, yet execution is paramount. The key trend we are tracking is Pathways’ ability to orchestrate massive TPU clusters reliably. Based on my observations, Ironwood’s raw power is currently unmatched, but the proprietary ecosystem risks repelling multi-cloud adopters. Microsoft’s Maia 100 thrives on Azure’s software synergy, while AWS’s Trainium prioritizes cost efficiency.
The market’s pivot to inference-heavy workloads aligns with Ironwood’s strengths, but adoption hinges on cost and interoperability. When viewing the market broadly, the company’s $75 billion investment underscores a long-term silicon strategy, yet rivals’ more open ecosystems may sway enterprises. HyperFRAME Research will monitor Ironwood’s enterprise performance, particularly how the company balances power efficiency with software complexity, in coming quarters. Another angle is how Google’s focus on TPU will play out in the fracturing geopolitical landscape. Will an in-house capability serve the company well as supply chains for on-premises processing power become challenging? It certainly won’t hurt! My perspective is clear: the company must enhance PyTorch support and multi-cloud flexibility to capture a fragmented market, but the early signs are promising.
Stephen Sopko | Analyst-in-Residence – Semiconductors & Deep Tech
Stephen Sopko is an Analyst-in-Residence specializing in semiconductors and the deep technologies powering today’s innovation ecosystem. With decades of executive experience spanning Fortune 100, government, and startups, he provides actionable insights by connecting market trends and cutting-edge technologies to business outcomes.
Stephen’s expertise in analyzing the entire buyer’s journey, from technology acquisition to implementation, was refined during his tenure as co-founder and COO of Palisade Compliance, where he helped Fortune 500 clients optimize technology investments. His ability to identify opportunities at the intersection of semiconductors, emerging technologies, and enterprise needs makes him a sought-after advisor to stakeholders navigating complex decisions.