SpaceX, Anthropic, and the Inference Wall That Needs All of the Above
How the Colossus 1 lease and the Groq absorption reframe the ASIC contest, reinforce NVIDIA's substrate position, and quietly validate the full stack thesis
05/11/2026
Key Highlights
- NVIDIA is reframing inference from a cost line item into a continuous revenue workload, with the company estimating that compute demand has risen roughly one million fold across the past two years.
- Anthropic's May 2026 agreement to take all of SpaceX's Colossus 1 capacity (more than 220,000 NVIDIA GPUs and over 300 megawatts) is live evidence that the inference wall is binding now and that NVIDIA silicon remains the absorbing substrate regardless of which lab is buying.
- The Groq acquisition, closed in December 2025 for approximately $20 billion, is the largest deal in NVIDIA history and is designed to absorb the one workload GPUs have historically handled inefficiently: low latency decode.
- Vera Rubin is architected for agentic workloads at rack scale, delivering roughly 3.6 exaflops per NVL72 and pairing with Groq 3 LPUs through the Dynamo scheduler on a per token basis.
- Jensen Huang has raised the Blackwell plus Rubin revenue opportunity to roughly $1 trillion across 2025 through 2027, double the prior $500 billion figure shared earlier in the cycle.
The News
NVIDIA has repeatedly positioned AI inference as the defining workload of the next compute cycle and calls it the largest wall the industry has ever encountered. The framing was reinforced at GTC 2026 and across recent investor communications. The supporting moves include the Groq acquisition closed in late 2025, the Dynamo distributed inference platform that orchestrates prefill and decode across heterogeneous accelerators, the Vera Rubin platform engineered for agentic continuous workloads at rack scale, and a revised $1 trillion revenue opportunity figure for Blackwell and Rubin demand across 2025 through 2027.
In the same window, on May 6, 2026, Anthropic announced an agreement with SpaceX to take all of the compute capacity at the Colossus 1 data center in Memphis. The deal delivers more than 300 megawatts and over 220,000 NVIDIA GPUs to Anthropic within the month, drawing from a fleet that includes H100, H200, and GB200 accelerators, and it lands ahead of SpaceX's planned June IPO. Anthropic and SpaceX also expressed mutual interest in multiple gigawatts of orbital AI compute capacity. NVIDIA is pushing measurement away from peak FLOPS and toward tokens per watt and tokens per dollar as the operating metrics that matter inside an AI Factory.
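To make the tokens per watt and tokens per dollar framing concrete, here is a back-of-envelope sketch. Every input below (rack power, throughput, electricity rate, amortized rack cost) is an illustrative assumption of ours, not a figure disclosed by NVIDIA or any operator.

```python
# Back-of-envelope AI Factory economics: tokens per watt and tokens per dollar.
# All inputs are illustrative assumptions, not NVIDIA or operator disclosures.

rack_power_kw = 120.0          # assumed sustained draw for one rack-scale system
tokens_per_second = 2_000_000  # assumed aggregate decode throughput for the rack
power_cost_per_kwh = 0.08      # assumed industrial electricity rate, USD
rack_cost_per_hour = 350.0     # assumed all-in amortized capex plus opex, USD

tokens_per_hour = tokens_per_second * 3600
tokens_per_watt_hour = tokens_per_hour / (rack_power_kw * 1000)
energy_cost_per_hour = rack_power_kw * power_cost_per_kwh
tokens_per_dollar = tokens_per_hour / (rack_cost_per_hour + energy_cost_per_hour)

print(f"tokens per watt-hour: {tokens_per_watt_hour:,.0f}")
print(f"tokens per dollar:    {tokens_per_dollar:,.0f}")
```

The point of the exercise is not the outputs but the shape of the denominators: once the metric is tokens per watt or tokens per dollar, sustained utilization and facility efficiency enter the scoreboard directly.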
Analyst Take
We read "the largest wall ever" as a deliberate piece of category creation by NVIDIA. But the company is not wrong to attempt it. We think it is the most important strategic frame the company has put forward since the AI Factory pitch. The phrase lifts several narratives at once. It diagnoses a workload regime change. It sizes a TAM. It justifies the largest acquisition in company history. And it redirects the only credible competitive narrative against NVIDIA in this cycle, which is the custom silicon performance per dollar story.
Start with the workload. Training was always episodic. A model is trained once and deployed many times, which made training spend lumpy and capacity planning forgiving. Inference is different in kind. Every reasoning chain, every agent loop, every multimodal generation, every embodied AI sensor read draws inference. The token is the unit of work and the unit of revenue. Per NVIDIA's own framing, compute demand has risen roughly one million fold over two years when both model complexity and usage are taken into account, with inference volume alone up on the order of 10,000x: roughly one hundred times from generative to reasoning workloads, and another hundred times from reasoning to agentic. We flag those multipliers as company sourced. They are the company's own composite of model complexity, query growth, and reasoning depth, and we have not seen an independent reconstruction that confirms the exact magnitude. We do not need to defend the precise number to accept the shape of the curve. The curve is steep, it is continuous rather than bursty, and the directional evidence in customer behavior supports it.
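For readers who want the arithmetic, here is how the company sourced multipliers compose. The two hundredfold steps are NVIDIA's framing; the model complexity factor is our assumption, chosen only to show how the composite could recover the headline figure.

```python
# How NVIDIA's company-sourced multipliers compose into "one million fold".
# The first two factors are the company's framing; the third is our assumption.

generative_to_reasoning = 100   # ~100x more tokens per query for reasoning chains
reasoning_to_agentic = 100      # ~100x more tokens again for agentic loops
inference_volume_growth = generative_to_reasoning * reasoning_to_agentic
print(f"inference volume: {inference_volume_growth:,}x")   # the 10,000x claim

# Layering model complexity growth (an assumed ~100x) on top recovers the
# headline one-million-fold composite figure.
model_complexity_growth = 100
print(f"composite demand: {inference_volume_growth * model_complexity_growth:,}x")
```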
That continuity is the architectural argument. A continuous workload exposes every inefficiency in a system. It punishes underprovisioned interconnect. It punishes thermal designs that depend on idle time. It punishes orchestration software that cannot route heterogeneous work. NVIDIA's response is to engineer for that condition explicitly. NVL72 is rack scale by design rather than by aggregation. Vera Rubin is architected to deliver roughly 3.6 exaflops per rack and is paired with hot water cooling at 45 degrees Celsius to support sustained operation without overbuilt facility infrastructure. Power smoothing and load management feature prominently in the new platform messaging. None of this is incidental. It is what designing for continuous inference looks like.
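A small sketch makes the utilization point concrete: a rack that is paid for continuously but doing useful work only part of the time carries a steep cost per token penalty, which is exactly what a continuous inference workload exposes. All numbers below are illustrative assumptions.

```python
# Why continuity changes the economics: amortized cost per token falls as
# sustained utilization rises. Illustrative assumptions throughout.

def cost_per_million_tokens(utilization: float,
                            rack_cost_per_hour: float = 350.0,
                            peak_tokens_per_second: float = 2_000_000) -> float:
    """Amortized serving cost when the rack does useful work only a
    `utilization` fraction of the time but is paid for continuously."""
    useful_tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return rack_cost_per_hour / useful_tokens_per_hour * 1_000_000

# Bursty, training-style duty cycles versus continuous inference:
for u in (0.15, 0.50, 0.90):
    print(f"utilization {u:.0%}: ${cost_per_million_tokens(u):.4f} per 1M tokens")
```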
Now the Groq move. We have been arguing in prior notes that the one defensible criticism of the GPU monoculture in inference is the decode problem. LLM inference splits into a compute heavy prefill phase, which suits a GPU, and a memory bandwidth and latency dominated decode phase, where raw GPU compute sits underutilized. Specialized engines from Cerebras and the original Groq pressed that point for years and built credible counter narratives. The approximately $20 billion December 2025 acquisition (the structure is more complex than that term suggests, but it is the common media framing) does two things at once. It removes the strongest standalone challenger from the field. And it inserts an SRAM dense low latency decoder directly inside the NVIDIA stack as a specialist colleague to the GPU. The Dynamo scheduler routes prefill to Rubin and decode to Groq 3 LPUs on a per token basis. CUDA runs unchanged. From the operator view, this is a transparent acceleration of the workload phase that previously favored alternative silicon. From the competition view, this is the Mellanox playbook applied to inference. NVIDIA absorbs the specialist, keeps the API surface, and converts a flank attack into a feature of the platform.
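As a conceptual illustration of the disaggregated pattern described above, here is a toy per-request router that sends prefill to GPU class workers and decode to LPU class workers. This is our own sketch of the idea, not Dynamo's actual APIs, and the worker names are hypothetical.

```python
# A toy disaggregated-inference router in the spirit of the note's description:
# prefill goes to compute-dense GPU workers, decode to low-latency LPU workers.
# Conceptual sketch only; this does not use NVIDIA Dynamo's real interfaces.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Worker:
    name: str
    kind: str                  # "gpu" (compute-dense) or "lpu" (SRAM, low-latency)
    queue: List[str] = field(default_factory=list)

class PhaseRouter:
    def __init__(self, workers: List[Worker]):
        self.gpus = [w for w in workers if w.kind == "gpu"]
        self.lpus = [w for w in workers if w.kind == "lpu"]

    def submit(self, request_id: str) -> None:
        # Prefill is compute-bound: route the full prompt pass to the least
        # loaded GPU worker.
        gpu = min(self.gpus, key=lambda w: len(w.queue))
        gpu.queue.append(f"{request_id}:prefill")
        # Decode is latency- and bandwidth-bound: stream token generation on
        # the least loaded LPU worker once the KV cache has been handed off.
        lpu = min(self.lpus, key=lambda w: len(w.queue))
        lpu.queue.append(f"{request_id}:decode")

router = PhaseRouter([Worker("rubin-0", "gpu"),
                      Worker("groq3-0", "lpu"), Worker("groq3-1", "lpu")])
router.submit("req-42")
```

The hard part a real system carries, and this sketch elides, is moving KV cache state between phases, which is why interconnect shows up in the same architectural argument.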
The deal happening as we write this note is the cleanest real time validation of the wall thesis we have seen this cycle. As described above, Anthropic has signed an agreement with SpaceX to absorb the entire compute capacity at Colossus 1 in Memphis, with the full fleet delivered within a month. Colossus was built originally to power xAI's Grok models. Industry reporting has placed xAI's model FLOPS utilization on the cluster in the low double digits, well below the benchmark common at rival labs. The capacity was there; the workload was not. In classic Musk fashion, the bold move puts the cluster in full play in days rather than weeks.
That gap is the wall in microcosm. A 220,000 GPU cluster sat underutilized at one frontier lab. Another frontier lab needed the capacity badly enough to take all of it, and the immediate use case was inference rather than training. Anthropic explicitly tied the new capacity to higher rate limits for Claude Code and the Claude Opus API, doubling the five hour Claude Code rate limits for paid tiers and meaningfully raising API throughput for the Opus model line. This is the agentic inference workload exposing itself in production. Coding agents loop. They burn tokens. They are the largest face of the wall that NVIDIA has been describing for two years, and they are now setting service tier economics at a frontier lab in close to real time.
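A rough sketch shows why agent loops dominate token budgets: each iteration re-reads accumulated context, so a single coding task can cost hundreds of times what a one-shot chat exchange costs. Every parameter below is an assumption for illustration, not a measured Claude Code figure.

```python
# Why coding agents "burn tokens": one task fans out into many model calls,
# each re-reading accumulated context. Illustrative assumptions throughout.

chat_tokens = 2_000                 # assumed one-shot chat exchange

loop_iterations = 30                # assumed plan/edit/test cycles per task
context_tokens_per_call = 40_000    # assumed repo + history context per call
output_tokens_per_call = 1_500      # assumed generated diff/commentary per call

agent_tokens = loop_iterations * (context_tokens_per_call + output_tokens_per_call)
print(f"agent task: {agent_tokens:,} tokens "
      f"(~{agent_tokens // chat_tokens}x a one-shot chat)")
```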
Two angles in this deal matter directly for NVIDIA. The first is substrate. Colossus 1 is end-to-end NVIDIA silicon. Anthropic has been aggressive in diversifying its training stack across Amazon Trainium, Google TPUs through a reported up to one million unit agreement, and Azure capacity running on NVIDIA AI systems through a roughly $30 billion Microsoft partnership announced in November 2025. The company has also flagged a roughly $50 billion U.S. infrastructure agreement with Fluidstack and a longer dated Google and Broadcom commitment coming online in 2027. But when Anthropic needed inference capacity to ship inside thirty days, the only path that cleared was NVIDIA silicon sitting in someone else's data center. That is the substrate point we have been making in prior notes, and it validates the neocloud thesis on a grand scale. Diversification at the training layer does not relieve dependence on NVIDIA at the serving layer, because the existing global inventory of inference ready capacity is overwhelmingly NVIDIA based and is the fastest path to production.
The second angle is orbital. SpaceX's statement around the deal noted that the compute required to train and operate the next generation of frontier systems is outpacing what terrestrial power, land, permitting, and cooling can deliver on the timelines that matter. Anthropic and SpaceX both expressed interest in multiple gigawatts of orbital AI compute capacity as part of the same announcement. We have argued in previous notes covering the orbital data center thesis that the inference wall is the demand driver that makes orbital compute economically conceivable. SpaceX Starship coming online is the supply side driver. The Colossus deal is the most concrete commercial signal we have seen in that direction. It does not commit anyone to orbit. It does signal that the buyers of frontier capacity are now openly modeling a world where terrestrial constraints become binding inside this decade, and it puts that conversation on the same press release as a 300 megawatt terrestrial deal.
This is what we mean by the wall outlasting custom silicon. The TPU and Trainium narratives are real, and the migrations are real. Anthropic's TPU commitments, Meta's MTIA roadmap, and Google's TPU v7 momentum are not noise. But each of those competitors optimizes a single face of the wall. NVIDIA's argument is that a workload defined by reasoning, agentic loops, long context, video, embodied AI, and real time orchestration has too many faces for an ASIC optimized to one face to win the whole problem. We think that argument has merit, and we think it will be sustained at least through the Rubin cycle. The Groq integration covers the latency face. NVL72 and Dynamo cover the scale and disaggregation face. CUDA covers the developer face. The annual cadence covers the obsolescence face. The full stack is a complete climbing system rather than a single rope.
Looking Ahead
We hold the bull case with guardrails. Three things should keep analysts honest. First, the $1 trillion revenue opportunity is a Huang forecast, not a backlog. The historic precedent of his upward revisions suggests the floor is defensible, but the timing risk lives outside NVIDIA's control in power, interconnect, and advanced packaging supply rather than in demand. Second, hyperscaler vertical integration pressure is structural. Even if NVIDIA wins the workload technically, customer concentration in a small number of buyers who also build their own silicon caps the upside on price. Third, the wall framing only works if token economics keep improving. If cost per useful token plateaus, the Jevons effect that turns cheaper tokens into more tokens consumed stalls, and the demand curve flattens. We do not expect that across the next two cycles, but we will track it as the leading indicator.
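For that tracking exercise, a minimal constant elasticity model captures the Jevons dynamic we describe: when demand elasticity with respect to token price exceeds one, total token spend rises as unit cost falls, and consumption growth stalls when price stops falling. The elasticity value and scale constant below are our assumptions for illustration.

```python
# A minimal constant-elasticity sketch of the Jevons dynamic the note tracks.
# If elasticity > 1, total spend on tokens *rises* as cost per token falls.
# The elasticity and scale constant are assumptions, not estimated parameters.

def tokens_demanded(price: float, k: float = 1e9, elasticity: float = 1.4) -> float:
    """Demand (in millions of tokens) at a given price per 1M tokens, USD."""
    return k * price ** (-elasticity)

for price in (1.00, 0.50, 0.25):     # falling cost per 1M tokens
    q = tokens_demanded(price)
    print(f"price ${price:.2f}: demand {q:,.0f} Mtok, spend ${price * q:,.0f}")
```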
The strategic punchline. Going into (another) GTC Taipei in early June, NVIDIA is selling a narrative that the largest workload in computing history demands the most complete climbing system in computing history. The Groq acquisition is the proof point that NVIDIA will pay strategic premiums to remove the sharpest critique of its architecture. The Rubin platform is the proof point that the company is willing to redesign the rack to match the workload. The Colossus deal is the proof point that the inference wall is binding now, that NVIDIA remains the substrate of choice regardless of which lab is buying, and that the orbital compute conversation is no longer hypothetical. We are bullish into the Rubin ramp. We are watchful on power, packaging, and customer concentration. And we believe the inference wall, framed correctly, is the most asymmetric pricing argument the company has put forward. The wall must be climbed, and everything to date across NVIDIA and custom silicon is a case of 'more,' not 'instead of.'
Stephen Sopko | Analyst-in-Residence – Semiconductors & Deep Tech
Stephen Sopko is an Analyst-in-Residence specializing in semiconductors and the deep technologies powering today’s innovation ecosystem. With decades of executive experience spanning Fortune 100, government, and startups, he provides actionable insights by connecting market trends and cutting-edge technologies to business outcomes.
Stephen’s expertise in analyzing the entire buyer’s journey, from technology acquisition to implementation, was refined during his tenure as co-founder and COO of Palisade Compliance, where he helped Fortune 500 clients optimize technology investments. His ability to identify opportunities at the intersection of semiconductors, emerging technologies, and enterprise needs makes him a sought-after advisor to stakeholders navigating complex decisions.