The Silent War Over AI Inference: Why Nvidia Spent $20 Billion and OpenAI Is Betting Bigger
Two nearly identical $20 billion deals announced within months of each other signal a seismic shift in artificial intelligence computing. In December 2025, Nvidia acquired Groq, a startup specializing in inference chips. In April 2026, OpenAI announced a $20 billion purchase from Cerebras, another inference chip maker that simultaneously filed for an initial public offering (IPO) at a $35 billion valuation. These aren't separate events; they're two sides of the same battle for control over AI inference, the computing workload poised to become the most profitable segment of the entire AI industry.
Why Did the AI Industry Suddenly Care About Inference?
For years, the conversation around AI computing focused on training, the process of teaching a model by feeding it massive amounts of data. But the economics have flipped dramatically. Training is a one-time cost; inference is an ongoing cost. GPT-4 was trained once, but it answers questions from hundreds of millions of users every single day. Each conversation is an inference request, and at scale, the cumulative cost of inference far exceeds training.
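To make that crossover concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it, the training cost, the per-request cost, and the request volume, is an illustrative assumption, not a number from any company filing.

```python
# Back-of-envelope model of the training-vs-inference cost crossover.
# All figures below are illustrative assumptions, not reported numbers.

TRAINING_COST_USD = 100e6      # assume a one-time $100M training run
COST_PER_REQUEST_USD = 0.002   # assume $0.002 of compute per inference request
REQUESTS_PER_DAY = 500e6       # assume hundreds of millions of requests daily

daily_inference_cost = COST_PER_REQUEST_USD * REQUESTS_PER_DAY
crossover_days = TRAINING_COST_USD / daily_inference_cost

print(f"Daily inference spend: ${daily_inference_cost / 1e6:.1f}M")
print(f"Cumulative inference cost passes the training bill after {crossover_days:.0f} days")
```

Under these assumptions, inference spend reaches $1 million per day and overtakes the entire training bill in roughly 100 days; after that, every additional day of operation widens the gap.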
Market research from Deloitte and commentary at CES 2026 show that inference already accounted for 50 percent of all AI computing spending in 2025, and that by 2026 the proportion had jumped to two-thirds. Lenovo CEO Yang Yuanqing put it bluntly at CES: the structure of AI spending will reverse from "80 percent training plus 20 percent inference" to "20 percent training plus 80 percent inference." This shift means the most profitable part of the AI industry is moving from training chips to inference chips, and the two workloads demand drastically different architectural designs.
What's Wrong With Nvidia's GPUs for Inference?
Nvidia's H100 and H200 processors are monsters designed for training. Their core advantage is extremely high computational throughput; training requires massive multiplication operations on huge matrices, and GPUs excel at this kind of parallel computing. However, the bottleneck for inference is not computation, but memory bandwidth.
When a user submits a question, the chip must move the weights of the entire model from memory to the computing units before it can generate an answer. This "moving" process is the real source of inference latency. Nvidia's GPUs use external high-bandwidth memory (HBM), and the transfer step inevitably introduces latency. For ChatGPT, which serves hundreds of millions of users every day, that latency multiplied by scale becomes a real performance bottleneck. When OpenAI's internal engineers were optimizing Codex, a code generation tool, they found that no matter how they tuned the parameters, response speed was limited by the architecture of Nvidia's GPUs. In other words, Nvidia's disadvantage in inference isn't a matter of effort; it's a matter of architecture.
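A rough calculation shows why bandwidth, not arithmetic, sets the floor. The sketch below uses the H100's published HBM3 bandwidth of roughly 3.35 TB/s; the 70-billion-parameter model and the single-request (batch-of-one) setup are simplifying assumptions chosen for illustration.

```python
# Why decode latency is memory-bound: at batch size 1, generating each
# token requires streaming the entire weight set from memory, so the
# per-token time is bounded below by model_bytes / memory_bandwidth.

HBM_BANDWIDTH = 3.35e12   # bytes/s, approx. published H100 SXM HBM3 bandwidth
MODEL_BYTES = 70e9 * 2    # assume a 70B-parameter model stored in FP16

time_per_token = MODEL_BYTES / HBM_BANDWIDTH   # seconds, lower bound
print(f"Per-token floor: {time_per_token * 1e3:.1f} ms "
      f"(~{1 / time_per_token:.0f} tokens/s), regardless of FLOPs")
```

Under these assumptions the floor is about 42 milliseconds per token, roughly 24 tokens per second, no matter how much raw compute the chip has to spare.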
How Do Specialized Inference Chips Solve This Problem?
Cerebras and Groq each took a completely different approach to chip design than Nvidia. Cerebras' WSE-3 is so large it spans an entire silicon wafer, measuring 46,225 square millimeters, larger than a human hand. It integrates 900,000 AI cores and 44 gigabytes of ultra-high-speed SRAM (static random-access memory) onto a single piece of silicon. The key innovation: memory sits directly next to the computing cores, shrinking the "transport" distance from centimeters to micrometers. The result is inference speeds 15 to 20 times faster than Nvidia's H100.
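Plugging an on-chip SRAM bandwidth into the same formula shows why moving memory next to the cores changes the picture. The 21 PB/s figure below is Cerebras' published on-chip bandwidth number for the WSE-3; treating a large model as fully SRAM-resident is a deliberate simplification, since in practice weights are sharded or streamed across systems.

```python
# Same bandwidth-bound decode model, comparing external HBM against
# on-chip SRAM. The SRAM bandwidth is Cerebras' published WSE-3 figure;
# assuming all weights fit in SRAM is a simplification for illustration.

def decode_ceiling(model_bytes: float, bandwidth: float) -> float:
    """Upper bound on tokens/s when each token streams all weights once."""
    return bandwidth / model_bytes

MODEL_BYTES = 70e9 * 2  # 70B parameters in FP16, as before
for name, bandwidth in [("H100 HBM3", 3.35e12), ("WSE-3 on-chip SRAM", 21e15)]:
    print(f"{name}: ~{decode_ceiling(MODEL_BYTES, bandwidth):,.0f} tokens/s ceiling")
```

The SRAM ceiling comes out orders of magnitude higher than the HBM one; the real-world gain (15 to 20 times) is far smaller because once memory stops being the bottleneck, compute, interconnect, and model-sharding overheads become the binding constraints instead.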
Groq's LPU (Language Processing Unit) follows a similar philosophy, built on an SRAM architecture optimized for inference. At the time of Nvidia's acquisition, Groq's LPU powered the world's fastest inference service in public benchmarks. These specialized chips proved that the inference problem wasn't unsolvable; it simply required a different architectural approach from the one Nvidia had built.
Why Did Nvidia Spend $20 Billion to Acquire Groq?
On December 24, 2025, Nvidia announced the largest acquisition in its history. The $20 billion price tag for Groq was nearly triple Nvidia's previous record, the roughly $7 billion purchase of Mellanox in 2019. The message behind the money mattered far more than the amount itself: Nvidia acknowledged a structural gap in its inference capabilities, and that gap was large enough to warrant spending $20 billion to plug it.
If Nvidia truly believed its GPUs were unbeatable in inference, it wouldn't have needed to acquire Groq. The acquisition was essentially a $20 billion technology purchase order: an acknowledgment that embedded-SRAM architectures hold a real advantage in inference scenarios, and that Nvidia's existing product line could not close that gap on its own. Nvidia's official narrative framed the deal differently, as "deep integration with Groq to provide a more complete inference solution," but the technical translation was clear: Nvidia realized its own solutions weren't enough, so it bought someone else's.
What Makes OpenAI's Deal With Cerebras Different?
OpenAI's approach to the inference problem was fundamentally different from Nvidia's. Rather than acquiring a competitor, OpenAI decided to diversify its chip suppliers and secure long-term access to specialized inference hardware. In January 2026, OpenAI and Cerebras signed a three-year computing power purchase agreement worth $10 billion. However, the full details revealed on April 17, 2026, showed this was far more complex than a simple hardware purchase.
The procurement amount doubled from $10 billion to $20 billion. More importantly, OpenAI will acquire warrants in Cerebras, and as the scale of procurement increases, its shareholding can reach up to 10 percent of Cerebras' total share capital. The deal also includes a $1 billion working capital loan to help Cerebras scale up its manufacturing operations. This structure gives OpenAI both immediate access to inference capacity and a financial stake in Cerebras' success, creating a partnership rather than a simple vendor relationship.
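The article does not disclose how the warrant vests, so the sketch below only illustrates the general shape of a procurement-linked equity structure. The linear vesting curve and the $20 billion full-vest threshold are invented assumptions, not terms from the actual agreement.

```python
# Hypothetical illustration of procurement-linked warrants. The linear
# vesting curve and the $20B full-vest threshold are assumptions made
# for illustration; the deal's actual terms are not public in this detail.

def vested_stake(procured_usd: float, max_stake: float = 0.10,
                 full_vest_usd: float = 20e9) -> float:
    """Assume the equity stake vests linearly with cumulative purchases."""
    return max_stake * min(procured_usd / full_vest_usd, 1.0)

for spent in (5e9, 10e9, 20e9):
    print(f"${spent / 1e9:.0f}B procured -> {vested_stake(spent):.1%} stake")
```

Whatever the real schedule looks like, the incentive logic is the same: the more compute OpenAI buys, the larger its ownership claim on the supplier grows.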
The initial commitment covers 750 megawatts of Cerebras systems to be installed through 2028, with options for another 3 gigawatts of capacity in 2029 and 2030. According to Cerebras' latest IPO filing, the deal has the potential to scale to $20 billion over multiple years.
How Is Cerebras Positioning Itself for the IPO?
Cerebras Systems first filed to go public in September 2024 at a $4 billion valuation, but quietly withdrew that attempt when private equity firms began lining up to offer the company large funding rounds. Its $1.1 billion Series G round in October 2025 pushed its valuation to $8.1 billion, and another $1 billion Series H round in February 2026 drove the valuation to $23 billion. Now, filing for an IPO again at a $35 billion valuation, Cerebras is attempting to capitalize on the inference boom.
The company's financial picture has transformed dramatically. In 2024, 85 percent of Cerebras' revenue came from a single customer: Group 42, a maker of Arabic-language AI models backed by the government of the United Arab Emirates. By 2025, the biggest revenue driver was a new customer, the Mohamed bin Zayed University of Artificial Intelligence, also based in Abu Dhabi. Together, these two customers accounted for 86 percent of the $510 million Cerebras brought in that year. The OpenAI and Amazon Web Services (AWS) deals therefore represent a dramatic diversification of its customer base.
In March 2026, Cerebras inked a "binding term sheet" with AWS to integrate CS-3 systems with AWS's homegrown Trainium 3 and future Trainium 4 AI systems, characterized as a multi-year deal "to bring fast inference to an even bigger scale through global distribution." The prospective deal also includes a warrant for 2.7 million Cerebras shares, whose ultimate value will depend on where the IPO prices.
What Does This War Mean for the Broader AI Industry?
The inference battle reveals a fundamental truth: Nvidia's dominance in AI chips is not absolute. The company's architecture, optimized for training, has real limitations for inference at scale. By acquiring Groq, Nvidia is essentially admitting it needs to evolve beyond its traditional GPU approach. By investing heavily in Cerebras, OpenAI is hedging its bets and ensuring it won't be entirely dependent on Nvidia for the computing workload that will soon dominate AI economics.
The stakes are enormous. Inference is expected to account for two-thirds of AI compute spending by 2026, and that proportion will only grow as AI models become more widely deployed. Control over inference infrastructure means control over the most profitable segment of the AI industry. Nvidia's $20 billion acquisition and OpenAI's $20 billion investment are not separate events; they're two competing strategies for dominating a market that is almost certain to become the largest tech market in history.
How to Understand the Inference Chip Market's Key Players
- Nvidia's Position: Dominant in training chips but facing architectural limitations in inference; acquired Groq for $20 billion to add specialized inference capabilities to its product portfolio.
- Cerebras' Strategy: Building wafer-scale chips with integrated SRAM that deliver 15 to 20 times faster inference than Nvidia's H100; securing major deals with OpenAI and AWS while preparing for IPO.
- OpenAI's Approach: Diversifying away from sole reliance on Nvidia by investing $20 billion in Cerebras while also developing its own ASIC chips to gain greater autonomy over its hardware supply.
- AWS' Role: Integrating Cerebras inference chips with its own Trainium processors to offer customers faster inference at scale through global distribution.
- SambaNova and Graphcore: Other competitors in the inference space; SambaNova remains independent, while Graphcore was acquired by SoftBank (Arm's parent company) in July 2024.
Cerebras held $3.34 billion in liquidity as of early 2026, including the $1 billion working capital loan from OpenAI and the $1 billion Series H round. That capital gives the company the resources to begin the massive OpenAI buildout while also courting smaller customers who may never buy gigawatts of capacity individually but who, collectively, fill out a proper customer pyramid over time.
The inference chip market is no longer a niche segment; it's the battleground where the future of AI computing will be decided. The two $20 billion deals announced in late 2025 and early 2026 are not coincidences. They're confirmations that the industry has fundamentally shifted, and the companies that control inference infrastructure will shape the AI economy for decades to come.