Cerebras CEO Andrew Feldman: NVIDIA spent $20B to buy the #2 inference player — validating our market
Jan 12, 2026 with Andrew Feldman
Key Points
- NVIDIA's $20 billion acquisition of an inference leader validates that fast inference has become a standalone, high-stakes market category that Cerebras has been building toward since 2016.
- Cerebras solved a 75-year manufacturing problem in 15 months by designing wafer-scale chips with one million identical tiles and 5% redundancy, anticipating thermal and structural failures that delayed competitors.
- Production workloads now drive demand as inference latency becomes culturally visible through viral social media; pharmaceutical research emerges as a high-value vertical where faster inference compresses drug development timelines.
Summary
Andrew Feldman, co-founder and CEO of Cerebras Systems, argues that NVIDIA's $20 billion acquisition of the number-two inference player is the clearest possible validation that fast inference has become a standalone, high-stakes market category — one Cerebras has been building toward since its founding in early 2016.
The Wafer-Scale Bet
Cerebras was built on a single architectural conviction: AI workloads are fundamentally different from general compute, and a chip the size of a dinner plate — not a postage stamp — could deliver 10x to 50x performance gains over conventional designs. The core engineering challenge, unsolved for 75 years, was how to manufacture a large-die chip with acceptable yield. Cerebras cracked it in 15 months for $12 million by borrowing a technique from memory design — building the chip with roughly one million identical tiles, holding approximately 5% in reserve as redundancy, and mapping around defects rather than discarding the die.
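A back-of-the-envelope Poisson yield model makes the redundancy logic concrete. The sketch below is illustrative, not Cerebras's actual yield math: only the ~1M tiles and ~5% spares come from the article, while defect density and wafer area are assumed round numbers.

```python
import math

# Toy Poisson yield model for a tiled wafer-scale die.
# Only the tile count (~1M) and spare fraction (~5%) come from the article;
# defect density and wafer area are assumed round numbers.
TILES = 1_000_000
SPARE_FRACTION = 0.05
DEFECT_DENSITY = 0.1        # defects per cm^2 (assumed)
WAFER_AREA_CM2 = 462.0      # usable area of a 300 mm wafer (approx.)

# Classic Poisson yield: P(tile is defect-free) = exp(-D * A)
tile_area = WAFER_AREA_CM2 / TILES
p_tile_good = math.exp(-DEFECT_DENSITY * tile_area)

expected_bad = TILES * (1 - p_tile_good)   # tiles to map around
spares = TILES * SPARE_FRACTION            # tiles held in reserve

# Without redundancy, every one of the million tiles must be perfect:
p_monolithic = p_tile_good ** TILES

print(f"P(single tile good) = {p_tile_good:.6f}")
print(f"expected bad tiles  = {expected_bad:.0f} vs. {spares:.0f} spares")
print(f"P(flawless wafer)   = {p_monolithic:.1e}")   # ~1e-20: hopeless
```

Under these assumptions a wafer expects only a few dozen bad tiles against tens of thousands of spares, while a monolithic die that tolerates zero defects is effectively unmanufacturable. That asymmetry is the whole trick: map around defects instead of discarding the die.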
The company claims it anticipated and solved two failure modes before they became public elsewhere: the coefficient-of-thermal-expansion problem that delayed NVIDIA's B200 by 18 months, and the structural issue that sank Tesla's Dojo project. Serial number one shipped to Argonne National Laboratory, and national lab and defense contracts now run into the hundreds of millions of dollars.
Market Timing and Demand
Feldman draws a clear arc: 2023 was the GPT moment when AI became interesting, 2025 was when it became demonstrably useful, and the current period is characterized by production workloads — not demos — driving massive demand for compute. Cognition is cited as a marquee customer, with the inference latency problem now culturally visible enough to generate viral TikTok content mocking slow competitors.
Beyond coding agents, Feldman points to pharmaceutical research as a high-value vertical. The argument is straightforward: compressing an 18-year drug development process by even one year, when the patent window is only 27 years, is economically significant. Faster inference means researchers can run more simulations before committing to expensive wet-lab validation.
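The arithmetic behind that claim is simple enough to sketch. The numbers below use the 18-year cycle and 27-year window cited in the conversation; the revenue figure is purely hypothetical.

```python
# Back-of-the-envelope drug-economics sketch. The 18-year cycle and
# 27-year window are the figures cited in the conversation; the
# revenue-per-year number is purely hypothetical.
PATENT_WINDOW_YEARS = 27
DEV_YEARS = 18
REVENUE_PER_PROTECTED_YEAR = 1e9   # $1B/year, illustrative

def protected_revenue(dev_years: int) -> float:
    """Revenue earned in the patent window that remains after launch."""
    return max(PATENT_WINDOW_YEARS - dev_years, 0) * REVENUE_PER_PROTECTED_YEAR

baseline = protected_revenue(DEV_YEARS)        # 9 protected years -> $9B
faster = protected_revenue(DEV_YEARS - 1)      # 10 protected years -> $10B

uplift = faster / baseline - 1
print(f"one year of compression is worth ${(faster - baseline)/1e9:.0f}B "
      f"({uplift:.0%} more protected revenue)")
```

With these assumptions, shaving one year off development adds roughly an 11% uplift in patent-protected revenue before discounting, which is why even modest simulation speedups can justify large compute spend.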
Business Model and Financials
Cerebras raised a $1.1 billion Series G and is deploying capital across manufacturing expansion, additional data center capacity, and international growth. The company offers hardware sales, managed hardware in customer facilities, and a cloud API with token-based pricing. Feldman claims Cerebras has better gross margins than all startup competitors and is growing faster and at greater scale — though no specific figures were provided.
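For readers unfamiliar with the token-priced cloud model, here is a minimal sketch of what a consumption-billed inference call looks like, assuming an OpenAI-compatible endpoint, a pattern most inference providers follow. The base URL and model name are illustrative placeholders, not details taken from this conversation.

```python
# Minimal sketch of a consumption-billed inference call, assuming an
# OpenAI-compatible endpoint. The base URL and model name below are
# illustrative placeholders, not details confirmed in this conversation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint shape
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",  # illustrative model name
    messages=[{"role": "user", "content": "Why does inference speed matter?"}],
)

print(resp.choices[0].message.content)
# Token-based pricing means the bill tracks usage, not reserved capacity:
u = resp.usage
print(f"billed: {u.prompt_tokens} input + {u.completion_tokens} output tokens")
```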
On IP strategy, Cerebras runs a dual track: aggressive patenting of techniques whose infringement can be detected from outside, and trade secrets for methods that are difficult to audit externally. Manufacturing is deliberately segmented so that no single supplier sees how the full process fits together.
Infrastructure Notes
Cerebras hardware is designed to fit the NVL72-class rack footprint now standard for high-end AI chips, with the same power and liquid-cooling requirements. The company has used direct liquid cooling since its early generations, a practice it says Google's TPU team also adopted in 2017 but that hyperscalers like Meta are only now designing into new facilities.
On the memory market, Feldman flags a familiar cycle: GPU demand has pulled DRAM capacity toward HBM, tightening supply of conventional DRAM, prompting hyperscalers to front-load annual orders, which in turn inflates near-term demand signals well beyond what actual consumption will support.
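The dynamic Feldman describes is a textbook bullwhip effect. A toy simulation (every parameter assumed) shows how front-loaded ordering inflates the upstream demand signal even when true consumption is flat.

```python
# Toy model of order front-loading (every parameter assumed).
# A buyer consumes DRAM at a steady 100 units/quarter but, fearing tight
# supply, places a full year of orders in Q1; the supplier's demand signal
# spikes to 4x true consumption, then collapses to zero.
QUARTERS = 8
consumption = [100] * QUARTERS   # true usage: flat

orders, inventory = [], 0
for q, need in enumerate(consumption):
    if q == 0:
        placed = need * 4                  # front-load four quarters at once
    else:
        placed = max(need - inventory, 0)  # reorder only once the pile runs out
    inventory += placed - need
    orders.append(placed)

for q in range(QUARTERS):
    print(f"Q{q + 1}: consumed {consumption[q]:>3}, ordered {orders[q]:>3}")
# Q1 shows 400 ordered and Q2-Q4 show 0: the supplier sees a demand signal
# that actual consumption never supports.
```

In this sketch the supplier sees a quarter of 4x demand followed by three quarters of nothing, which is exactly the kind of near-term signal Feldman warns against reading as durable consumption.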