Why Nvidia bought Groq: SRAM architectures, inference disaggregation, and the race for ultra-low latency
Jan 5, 2026
Key Points
- Nvidia closed its $20 billion Groq acquisition in less than two weeks with no competing bidders, signaling conviction that Groq's SRAM architecture fills a gap in its inference chip portfolio.
- Decode latency has become the bottleneck in AI inference, and developers now pay 10x to 100x premiums for meaningfully faster token generation on tasks like code generation and research.
- Groq's fab partnerships with Samsung and GlobalFoundries reduce Nvidia's foundry concentration risk while securing an SRAM-optimized design that complements its GPU-centric strategy across different workload requirements.
Summary
Jensen Huang closed the Groq acquisition in less than two weeks with no competing bidders, wiring $20 billion and pushing to complete the deal before year-end. The speed reflects conviction about what Groq's architecture adds to Nvidia's chip lineup.
Groq's founder Jonathan Ross designed TPUs at Google, giving him deep expertise in task-specific chip design. That pedigree shaped the bull case early on, even when the startup's rollout seemed uncertain and expensive. The real strategic logic sits in inference disaggregation.
Decode bottleneck
Inference splits into two phases: prefill processes the full prompt in one pass, and decode streams output tokens one at a time. Decode is latency-bound and bandwidth-bound because each new token requires rereading the model weights and the growing KV cache, so memory bandwidth rather than raw compute sets the token rate, exactly where SRAM architectures excel. Groq's design trades memory capacity for extremely high memory bandwidth, making it well suited to ultra-low-latency agentic reasoning workloads. Combined with Nvidia's existing Rubin lineup (CPX for large context windows, standard Rubin for training and batched inference), this creates a portfolio where Nvidia can mix and match chips to optimize the performance-versus-cost tradeoff for each workload.
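A minimal sketch of the disaggregation idea, in Python: route the prefill phase to hardware optimized for long prompts and the decode phase to hardware optimized for memory bandwidth and latency. The pool names, routing rule, and token counts are illustrative assumptions, not Nvidia's scheduler or product mapping.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str           # illustrative label, not a real SKU mapping
    optimized_for: str  # which bottleneck this hardware targets

# Assumed pools: prefill is compute-/context-bound (whole prompt in one pass),
# decode is bandwidth-/latency-bound (every new token rereads weights + KV cache).
POOLS = {
    "prefill": Pool("long-context-gpu", "prompt processing: compute and context length"),
    "decode":  Pool("sram-asic",        "token generation: bandwidth and latency"),
}

def run_request(prompt_tokens: int, max_new_tokens: int) -> None:
    """Toy disaggregated pipeline: prefill on one pool, decode on another."""
    prefill = POOLS["prefill"]
    print(f"prefill: {prompt_tokens} prompt tokens on {prefill.name}")

    # After prefill, the KV cache would be handed off to the decode pool.
    decode = POOLS["decode"]
    print(f"decode: streaming up to {max_new_tokens} tokens on {decode.name}")
    for _ in range(max_new_tokens):
        # Each step reads all weights plus the KV cache once -> bandwidth dominates.
        pass

run_request(prompt_tokens=32_000, max_new_tokens=512)
```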
SRAM-based chips have historically delivered higher token-per-second rates than GPUs, TPUs, or other ASICs, but at much higher cost per token: on-chip SRAM is small, so models are spread across many chips and batch sizes stay low. The question was whether users would actually pay the premium for speed. Cerebras's and Groq's recent results show developers and enterprises are willing to pay significantly more, potentially 10x to 100x, for meaningfully faster inference, especially for economically valuable work like code generation and research. That willingness shifts the Pareto frontier, opening space for specialized ASICs optimized along different performance dimensions.
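A back-of-the-envelope illustration of that tradeoff, using made-up placeholder figures (hourly system cost, per-user throughput, and concurrency are assumptions, not measured numbers for any vendor): large batches amortize a GPU system across many users, while an SRAM system serves each user far faster but produces fewer total tokens per dollar.

```python
def cost_per_million_tokens(system_hour_usd: float,
                            tokens_per_sec_per_user: float,
                            concurrent_users: int) -> float:
    """Hourly system cost divided by total tokens produced per hour, scaled to 1M tokens."""
    tokens_per_hour = tokens_per_sec_per_user * concurrent_users * 3600
    return system_hour_usd / tokens_per_hour * 1_000_000

# Placeholder numbers chosen only to show the shape of the tradeoff.
gpu_batched = cost_per_million_tokens(system_hour_usd=4.0,
                                      tokens_per_sec_per_user=30,
                                      concurrent_users=64)
sram_fast = cost_per_million_tokens(system_hour_usd=40.0,  # model sharded across many chips
                                    tokens_per_sec_per_user=300,
                                    concurrent_users=8)

print(f"batched GPU serving:      ~${gpu_batched:.2f} per 1M tokens at ~30 tok/s per user")
print(f"SRAM low-latency serving: ~${sram_fast:.2f} per 1M tokens at ~300 tok/s per user")
```

Under these assumed inputs the SRAM path costs roughly 8x more per token while delivering about 10x the per-user speed, which is exactly the premium the article argues buyers of high-value work are now willing to pay.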
Groq does not fab at TSMC; it uses Samsung and GlobalFoundries. For Nvidia, that diversification reduces foundry concentration risk while securing access to an SRAM-optimized design that complements its existing GPU-centric strategy. The acquisition also returned roughly $4 billion to Chamath Palihapitiya's Social Capital, making it one of venture's largest exits.
Huang has the capital and is deploying it across labs and cloud infrastructure, while keeping Nvidia a pure-play chip company rather than an operator of competing cloud services or data center assets. That focus makes the Groq acquisition feel like a natural extension of existing strength: the bet is that inference fragments into workload-specific architectures, and Groq gives Nvidia the SRAM piece of that puzzle.