Interview

Groq CEO Jonathan Ross: LPUs beat GPUs on inference economics, new European data center live in under a month

Jul 8, 2025 with Jonathan Ross

Key Points

  • Groq's LPUs cut electricity cost per token to roughly one-third of GPU equivalents by eliminating external memory entirely, enabling capex payback in two to four years.
  • Groq deployed its first European data center in Helsinki in under one month, beating its previous 51-day Saudi Arabia deployment by removing external switches and liquid cooling.
  • Groq CEO Jonathan Ross says most hyperscaler and startup chip efforts will be cancelled like two of Google's three AI chip projects, as software parity with Nvidia is table stakes before competing on silicon.
Summary

Jonathan Ross, CEO of Groq, makes a pointed case that the company's Language Processing Units (LPUs) beat GPUs on inference economics by eliminating external memory entirely. Rather than stalling on slow external memory reads, Groq's architecture keeps model weights resident on-chip, cutting electricity cost per token to roughly one-third of GPU equivalents. That lower opex floor means every workload Groq runs clears its operating costs, with capex payback periods ranging from two to four years depending on the model mix — a more aggressive amortization target than most GPU-based cloud operators apply.
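The payback math Ross describes reduces to a simple margin calculation. The sketch below uses entirely hypothetical dollar figures (the interview gives only the one-third electricity ratio and the two-to-four-year payback range, not absolute costs):

```python
# Hypothetical sketch of the capex-payback logic described in the interview.
# All dollar figures are illustrative assumptions, not Groq numbers.

def payback_years(capex, annual_revenue, annual_opex):
    """Years to recover hardware capex from annual operating margin."""
    margin = annual_revenue - annual_opex
    if margin <= 0:
        raise ValueError("workload does not clear operating costs")
    return capex / margin

# Assume a deployment costs $1.0M and earns $0.6M/yr in token revenue.
gpu_opex = 0.3            # $M/yr electricity, hypothetical GPU baseline
lpu_opex = gpu_opex / 3   # roughly one-third per the interview

print(round(payback_years(1.0, 0.6, lpu_opex), 1))  # → 2.0 years
```

Under these assumptions the lower electricity floor widens the margin enough to land at the short end of the two-to-four-year range; a heavier model mix or lower utilization would push the same formula toward four.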

Deployment Speed as a Differentiator

Groq stood up its first European data center in Helsinki in under one month after deciding at an AI event to pursue the deployment. The previous benchmark was 51 days to deploy roughly 20,000 chips in Saudi Arabia. Ross attributes the speed to architectural simplicity: no external network switches in the core interconnect, air cooling instead of liquid cooling, and a deliberate choice to stay a generation behind on infrastructure technology to avoid debugging novel systems during rollout. The reliability dividend is material — a recent 4,000-chip deployment ran its target model successfully on the first attempt, something Ross says would be impossible with 4,000 GPUs given typical field failure rates.

Software Stack Is the Real Moat

Ross frames the custom silicon race as fundamentally a software problem first. Features like prefix caching and speculative decoding are table stakes; without them, no chip can compete with the latest GPU regardless of hardware quality. Groq spent its first six months building a compiler to reach software parity with Nvidia before competing on silicon. Ross draws on his own history — he originated Google's TPU as a 20% project — to note that of three AI chip efforts at Google, two were cancelled. He expects most current hyperscaler and startup chip efforts to face the same outcome.

On architectural philosophy, Ross argues against model-specific chip designs. When mixture-of-experts architectures emerged, any chip optimized for prior model structures became obsolete. Groq's bet is maximum generality, a position Ross says is validated each time a new model paradigm appears.

Inference Economics Drive Customer Behavior

A common adoption pattern Ross describes: developers prototype on frontier proprietary models like Claude or GPT, then migrate production workloads to open-source models on Groq once a cheaper, faster option proves sufficient for the task. Several older, technically inferior models remain in production at scale simply because customers have locked in working systems and will not touch them.

Speed translates directly to revenue for customers. Ross cites data showing a 100-millisecond improvement in response time lifts engagement or conversion rates by roughly 8% on desktop and over 30% on mobile. Lower token costs rarely drive customers to spend less — they typically reinvest the savings into more tokens to run more parallel or iterative workloads.

As a concrete example, Groq engineers found that Qwen 32B running on LPUs solved formal math theorems faster in wall-clock time than Claude Opus, even though Opus required fewer iterations, purely because Groq's throughput allowed faster iteration cycles.

Sovereign AI Demand Is a Supply Problem, Not an Ideology Problem

On international expansion, Ross argues sovereign AI demand is driven less by data-sovereignty politics and more by simple GPU scarcity. Nvidia and AMD are already building every GPU their supply chains allow, constrained by the same external memory components Groq avoids. Countries and enterprises that cannot get GPU allocation are turning to Groq as an available alternative. In Saudi Arabia, customers had been waiting over a year for GPU orders with no confirmed delivery timeline before switching to Groq. Ross frames the dynamic as analogous to electrification — nations that lack access to compute risk falling behind in workforce productivity the same way they would without electricity.

Groq plans to add data centers across additional continents in the second half of 2025, with at least one undisclosed location in progress. Ross also signaled new silicon or systems arriving later this year, described only as a significant advancement. If OpenAI releases an open-source model, Ross confirmed Groq would launch it immediately.