Interview

Dylan Patel launches InferenceMax: the first independent AI hardware benchmark running daily across NVIDIA, AMD, and more

Oct 10, 2025 with Dylan Patel

Key Points

  • SemiAnalysis founder Dylan Patel launches InferenceMax, a free daily benchmark measuring AI inference costs across NVIDIA, AMD, and other chips, backed by tens of millions of dollars' worth of contributed GPU capacity from Microsoft, OpenAI, Oracle, and others.
  • AMD shows performance-per-dollar advantages on open-source models despite lagging NVIDIA on most dimensions, signaling that the era of single-vendor dominance is ending as operators choose based on capital versus power constraints.
  • GPU depreciation risk threatens neo-cloud operators on short-cycle contracts; NVIDIA's 50% annual performance gains mean six-year-old hardware becomes economically obsolete, while locked-in players like Oracle face only near-term cost timing mismatches.

Summary

Dylan Patel, founder of SemiAnalysis, launched InferenceMax on October 9, 2025, positioning it as the first independent AI hardware benchmark running daily across NVIDIA, AMD, and eventually Google TPUs and AWS Trainium. The platform measures cost per million tokens and tokens per megawatt across major inference frameworks, including vLLM and SGLang, on the latest open-source models such as LLaMA and DeepSeek as well as proprietary models.

What InferenceMax Is Solving

The core problem is vendor cherry-picking. Every chipmaker selects the metric that flatters its hardware, running benchmarks on hyper-optimized, non-representative software stacks. InferenceMax runs on production-grade open-source software, updated daily, to reflect real-world mixed workloads — large requests, small requests, agentic workflows, multiple model types.
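
As a rough illustration of the unit economics being compared, the sketch below converts a measured throughput figure into the two headline metric families: dollars per million tokens and throughput per megawatt. The throughput, rental price, and power-draw numbers are hypothetical placeholders, not InferenceMax results.

```python
# Minimal sketch of the unit economics an InferenceMax-style benchmark reports.
# All inputs below are hypothetical placeholders, not InferenceMax data.

def dollars_per_million_tokens(tokens_per_sec: float, gpu_price_per_hour: float) -> float:
    """Capital view: what one million output tokens cost at a given GPU rental rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000

def tokens_per_sec_per_megawatt(tokens_per_sec: float, gpu_power_watts: float) -> float:
    """Power view: throughput normalized by the power the GPU draws."""
    return tokens_per_sec / (gpu_power_watts / 1_000_000)

# Example: a GPU sustaining 1,200 tokens/s, rented at $2.50/hr, drawing 700 W.
print(f"${dollars_per_million_tokens(1200, 2.50):.2f} per 1M tokens")
print(f"{tokens_per_sec_per_megawatt(1200, 700):,.0f} tokens/s per MW")
```

Because the benchmark reruns on production serving stacks daily, both numbers can move as kernels and frameworks improve, not only when new silicon ships.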

The project is backed by NVIDIA, AMD, Microsoft, OpenAI, Oracle, CoreWeave, Dell, Super Micro, and HPE, with cloud vendors contributing tens of millions of dollars' worth of GPU capacity. Patel estimates he is personally absorbing over $1 million per year in engineering costs at SemiAnalysis to run the effort, with no direct revenue model confirmed yet.

NVIDIA vs. AMD: More Nuance Than the Market Assumes

The early data challenges the binary NVIDIA-dominates narrative. NVIDIA remains ahead on most dimensions, but AMD shows a performance-per-dollar advantage on specific open-source models, particularly the open-source GPT variants that are growing rapidly in enterprise deployment. AMD does not win on a per-watt basis, so the AMD-versus-NVIDIA decision comes down to whether an operator is capital-constrained or power-constrained. Patel frames the OpenAI-AMD deal as an early signal that NVIDIA is no longer the only viable path.
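
A toy comparison, with entirely made-up chip figures, shows why the binding constraint decides the winner: ranked by tokens per dollar one part comes out ahead, ranked by tokens per watt the other does.

```python
# Hypothetical illustration of the capital- vs power-constrained choice.
# The chip names and numbers are invented; only the ranking logic matters.

chips = {
    "chip_A": {"price_per_hour": 2.00, "watts": 750, "tokens_per_sec": 900},
    "chip_B": {"price_per_hour": 3.20, "watts": 700, "tokens_per_sec": 1300},
}

for name, c in chips.items():
    tokens_per_dollar = c["tokens_per_sec"] * 3600 / c["price_per_hour"]
    tokens_per_watt = c["tokens_per_sec"] / c["watts"]
    print(f"{name}: {tokens_per_dollar:,.0f} tokens/$, {tokens_per_watt:.2f} tokens/s/W")

# A capital-constrained operator ranks by tokens per dollar; a power-constrained
# one ranks by tokens per second per watt. The two rankings need not agree.
```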

The roadmap includes adding Google TPUs and AWS Trainium, with both companies described as eager participants. Cerebras and Groq are also on the longer-term list. A quality measurement layer is planned alongside throughput metrics, targeting inference optimizations — such as aggressive quantization tricks — that can improve speed at the cost of output accuracy in ways that currently go unmeasured.

GPU Depreciation Risk Is Real for Spec Buyers

The industry-standard six-year depreciation schedule for GPUs — used by Google, Microsoft, and Amazon — was established when compute hardware iterated slowly. With NVIDIA shipping new architectures roughly every 18 months, each delivering around 50% better performance per dollar, Patel argues that in six years a current GPU may be economically obsolete even if it still functions. A six-generation improvement cycle implies roughly a 20x performance gain at perhaps 3x the cost, making older hardware nearly indefensible on a tokens-per-watt or tokens-per-dollar basis.
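
The compounding math behind that claim can be sketched directly; the 18-month cadence and 50% per-generation gain are the assumptions quoted above, and the exact multiple depends on how many generations fit inside the depreciation window.

```python
# Back-of-envelope compounding of generational gains over a depreciation window.
# Cadence and per-generation improvement are assumptions from the discussion above.

def cumulative_gain(per_gen_gain: float, years: float, cadence_years: float) -> float:
    """Compounded improvement factor over a given window at a fixed release cadence."""
    generations = years / cadence_years
    return per_gen_gain ** generations

print(f"{cumulative_gain(1.5, 6, 1.5):.1f}x")  # four generations in six years: ~5.1x
print(f"{cumulative_gain(1.5, 9, 1.5):.1f}x")  # six generations (~nine years): ~11.4x

# The cited "20x performance at perhaps 3x the cost" works out to roughly
# 20 / 3 ≈ 6.7x performance per dollar, between the two horizons above.
print(f"{20 / 3:.1f}x")
```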

The risk is unevenly distributed. Operators with long-term contracted customers — Oracle's $300 billion Stargate commitment with OpenAI being the clearest example — are largely insulated because revenue is locked in regardless of hardware resale value. The exposed cohort is neo-cloud operators renting on short-cycle contracts of three to six months with no guaranteed renewal. The Hopper GPU market already illustrated this dynamic: early spot pricing reached $3 per GPU-hour before falling toward $2 or below as Blackwell supply arrived and competing chips from AMD and Google entered the market.

Oracle's Margin Story Is More Complex Than Reported

Following a sell-off triggered by reporting from The Information, Patel disputes the conclusion that Oracle's cloud margins are structurally thin. The underlying data may be accurate for a specific point in time, but the inference — that Oracle's signed deals carry low margins — is incorrect according to Patel. The actual dynamic is a cost-before-revenue timing mismatch: Oracle is paying for large data center leases, including the Abilene, Texas Stargate facility, before GPU deployments are fully operational and revenue-generating. NVIDIA's GB200 NVL72 has also had reliability issues related to its complexity, liquid cooling requirements, and backplane design, adding to near-term cost pressure without reflecting long-term contract economics.

The Constraint No One Is Talking About Enough

Beyond energy, Patel identifies skilled trade labor as the most underappreciated bottleneck in the AI infrastructure buildout. Electrician and plumber wages are rising sharply as data center construction accelerates, and the Western workforce has no precedent for building at gigawatt scale. Capital, GPUs, and even power can be acquired; the limiting factor may ultimately be the number of qualified workers available to physically deploy the infrastructure.