Interview

OpenRouter co-founder Alex Atallah explains how AI model routing works and which models power users actually choose

Jun 25, 2025 with Alex Atallah

Key Points

  • OpenRouter routes AI requests across multiple cloud providers to work around model exclusivity, with Gemini locked to Google Cloud, Anthropic split between AWS and Google Cloud, and Grok available only on Azure and X.
  • Claude 3.7 Sonnet leads OpenRouter's token consumption rankings this month, followed by Claude Sonnet 4 and Gemini 2.5 Flash, signaling which models power users actually deploy in production.
  • Model fragmentation across hyperscalers is structural and durable, not temporary, creating persistent demand for OpenRouter's routing layer to match requests against provider-specific constraints like context length and tool-calling support.

Summary

OpenRouter, co-founded by Alex Atallah (previously co-founder of OpenSea, the dominant NFT marketplace from the YC Winter 2016 batch), positions itself as a control plane for language models, a routing and orchestration layer it describes as "Stripe meets Cloudflare for AI." The business model is straightforward: users pay OpenRouter directly for token consumption, and the company captures margin by negotiating volume economics with GPU providers on the supply side.

The core value proposition is model availability and reliability. No single cloud carries all frontier models. Gemini is Google Cloud-exclusive. Anthropic runs on AWS and Google Cloud but not Azure. Grok is available on Azure and directly from X but not Google Cloud. OpenRouter tracks this fragmentation and routes traffic across providers, balancing load across two to twenty vendors for any given model. Failover to a different model is rare; the more common operation is routing the same model request to whichever provider can satisfy the specific technical requirements of that call.
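The routing described above can be sketched in a few lines. This is a minimal illustration, not OpenRouter's implementation: the provider table, model IDs, and the `route` function are all hypothetical, chosen only to mirror the exclusivity lines named in the text (Gemini on Google Cloud only; Anthropic on AWS and Google Cloud; Grok on Azure and X).

```python
import random

# Hypothetical availability table mirroring the exclusivity described in the text.
# (Illustrative only -- not OpenRouter's real provider data.)
PROVIDERS = {
    "gemini-2.5-flash": ["google-cloud"],
    "claude-sonnet-4": ["aws-bedrock", "google-cloud"],
    "grok-3": ["azure", "x-ai"],
}

def route(model: str, healthy: set, weights: dict) -> str:
    """Pick a healthy provider for `model`, weighted by capacity.

    Note the key property from the text: the *model* is never swapped out;
    only the *provider* serving it changes. If no provider hosting this
    model is healthy, the call fails rather than falling back to a
    different model.
    """
    candidates = [p for p in PROVIDERS.get(model, []) if p in healthy]
    if not candidates:
        raise RuntimeError(f"no healthy provider for {model}")
    return random.choices(candidates, [weights.get(p, 1.0) for p in candidates])[0]
```

A real router would also track per-provider latency and error rates to adjust the weights, but the core operation is this weighted selection over whichever subset of two to twenty vendors is currently healthy.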

The technical complexity of that routing is significant. Providers hosting the same underlying model differ across context length limits, maximum output constraints, tool-calling support, and JSON schema handling. OpenRouter's matching layer filters providers based on what the incoming request actually requires, flagging unsupported combinations before they fail in production.
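The matching step reduces to filtering endpoints against the request's hard requirements before any load balancing happens. A minimal sketch, assuming hypothetical `Endpoint` and `Request` shapes (the field names here are illustrative, not OpenRouter's schema):

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    """One provider's deployment of a model, with its specific limits."""
    name: str
    max_context: int      # total tokens the endpoint accepts
    max_output: int       # largest completion it will generate
    tool_calling: bool
    json_schema: bool

@dataclass
class Request:
    prompt_tokens: int
    max_output: int
    needs_tools: bool = False
    needs_json_schema: bool = False

def match(endpoints: list, req: Request) -> list:
    """Keep only endpoints that can satisfy every requirement of this call.

    An empty result means the request/provider combination is unsupported,
    which is the case the text says gets flagged before it fails in production.
    """
    return [
        e for e in endpoints
        if e.max_context >= req.prompt_tokens + req.max_output
        and e.max_output >= req.max_output
        and (e.tool_calling or not req.needs_tools)
        and (e.json_schema or not req.needs_json_schema)
    ]
```

The point of filtering first is that two providers hosting "the same" model are not interchangeable: a request needing tool calls or a long context may be routable to only a subset of them.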

Cline, a coding agent similar to Cursor that supports all OpenRouter models, is currently one of the highest-traffic apps on the platform. Within Cline's publicly visible usage data, Claude 3.7 Sonnet ranks first by token volume this month, followed by Claude Sonnet 4, then Gemini 2.5 Flash. Atallah argues token volume is the most honest adoption metric because power users, who drive exponentially more token consumption than average users, are effectively voting with both time and money. That data is publicly browsable on OpenRouter's rankings page, filtered by app category, making it a real-time signal of what sophisticated builders are actually running in production rather than what they claim to be evaluating.

On the competitive landscape, OpenRouter does not make direct model recommendations as a rule. Customers who engage directly get workload-specific guidance weighted by budget tolerance and latency requirements. For everyone else, the rankings page surfaces trending models by app type. The broader structural call from Atallah is that cloud-model exclusivity is not a temporary condition. The AI infrastructure war is drawing hard lines around model access, and no single hyperscaler will carry all frontier models. That fragmentation is the durable market condition OpenRouter is built to arbitrage.