AI token costs are rising, not falling — advanced reasoning models blow up startup bills

Sep 2, 2025

Key Points

  • Notion CEO Ivan Zhao disclosed that gross margins fell roughly 10 percentage points over two years, from around 90% to 80%, driven by costs paid to AI model providers for new features.
  • Advanced reasoning models like OpenAI's o1 carry substantially higher token costs than base models, forcing API-dependent startups to absorb inference expenses directly without cost arbitrage.
  • Notion's choice of Turbopuffer, a vector database, helps contain margin pressure. When AI features erode profitability, the infrastructure layer becomes as critical as model selection.

Summary

Advanced AI reasoning models are crushing software company margins. Ivan Zhao, CEO of Notion, disclosed that his company's gross margins have eroded by roughly 10 percentage points in the past two years, dropping from around 90% to 80%, due to costs paid to AI providers underpinning Notion's latest features. Cloud software typically maintains gross margins above 85%, making this compression visible and material.

The margin hit matters only if AI features fail to drive offsetting revenue gains. If AI-powered capabilities accelerate customer acquisition, reduce churn, or increase willingness to pay, the economics can still work. Whether features materially change customer behavior or retention will determine whether the margin compression is sustainable.
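
To make "offsetting revenue gains" concrete, here is a minimal sketch of the break-even arithmetic. It uses only the margin figures reported above (roughly 90% before, 80% after); the function name and the simplification that all other costs scale with revenue are assumptions for illustration, not figures or methods from Notion.

```python
def breakeven_revenue_uplift(margin_before: float, margin_after: float) -> float:
    """Fractional revenue growth needed to keep gross profit dollars flat.

    Gross profit = revenue * gross margin, so holding it constant requires
    revenue_after / revenue_before = margin_before / margin_after.
    """
    if not 0 < margin_after <= 1 or not 0 < margin_before <= 1:
        raise ValueError("margins must be fractions in (0, 1]")
    return margin_before / margin_after - 1.0


# Margins as reported in the article: about 90% falling to about 80%.
uplift = breakeven_revenue_uplift(0.90, 0.80)
print(f"Revenue uplift needed to hold gross profit flat: {uplift:.1%}")
```

Under these assumptions, AI features would need to lift revenue by about 12.5% just to keep gross profit dollars where they were. Anything above that is net gain; anything below it means the features cost more than they return.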

Notion runs on Turbopuffer, a vector database, and according to Mickey Liu, Notion's data engineering lead, that choice lets the company deliver AI features to customers at lower cost. The infrastructure layer matters as much as the model layer when margin pressure is this acute.

Advanced reasoning models like OpenAI's o1 carry substantially higher token prices than base models. As models become more capable at reasoning and long-context tasks, inference costs rise. Startups and smaller companies that depend on third-party API calls absorb that cost directly in their product delivery. Unlike giants building their own silicon and models, they have no cost arbitrage to offset rising token prices. Because the bill grows with every request, the margin pressure increases as usage scales.
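
A rough sketch of how per-request costs can diverge between a base model and a reasoning model follows. Every number in it is a hypothetical placeholder, not a real price list: the per-million-token rates, the token counts, and the request volume are all invented for illustration. It also assumes that a reasoning model's intermediate reasoning tokens are billed at the output rate, which should be checked against a given provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical API prices in dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Dollar cost of one request.

    Assumption: reasoning tokens are billed like output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * pricing.input_per_million
            + billed_output * pricing.output_per_million) / 1_000_000


# Placeholder prices, chosen only to illustrate the shape of the problem.
base_model = Pricing(input_per_million=2.0, output_per_million=8.0)
reasoning_model = Pricing(input_per_million=12.0, output_per_million=48.0)

# Same prompt and visible answer; the reasoning model also emits hidden reasoning tokens.
base_cost = request_cost(base_model, input_tokens=2_000, output_tokens=500)
reasoning_cost = request_cost(reasoning_model, input_tokens=2_000,
                              output_tokens=500, reasoning_tokens=3_000)

requests_per_month = 1_000_000  # hypothetical volume
print(f"Base model:      ${base_cost:.3f}/request, "
      f"${base_cost * requests_per_month:,.0f}/month")
print(f"Reasoning model: ${reasoning_cost:.3f}/request, "
      f"${reasoning_cost * requests_per_month:,.0f}/month")
```

With these placeholder inputs, the gap comes from two compounding factors: a higher price per token and more billed tokens per request. A company paying list price through an API carries both in full.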