Commentary

AI model race: open-weight leaderboard shifts, Gemini 3 rumors, and whether progress has stalled

Oct 14, 2025

Key Points

  • AI progress continues at pace with benchmark scores improving and time-horizon capabilities doubling roughly every six months, though saturation on easy benchmarks makes marginal gains harder to measure publicly.
  • Suno reaches $150M ARR two years after launch with 4x year-over-year growth, signaling real consumer and prosumer demand for AI music generation despite a narrow total addressable market.
  • Google's ARC-AGI research, in which a tiny recursive model outperformed larger alternatives, suggests new architectures could fragment the market and create standalone businesses outside direct competition with Anthropic and OpenAI.

Summary

Tyler argues that AI progress continues at pace despite the appearance of stalling. Benchmarks show clear improvement, and time-horizon capabilities are doubling roughly every six months. The gap between easy benchmarks, such as math and facts, and hard ones, like complex reasoning, has narrowed so much that marginal gains are hard to measure publicly.
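To make the doubling claim concrete, here is a minimal sketch of what a fixed six-month doubling period compounds to. The function name and the sample horizons are illustrative assumptions; only the six-month figure comes from the summary above.

```python
def growth_factor(months: float, doubling_months: float = 6.0) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling period."""
    return 2.0 ** (months / doubling_months)


# Sample horizons chosen for illustration only.
for months in (6, 12, 24, 36):
    print(f"{months:>2} months: {growth_factor(months):.0f}x")
```

If the six-month rate held steady, a capability would grow 4x in a year and 16x in two years, which is why steady progress can still look flat on benchmarks that are already near their ceiling.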

Gemini 3 rumors are circulating, with unverified sources on X claiming it is "incredible." Vedant Misra at DeepMind posted vaguely that researchers are "doing all kinds of stuff with these models that the public isn't even thinking of yet." Tyler hopes Gemini 3 trades speed and cost efficiency for raw capability: a model that is expensive to run but qualitatively better than Claude 4.5 or GPT-5. GPT-5, by contrast, prioritized efficiency and drove down inference costs, which may be strategically sound but is less visibly impressive.

Safe Superintelligence trades at a $32 billion valuation despite having no customers, products, or public plans. The prediction offered is that only 30% of listed AI startups will hold or exceed their current valuations in five years. Databricks and Midjourney stand out as durable businesses. Databricks has built a robust enterprise SaaS business since 2013 and accelerated in the AI era. Midjourney remains 99% profitable without raising capital, so a downturn would simply reduce its margin rather than threaten its survival.

Suno, the AI music generation platform, hit $150M ARR two years after launch, up 4x year-over-year. The revenue likely comes from AI early adopters paying $20 per month for personal use, plus business customers paying $200 per month for branded or looping audio. Consumer and prosumer demand appears real even if the total addressable market remains narrow.
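As a rough sanity check on these figures, the sketch below converts ARR into implied subscriber counts at the two price points. The split of revenue between the tiers is not given in the source, so `business_revenue_share` is a purely hypothetical parameter, and the function name is illustrative.

```python
ARR = 150_000_000      # annual recurring revenue, from the summary
CONSUMER_PRICE = 20    # dollars per month, from the summary
BUSINESS_PRICE = 200   # dollars per month, from the summary


def implied_subscribers(arr: float, business_revenue_share: float) -> tuple[float, float]:
    """Return (consumer, business) subscriber counts implied by `arr`.

    `business_revenue_share` is a hypothetical fraction of revenue coming
    from the business tier; the source does not state the real mix.
    """
    mrr = arr / 12  # monthly recurring revenue
    business = mrr * business_revenue_share / BUSINESS_PRICE
    consumer = mrr * (1 - business_revenue_share) / CONSUMER_PRICE
    return consumer, business


# Bounding case: all revenue from $20 per month subscribers.
consumer, business = implied_subscribers(ARR, 0.0)
print(f"all-consumer: {consumer:,.0f} subscribers")

# 4x year-over-year growth implies the prior-year ARR.
print(f"prior-year ARR: ${ARR / 4:,.0f}")
```

Under the all-consumer bound, $150M ARR corresponds to 625,000 paying subscribers. Any revenue from the $200 tier lowers that count, and 4x growth implies roughly $37.5M ARR a year earlier.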

Google's paper on ARC-AGI featured a tiny recursive model that outperformed larger alternatives on the benchmark. New architectures beyond scaled LLMs could fragment the market. Midjourney and Anthropic coexist because they serve different modalities. Similarly, a model good only at ARC-AGI-style puzzles might find its own niche and become a standalone business without directly competing with Anthropic or OpenAI.