Interview

OpenAI's Aidan McLaughlin on model naming chaos, post-training design, and the discontinuity effect

Apr 1, 2025 with Aidan McLaughlin

Key Points

  • OpenAI's model naming system is so unclear that even internal designers like McLaughlin struggle with it, and the company plans to unify model selection and simplify the interface.
  • The GPT-4o image generation viral moment worked because users saw discontinuous progress, jumping from DALL-E to GPT-4o in one visible leap despite incremental updates underneath.
  • McLaughlin argues the "wrapper" stigma wrongly deterred developers from building valuable products on OpenAI's APIs, citing Manus as proof that good wrappers create genuine market value.

Summary

Aidan McLaughlin, a model designer at OpenAI, joined TBPN Live to talk through the GPT-4o image generation moment, the chaos of OpenAI's model naming, and where agents are headed.

The Ghibli moment

McLaughlin admits he didn't predict the viral wave from the GPT-4o image generation launch. He says he played with the tool internally and thought it was cool, but didn't feel the impact until it hit the public. Sam Altman and OpenAI's leadership, he says, had well-calibrated intuitions — they've watched enough viral moments to read them in advance.

The more interesting structural point McLaughlin makes is about discontinuity. OpenAI updates its text models every few months, and regular users see incremental improvement. But a large slice of the user base hasn't kept up, so when they finally try a new model, the jump feels enormous. The Ghibli moment worked partly because image generation went from DALL-E to GPT-4o in one visible leap — continuous progress that looked discontinuous to the outside world.

Model naming

McLaughlin is candid that OpenAI has "dug its own grave" on naming — and says it's worse internally than externally. He notes that even he, someone who designs and tests models all day, finds the naming unclear. The saving grace of numeric names, he argues, is that at least "o3 is a higher number than o1, so it's probably better." He confirms OpenAI plans to unify model selection and simplify the interface, citing a tweet from Altman about a month prior.

The 'wrapper' problem

McLaughlin pushes back on the "wrapper" critique that dominated developer discourse over the past two years. Before joining OpenAI, he was a founder building on the APIs, and he watched the wrapper stigma cause developers to leave real value on the table. Manus — which launched despite OpenAI already offering Deep Research and Operator — is his clearest example: people still loved it, and it found ground the labs hadn't covered. His view is that the right wrapper creates genuine value, and the fear of being labelled a wrapper company held back founders who could have built profitable businesses.

AI as a consumption layer

McLaughlin's personal use of Deep Research — running nightly reports before bed across topics from consumer purchases to obscure fandoms — frames a broader argument: as AI tools become smarter at matching preferences than any human buyer or advisor, they will surface genuinely good products and kill off low-quality Amazon-style arbitrage. He sees this as a competitive forcing function that rewards companies with real product quality over ad-spend efficiency.

Humor and model personality

GPT-4.5 is McLaughlin's personal daily driver outside work. He says it was the first model where he noticed a meaningful step change in humor — not long-form jokes, which still fall flat at the punchline, but greentext-style short-form writing. His "big model smell" framing, which he says he coined around Claude 3 Opus, describes a quality of presence some models have that isn't visible in benchmarks but becomes obvious when you push them to extremes — long contexts, out-of-distribution problems, agentic tasks.

Model welfare

Speaking as his own view and not OpenAI's, McLaughlin says model welfare is something the industry should get ahead of now rather than later. He's not claiming current systems feel pain, but argues that as models become significantly more intelligent, their internal states may matter — and possibly matter economically too, if models in a "good spot" genuinely perform better. He says videos of people kicking robot dogs disturb him more than they probably should, precisely because of where the trajectory leads.

Initial conditions and AGI politics

McLaughlin uses a space-arrow analogy to explain why he thinks the politics and capital distribution surrounding AI development right now matter enormously: small differences in initial conditions compound across long time horizons into radically different outcomes. He's not dismissing near-term noise — tariffs, politics — but frames the current moment as one where the choices being made will look enormous in retrospect.

Agents and the Ghibli question

Asked whether agents will get their own Ghibli-style viral moment, McLaughlin is uncertain. His argument is structural: image generation works virally because you can evaluate the result in three seconds. The most economically valuable agents may produce outputs that take much longer to assess — or where the difference between a good and a great model is genuinely hard to see. What he does expect is that as reliability scales, these systems will quietly absorb a meaningful share of economic labor, whether or not that produces a shareable moment.