Commentary

OpenAI's Studio Ghibli image generation goes viral, dominating the timeline

Mar 26, 2025

Key Points

  • OpenAI launched pixel-by-pixel image generation in ChatGPT on March 26, enabling style transfer and iterative editing where users upload photos and request conversions like Studio Ghibli transformations.
  • Google released similar autoregressive technology in Gemini 2.0 Flash two weeks earlier but locked it behind APIs and experimental portals, ceding viral momentum to OpenAI's consumer-facing product.
  • The feature signals a threshold in AI usability: users now trust these tools to execute tasks reliably enough to feel like hiring employees, reshaping how AI is experienced in daily workflows.

Summary

OpenAI integrated image generation directly into ChatGPT on March 26, making it available across the Free, Plus, Pro, and Team tiers. The feature uses an autoregressive architecture that generates images pixel by pixel, rather than the diffusion-based approach of DALL-E 3, which starts from noise and iteratively refines it.
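To make the architectural contrast concrete, here is a toy sketch of the two decoding loops, with random numbers standing in for real neural networks (nothing here reflects OpenAI's or Google's actual implementations): an autoregressive decoder emits the image sequentially, each step conditioned on everything produced so far, while a diffusion sampler starts from noise and refines the whole canvas in parallel over many steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def autoregressive_decode(num_tokens=64, vocab=256):
    """Emit image tokens one at a time; each step conditions on the prefix.
    A real model would score the prefix with a transformer; random logits
    stand in here to show the sequential control flow."""
    tokens = []
    for _ in range(num_tokens):
        logits = rng.standard_normal(vocab)           # stand-in for model(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(vocab, p=probs)))
    return tokens

def diffusion_decode(shape=(8, 8), steps=50):
    """Refine the entire canvas in parallel, from pure noise toward an image.
    A real model would predict the noise to subtract at each step."""
    x = rng.standard_normal(shape)                    # start from pure noise
    for _ in range(steps):
        predicted_noise = 0.02 * rng.standard_normal(shape)  # stand-in denoiser
        x = x - predicted_noise                       # every pixel updated at once
    return x
```

The sequential loop is what makes conditioning natural: to an autoregressive decoder, an uploaded photo is simply more context to condition on, the same way its own previously generated tokens are.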

Autoregressive generation excels at iterating on existing images and style transfer. Users can upload a photo and request a Studio Ghibli conversion—a request that went viral on social platforms—and the model produces a coherent stylized version in a single pass. Previous diffusion models required extensive prompt engineering and often failed at text rendering and fine-grained edits. One-shot generations now succeed regularly, including text-heavy images, and characters remain consistent across iterations.

Ben Thompson frames this as a meaningful shift in what AI feels capable of. Images in ChatGPT demonstrates the autoregressive approach's strength in day-to-day workflows where control matters more than generating from scratch. Diffusion remains better for blank-canvas creation, but ChatGPT solved a persistent friction point in editing: previously, a user could request a small change ("move the tree from the left mountain to the right") and the model would regenerate the entire image instead of making the surgical edit. The new approach allows iterative refinement while maintaining character and composition integrity.
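This capability lives only inside ChatGPT at the time of the article, not in OpenAI's API, so the following is a purely hypothetical sketch of that iterative-editing loop routed through the existing images.edit endpoint in the openai Python SDK; the model identifier is an invented placeholder, and the two-pass flow is only meant to mirror the workflow described above.

```python
# Hypothetical sketch: Images in ChatGPT ships only inside ChatGPT, not the
# API, at the time of writing. images.edit is a real endpoint in the openai
# SDK, but "image-edit-placeholder" is an invented model id.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pass 1: style transfer on an uploaded photo.
styled = client.images.edit(
    model="image-edit-placeholder",   # placeholder, not a real model
    image=open("photo.png", "rb"),
    prompt="Convert this photo into the Studio Ghibli style.",
)
# Save styled.data[0] locally (URL or base64, depending on the model)
# before re-submitting it.

# Pass 2: a surgical edit that should preserve character and composition,
# refining the first result instead of regenerating from scratch.
refined = client.images.edit(
    model="image-edit-placeholder",
    image=open("styled.png", "rb"),
    prompt="Move the tree from the left mountain to the right.",
)
```

The point of the two-pass shape is the one the paragraph makes: each edit conditions on the previous output, so small requests produce small changes.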

Google released Gemini 2.0 Flash's native image generation two weeks prior. It is also autoregressive and handles image editing well, but Gemini 2.0 Flash Image Generation Experimental is available only in the Gemini API and Google AI Studio, not in the main Gemini product. More critically, it does not handle style transfer: when tested with a Studio Ghibli prompt, it generated an entirely new image from scratch rather than applying the style to the reference photo. OpenAI shipped the product consumers could actually use; Google shipped the technology.
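For comparison, Gemini 2.0 Flash's experimental image generation was reachable at the time through the google-genai Python SDK. The sketch below follows Google's published pattern of requesting image output via response_modalities; treat the exact experimental model id as an assumption, since it changed across releases.

```python
# Sketch of calling Gemini 2.0 Flash's experimental native image generation
# (pip install google-genai pillow). The model id below is the experimental
# identifier circulating at the time and may have changed since.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

# Pass a reference photo plus a style instruction, mirroring the
# style-transfer test described above.
reference = Image.open("photo.png")
response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",  # assumed experimental id
    contents=["Convert this photo into the Studio Ghibli style.", reference],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Image output comes back as inline-data parts alongside any text.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("stylized.png")
```

In the test described above, a request like this returned a fresh Ghibli-style scene rather than a transformation of the uploaded photo.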

Viral momentum has been substantial. Studio Ghibli anime conversions of public figures, sports moments, and everyday photos flooded social platforms; Saquon Barkley's Super Bowl ad was converted flawlessly. The model also reads embedded text correctly: in a Ramp ad, it identified "world's best boss" on a snow globe that was barely legible in the original and rendered it with even greater clarity. Depth-of-field handling is sophisticated: when it encounters a blurred background, the model infers what should be there and renders it coherently rather than leaving artifacts or floating objects.

Content moderation appears inconsistent. The model declined to stylize images of children but readily created versions of public figures and adult users. Copyright and likeness restrictions are present but erratic: it blocked a Winnie the Pooh conversion but allowed one of Donald Trump. Behavior varies across devices and attempts.

Practical applications are immediate. E-commerce sellers can photograph a product, drop it into a contextual scene, and get photo-realistic lifestyle shots. A founder could spend 10 minutes creating a personalized children's book by photographing pages from an existing book, swapping faces onto characters, and printing it. One commenter estimated this approach could reach $100k in near-term revenue if marketed to grandparents on Facebook.

Thompson argues this feature signals a shift in how AI is experienced. Deep Research, inference-time scaling with o3, and Images in ChatGPT all cross a threshold: these are tools you can hand a task and trust to complete it reliably. He calls this the "ammunition" definition of AGI: AI you give a task and trust to execute well. It contrasts with a "rifle sight," where the user still aims and fires every shot. Deep Research at $200 per month feels like hiring a new employee; Images in ChatGPT, by extension, feels like hiring a designer.

The constraint today is not capability but distribution, and OpenAI solved both: the feature is embedded in a product hundreds of millions of people use. Google shipped comparable technology first but locked it behind APIs and experimental portals, ensuring only researchers and power users discovered it. That distribution choice explains why the meme economy exploded around ChatGPT, not Gemini.