Fal co-founder on generative media demand surge: startup customers generating billions in combined revenue
Sep 10, 2025 with Burkay Gur
Key Points
- Fal's startup customers generate billions in combined revenue, signaling real economic activity in consumer and prosumer AI applications rather than demo traffic.
- Each new model release produces discrete traffic jumps that compound without cannibalizing existing workloads, with Nano Banana's edit capability solving video generation's visual consistency problem.
- Fal faces H100 supply constraints and prioritizes uptime over cost, holding GPU capacity overnight even during low-traffic periods to guarantee availability when demand returns.
Summary
Fal's co-founder describes a generative media infrastructure business riding a steeper demand curve than most AI infrastructure players. The startup's customers, companies with fewer than 200 employees, have generated combined revenue in the billions of dollars, suggesting the consumer and prosumer AI application layer is producing real economic activity beyond demos.
Fal sees demand as additive rather than substitutive. Each new capability release produces a discrete jump in traffic and revenue that compounds over time. The Nano Banana image-editing model illustrates this pattern. Rather than cannibalize existing video workloads, it became a building block for them. Edit models can maintain visual consistency between clips, solving one of video generation's persistent problems and allowing developers to stitch together generated sequences that would otherwise drift. New capabilities unlock new application categories instead of just reshuffling existing ones.
The App Store dynamic after each model launch shows how intense consumer demand runs. Within hours of Nano Banana's release, roughly 20 apps using the model's name appeared in the App Store, and nanobanana.ai already claimed the top organic search result. Apple isn't policing the clones fast enough to stop them, and they are gaining real traction. Fal reads this as a signal of demand intensity rather than a competitive threat.
GPU supply
Fal is currently experiencing an H100 crunch. Unlike training shops that lock in large GPU reservations for months, Fal runs pure inference workloads and scales its fleet dynamically. That means it depends on spot availability across multiple neoclouds, calling providers one by one when it needs to expand capacity. When providers say no across the board, that is the signal to stock up. H100 spot prices have hit a floor and started rising again, a pattern that never appeared with A100s, whose prices declined monotonically.
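The call-around heuristic described above can be sketched as a toy loop. The `Neocloud` class and its `request_spot` method below are hypothetical stand-ins, not any provider's real API:

```python
class Neocloud:
    """Toy model of a spot-capacity provider; the name and API are illustrative."""

    def __init__(self, name, spot_h100s):
        self.name = name
        self.spot_h100s = spot_h100s

    def request_spot(self, count):
        # Grant whatever spot capacity is on hand, possibly zero.
        granted = min(count, self.spot_h100s)
        self.spot_h100s -= granted
        return granted


def expand_capacity(providers, gpus_needed):
    """Call providers one by one; if spot cannot fill the order,
    treat market-wide scarcity itself as the signal to lock in reservations."""
    remaining = gpus_needed
    for provider in providers:
        remaining -= provider.request_spot(remaining)
        if remaining == 0:
            return "spot"
    return "reserve"  # every provider said no or ran dry: stock up
```

The design point is that the outcome of the order, not any price quote, carries the signal: a market that declines the whole order is itself information that it is time to reserve.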
The Nebius-Microsoft deal announced at roughly $20 billion signals just how capacity-constrained the market is. Nebius would not have ranked as an obvious top-tier infrastructure partner a year ago. That Microsoft chose to commit at that scale points to how little available capacity exists across more obvious options.
Fal prioritizes uptime over cost optimization for now. Even during the overnight trough when US and European users are asleep and Asian traffic is comparatively light, the company holds its GPU pool rather than returning capacity, accepting idle costs to guarantee it can serve customers when demand returns. Moving inference liquidity around the globe by time zone remains a longer-term possibility, not current practice.
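The uptime-over-cost tradeoff reduces to simple arithmetic. Every figure in this sketch is an illustrative assumption, not a number from the episode:

```python
# Back-of-envelope cost of holding the fleet through the overnight trough.
# All four constants are assumptions for illustration, not Fal's figures.

H100_HOURLY = 2.50   # assumed spot $/GPU-hour
FLEET_SIZE = 1000    # assumed GPUs held overnight
TROUGH_HOURS = 6     # assumed low-traffic window
UTILIZATION = 0.20   # assumed fraction still serving overnight traffic

idle_gpu_hours = FLEET_SIZE * TROUGH_HOURS * (1 - UTILIZATION)
idle_cost = idle_gpu_hours * H100_HOURLY
print(f"Nightly idle spend: ${idle_cost:,.0f}")
```

Under these assumptions the nightly idle spend comes to about $12,000, the premium paid to guarantee capacity is on hand when morning traffic returns rather than hoping spot supply reappears.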
The co-founder worked at Oracle from 2012 to 2016, when Larry Ellison was publicly dismissing cloud computing. Oracle's infrastructure credibility in the current AI cycle—built partly through operating Yandex's data centers—reflects the operational discipline that Fal values when choosing infrastructure partners. Providers converting from Bitcoin mining to GPU hosting receive more skepticism.