Auren Hoffman: data markets are smaller than everyone thinks, and charging founders for VC legal fees is wrong
May 23, 2025 with Auren Hoffman
Key Points
- The data market for AI model training is vastly smaller than the current boom suggests, with a buyer pool of roughly a dozen frontier labs and most valuable data assets kept proprietary rather than sold.
- VC legal fees charged to portfolio companies are hidden from LPs, and fee caps incentivize lawyers to fill the allowed budget, with bills typically landing at 99 percent of the cap regardless of the actual work required.
- Flex Capital closed a seed round in six days by consolidating legal negotiations into a single four-hour session, suggesting speed improves when fee structures eliminate incentives for prolonged back-and-forth.
Summary
Auren Hoffman, who runs a holding company for data businesses and the seed fund Flex Capital (targeting roughly 100 deals per year), makes two arguments worth taking seriously: the data market is far smaller than the AI boom implies, and standard VC legal fee clauses are quietly extracting value from founders and LPs alike.
Data markets: a reality check
Hoffman's thesis on data ten years ago was that the number of institutional buyers would grow significantly — hedge funds from 50 to 500, retailers in proportion. It didn't happen. The hedge fund buyer pool moved from roughly 50 to about 58. Even the largest funds in the world, like Bridgewater, spend almost nothing on proprietary data relative to AUM. Real estate went from zero serious data buyers to roughly one — Blackstone.
The deeper problem is that the most valuable data assets aren't for sale. Google's YouTube advantage in video generation is real precisely because Google won't sell it. When Reddit licensed its data to Google for $60 million to train LLMs, Hoffman argues the brand was doing much of the work — the same conversations from an unknown platform might have fetched $5–10 million.
For founders pitching a data-generation business aimed at AI model companies, the buyer pool is even thinner: around a dozen frontier labs at most. Selling exclusively to OpenAI, Anthropic, Google, and Meta is not a durable business. Hoffman is backing at least one robotics data company, but the logic only holds if the data has buyers beyond the core model builders.
VC legal fees: a hidden LP problem
The standard term sheet practice of requiring a portfolio company to cover the lead investor's legal fees — often capped anywhere from $10,000 to $300,000 — strikes Hoffman as neither founder-friendly nor LP-friendly, despite being marketed as the latter.
The LP-friendly framing doesn't hold up. When a VC passes operating costs to the fund directly, LPs see the line item and can push back. Legal fees charged to the portfolio company disappear from the LP's view entirely. The company absorbs the cost, the VC avoids scrutiny, and the LP never knows.
Hoffman's practical observation: if the cap is $50,000, the bill will come in at $49,000. Almost every time. Lawyers fill the space they're given because the incentive structure points that way. Negotiating the cap down — as Immad Akhund of Mercury apparently does by agreeing on a ceiling with counsel upfront — is step one, but Hoffman's preference is to push it to zero.
He also links fee size to deal velocity. A higher legal fee cap is a reliable predictor of a slower close, because lawyers have more room to work. His rule of thumb: 15 days to close, plus one day for every $2,000 in the cap.
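To make the rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The function name, the parameter names, and the choice to count partial days proportionally are illustrative assumptions, not something Hoffman specified. It encodes only the heuristic as stated, not a fitted model.

```python
def estimate_days_to_close(
    fee_cap_usd: float,
    base_days: float = 15.0,            # baseline from the rule of thumb
    usd_per_extra_day: float = 2_000.0, # one extra day per $2,000 of cap
) -> float:
    """Estimate days to close a round from the legal fee cap.

    Hypothetical helper: encodes "15 days, plus one day for every
    $2,000 in the cap". Counting partial days proportionally is an
    assumption; the rule as quoted does not say how to round.
    """
    if fee_cap_usd < 0:
        raise ValueError("fee_cap_usd must be non-negative")
    return base_days + fee_cap_usd / usd_per_extra_day


if __name__ == "__main__":
    # Caps chosen from the range mentioned in the text ($10,000 to $300,000),
    # plus the $50,000 example and a zero cap.
    for cap in (0, 10_000, 50_000, 300_000):
        days = estimate_days_to_close(cap)
        print(f"${cap:,} cap -> about {days:.0f} days")
```

Under these assumptions, a $50,000 cap implies roughly 40 days to close and a $300,000 cap roughly 165 days, while a zero cap leaves only the 15-day baseline.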
The alternative is already proven. Flex Capital recently closed a seed round in six days by getting all parties in a room for a four-hour session to work through every open issue live. Hoffman notes that M&A transactions — often considered more complex — regularly close faster than VC financings, because everyone is motivated to finish. The YC SAFE is the structural version of the same logic: no meaningful legal costs, no back-and-forth, no cap to fill.