Can AI pick better founders than VCs? Oxford's VC Bench study says yes — but the debate is nuanced
Sep 19, 2025
Key Points
- Oxford's VC Bench study claims large language models reach 80% accuracy predicting founder success versus 23% for tier-one VCs, but the anonymized dataset may contain enough pattern signals that models simply recognize founders without names attached.
- Venture capital's core value lies in accessing whisper networks and contextual information that never reaches public training data, not in pattern matching alone, making AI's predictive edge on historical data structurally irrelevant to real deal-making.
- LLMs function as consensus machines that would have incorrectly predicted against defense tech adoption in 2017 based on public sentiment, leaving them systematically blind to contrarian bets and market discontinuities.
Summary
Oxford's VC Bench study claims large language models outperform venture capitalists at predicting founder success, with DeepSeek Chat reaching 80% accuracy compared to 23% for tier-one VCs. Marc Andreessen recently argued on the a16z podcast that venture capital may be one of the last professions AI cannot displace, precisely because it requires living, breathing human judgment. The Oxford paper appears to challenge that thesis directly.
Measurement problems
The benchmark uses 9,000 anonymized founder profiles labeled as successes or failures depending on whether the founder exited, went public, or raised over $500 million. The anonymization itself is suspect. It is unclear whether the profiles can contain enough signal to measure what makes a founder great while remaining disguised well enough that an LLM cannot simply recognize the person behind them, identifying someone as Mark Zuckerberg without the name attached and predicting success accordingly.
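The re-identification worry can be made concrete with a toy sketch. All data and names below are hypothetical, not drawn from the VC Bench dataset: the point is only that a distinctive combination of public facts can uniquely pin down a well-known founder, so an "anonymized" profile may effectively arrive pre-labeled with the answer.

```python
# Toy illustration (hypothetical data): a model that memorized public founder
# biographies can match an anonymized profile by its feature combination alone,
# "predicting" success by recognition rather than by judgment.

known_founders = {
    ("Harvard", "dropped out", "social network", 2004): "recognizable profile A",
    ("Stanford", "PhD dropout", "search engine", 1998): "recognizable profile B",
}

def reidentify(profile):
    """Return a match if the feature combination uniquely identifies someone."""
    return known_founders.get(profile)

anon = ("Harvard", "dropped out", "social network", 2004)  # no name attached
print(reidentify(anon))  # prints "recognizable profile A"
```

If enough profiles in a benchmark are re-identifiable this way, high accuracy measures memorization of known outcomes, not foresight.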
Pattern matching is not the VC job
The real work in venture capital is twofold: identifying great founders and winning allocation at the specific moment when capital matters most. Finding the next Mark Zuckerberg is harder than finding Mark Zuckerberg. Paul Graham once met a founder who resembled Zuckerberg, attended Harvard, dropped out, and checked every visible box. Graham wrote the check, and it became one of his worst investments. Pattern matching fails when it matters most.
Venture decisions rest on secrets—information that never reaches public training data. Whisper networks, personal relationships, and the unwritten context of why a founder will succeed in a specific moment are not facts on the internet. They exist in business partnerships and inside knowledge. An LLM, by definition, cannot access what it was never trained on.
LLMs are consensus machines, not contrarians
Ask a model in 2017 whether Silicon Valley's brightest would embrace defense tech within seven years, and it would have predicted no, because public sentiment and Google engineer protests were the loudest signals in its training data. The actual trend reversed. The model would have been systematically wrong precisely because it reflects average opinion rather than anticipating discontinuities.
The darker read
Maybe the LLMs' performance here simply reflects that the current VC industry operates at the level of an average Redditor deploying capital. In that case, beating tier-one VCs who score 23% is not a victory for AI but an indictment of modern venture capital.