Interview

Browserbase founder Paul Klein on building the web browser for AI agents

Apr 9, 2025 with Paul Klein

Key Points

  • Browserbase builds a headless browser for AI agents, betting that most of the web will never get structured APIs and agents must navigate unmodified websites to unlock real productivity.
  • The company passes the DOM to an LLM for simple sites and uses vision models for complex pages, with Gemini 2.0 Flash emerging as one of the strongest performers for browser tasks in its recent model evals.
  • Browserbase brought in Okta Ventures and founders of identity platforms Clerk and WorkOS to solve the anti-bot problem by positioning verified user authentication as a legitimate alternative to blocking all automation.

Summary

Paul Klein, founder of Browserbase, is building what he describes as a web browser purpose-built for AI agents — infrastructure that lets software autonomously navigate websites on a user's behalf. The pitch is straightforward: most real-world work happens inside a browser, and if AI agents are going to do that work, they need a browser optimized for them rather than for humans.

The market framing

Klein's core argument against the skeptics is that browser automation has a very long runway. Most of the internet will never get an MCP server — his example is the Nigerian immigration form — so agents that can only interact via structured APIs will hit a ceiling quickly. The browser, in his framing, is the most primitive and universally compatible MCP server that exists. High-volume, well-capitalized services like Delta will build native integrations; the rest of the web won't, and that's the majority of it.

Browserbase sells primarily to developers on a consumption-based model, meaning revenue scales directly with how much agents actually use the browser. Klein says he can map browser hours on the platform directly to time saved for end users, and that number is climbing every month. Customers range from Perplexity to a 55-year-old dairy trucking company in the American Midwest that hired its first engineer specifically to automate fuel-price lookups along delivery routes.

Technology decisions

Browserbase uses both DOM-based and vision-based approaches depending on the site. For complex pages with heavy HTML, vision models make more sense; for simple sites like Craigslist, passing the DOM to an LLM is faster and cheaper. Klein's preference leans toward the DOM approach because it's more cacheable — if a website hasn't changed, the agent can reuse the same button logic rather than sending a fresh screenshot to a model each time. As vision models improve, that cost gap may narrow, but Klein thinks model-level caching on screenshots will need to come from the labs themselves.
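
The caching argument can be made concrete with a minimal sketch. Everything in it is an assumption for illustration, not Browserbase's or Stagehand's implementation: the size threshold, the `choose_strategy` and `SelectorCache` names, and the `fake_llm` stand-in for a real model call are all hypothetical. It shows why the DOM path caches well: the page content itself can serve as the cache key, so an unchanged page never triggers a second model call.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical threshold: pages with more HTML than this go to the vision path.
DOM_SIZE_LIMIT = 50_000


def choose_strategy(html: str, limit: int = DOM_SIZE_LIMIT) -> str:
    """Return "dom" for small pages and "vision" for heavy ones."""
    return "dom" if len(html) <= limit else "vision"


class SelectorCache:
    """Caches the selector a model picked, keyed by page content and instruction."""

    def __init__(self, resolve: Callable[[str, str], str]) -> None:
        # resolve stands in for an LLM call: (html, instruction) -> selector
        self._resolve = resolve
        self._store: Dict[str, str] = {}
        self.model_calls = 0

    def selector_for(self, html: str, instruction: str) -> str:
        key = hashlib.sha256((instruction + "\0" + html).encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: the page or the instruction changed, so ask the model.
            self.model_calls += 1
            self._store[key] = self._resolve(html, instruction)
        return self._store[key]


def fake_llm(html: str, instruction: str) -> str:
    # Placeholder for a real model call; always picks the same selector.
    return "button#submit"


cache = SelectorCache(fake_llm)
page = "<html><button id='submit'>Post</button></html>"
first = cache.selector_for(page, "click the post button")
second = cache.selector_for(page, "click the post button")  # unchanged page: cache hit
```

A screenshot has no equally stable key, since two renders of the same page can differ pixel by pixel, which is one way to read Klein's point that screenshot caching would have to happen at the model level.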

The team's open-source framework, Stagehand, abstracts those routing decisions for developers and can slot in any model. When Browserbase ran evals across models over the Llama 4 launch weekend, Gemini 2.0 Flash came out as one of the strongest performers for browser tasks. Llama 4 was more disappointing, though Klein cautions that the test used a third-party inference provider, which can affect results — and Meta hasn't yet released its largest model in the family.

Anti-bot and trust

The abuse question is real but, in Klein's view, solvable at the authentication layer rather than through blunt blocking. Browserbase has already disabled LinkedIn entirely because the spam problem there is too entrenched to touch. The longer-term play is for Browserbase to act as an advocate for legitimate bots, building relationships with platforms that currently block all automated traffic. That's why the company brought in Okta Ventures and the founders of Clerk, Stytch, and WorkOS in its most recent funding round: people who understand identity and authentication deeply. If an agent can log in on a verified user's behalf, rate limits and intent can be matched to a real person, which solves most of what anti-bot systems are actually trying to do.

Human-in-the-loop

Klein is direct that, in the short term, agents still need supervision. Browserbase offers a live view feature, an embeddable iframe that shows the browser working in real time, so a human can take over if the agent stalls. The near-term product reality he describes is closer to a Devin-style experience, where the agent handles 90% of the task and a human intervenes for the last 10%, than to full autonomy. The pitch to customers is to build that escape hatch into the application design from the start.
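
One way to picture that escape hatch is a loop that hands control to a person once the agent stalls. This is a sketch under stated assumptions, not Browserbase's live view API: `run_with_escape_hatch`, `RunResult`, and `demo_agent` are hypothetical names, and the retry limit is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    completed_by_agent: List[str] = field(default_factory=list)
    handed_to_human: List[str] = field(default_factory=list)


def run_with_escape_hatch(
    steps: List[str],
    agent_step: Callable[[str], bool],
    max_attempts: int = 2,
) -> RunResult:
    """Try each step with the agent; if a step keeps failing, hand off the rest."""
    result = RunResult()
    for index, step in enumerate(steps):
        # any() stops at the first successful attempt.
        succeeded = any(agent_step(step) for _ in range(max_attempts))
        if not succeeded:
            # Agent stalled: a human takes over this step and everything after it.
            result.handed_to_human = steps[index:]
            break
        result.completed_by_agent.append(step)
    return result


def demo_agent(step: str) -> bool:
    # Hypothetical agent that can do everything except captchas.
    return "captcha" not in step


outcome = run_with_escape_hatch(
    ["open site", "fill form", "solve captcha", "submit"], demo_agent
)
```

In a real application the handoff branch is where a live session view would be surfaced to the user, which is why the design has to account for it up front.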

Model layer bets

Klein wants more specialization, not a single generalist model that does everything. For browser agents specifically, the architecture he sees working is a reasoning model higher in the agent loop making decisions, with a faster and cheaper model like GPT-4o or Gemini 2.0 Flash handling individual tool calls. More models from more providers also benefits Browserbase directly — customers are switching models frequently, and the more fungible models become, the more valuable stable, model-agnostic infrastructure gets.
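
The two-tier loop described here can be sketched as a planner that is called once per goal and an executor that is called once per step. The function names and the stub models below are hypothetical; in practice the planner would wrap a reasoning model and the executor a faster, cheaper one.

```python
from typing import Callable, List, Tuple

Planner = Callable[[str], List[str]]  # goal -> ordered steps (reasoning model)
Executor = Callable[[str], str]       # step -> browser action (fast model)


def run_agent(goal: str, plan: Planner, act: Executor) -> List[Tuple[str, str]]:
    """Call the planner once, then the executor once per planned step."""
    return [(step, act(step)) for step in plan(goal)]


def stub_planner(goal: str) -> List[str]:
    # Stand-in for a reasoning model that decomposes the goal.
    return [f"open page for {goal}", f"extract result for {goal}"]


def stub_executor(step: str) -> str:
    # Stand-in for a cheap model that maps one step to one tool call.
    return "navigate" if step.startswith("open") else "read_dom"


trace = run_agent("fuel prices", stub_planner, stub_executor)
```

Because both tiers are plain callables, either model can be swapped without touching the loop, which matches the point about customers switching models frequently.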

Long-context improvements are the capability he's watching most closely. Longer context windows should improve DOM-based web agents significantly, and he sees that as one of the cleaner remaining unlocks for reliability.

Competitive position

Amazon's Nova SDK borrowed from Stagehand's developer experience, which Klein takes as validation of the interface design rather than a threat. His read on hyperscalers mirrors the PlanetScale-vs-RDS comparison: a focused infrastructure provider can go deeper on a single vertical than a cloud giant building a hundred services at once. The Stripe analogy is the one he reaches for — a company that dominated payments infrastructure and then layered a product platform on top. Browserbase's expansion path runs through the other primitives already inside a browser: password managers, payment credentials, browsing history as agent memory.

On the question of whether agents are priced into the market yet, Klein points to the Klarna story — a company that went all-in on AI agents and then had to walk it back — as evidence that infrastructure maturity, not hype, sets the actual adoption curve. The traction at Browserbase is real, but the agent wave is still early.