Interview

Factory AI launches Droids — autonomous software agents for enterprise codebases — after founder dropped out of Berkeley physics PhD

May 29, 2025 with Matan Grinberg

Key Points

  • Factory launches Droids, autonomous agents targeting enterprise legacy codebases with deep integrations across Jira, Datadog, and Sentry rather than greenfield development.
  • Startups that compete on raw coding ability lose to frontier labs because strong code generation emerges from general pretraining; Factory competes on enterprise adoption and behavior change, not model capability.
  • Factory wins deals by tying outcomes to concrete deliverables like shipping four-month migrations in two weeks, positioning itself as direct competition to system integrators.

Summary

Factory's newly launched Droids are autonomous software agents built for enterprise codebases: not the greenfield app-building that Cursor and Claude handle, but the unglamorous work of migrating 20-year-old COBOL systems, refactoring multi-repo codebases, and integrating with tools like Jira, Datadog, and Sentry. CEO Matan Grinberg dropped out of a Berkeley theoretical physics PhD two and a half years ago after a single walk with Sequoia partner Shaun Maguire, who told him to either join a portfolio company, go to X, or start something. Eight days later, Factory existed.

The competitive thesis

The coding-agent landscape has already shaken out one bad bet: companies that raised up to $500M to fine-tune or train code-specific models are losing, because coding ability is table stakes for foundation models. Code is so thoroughly embedded in internet training data that any serious pretraining run produces strong coding performance as a byproduct. The number-one spot in coding benchmarks rotates between OpenAI, Anthropic, Google, and xAI — not any startup. Matan goes further, citing 2022 research showing a direct correlation between coding ability and general-purpose reasoning, which means frontier labs have every incentive to stay dominant here.

The closer a product sits to zero-to-one vibe coding for non-technical users, the more exposed it is to displacement by the model providers themselves. Matan draws an explicit parallel to the wave of OpenAI-API-powered copywriting tools that briefly printed money before ChatGPT made them irrelevant. He warns that record-breaking ARR runs in monthly-subscription AI tools are likely to be followed by record-breaking churn.

Why enterprise is the defensible ground

Enterprise software development is structurally harder to commoditize. Producing code at the quality of a 10-year company veteran requires access to the same information that engineer has — commit history, org-specific best practices, cross-repo context, and integrations across the full toolchain. Foundation models don't come with that out of the box, and building it requires integrations that are, in Matan's words, "really ugly."

Factory sells year-long contracts, partly for revenue stability but primarily because adoption is the actual problem. Matan says most CTOs and CIOs who have rolled out AI IDEs report that only 10–20% of engineers actively use them, and many of those are using the new tools the same way they used the old ones. A tool that is 50% as capable but drives twice the adoption reaches the same effective output — which means behavior change inside the enterprise is as important a product as the agent itself.
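The adoption arithmetic can be made concrete with a toy calculation. This is only a sketch: the multiplicative model, the function name effective_output, and the 15% and 30% adoption figures are illustrative assumptions (15% sits inside the 10–20% range quoted above), not numbers or methods from Factory.

```python
def effective_output(capability: float, adoption: float) -> float:
    """Toy model (an assumption): output scales with per-user capability
    multiplied by the share of engineers who actively use the tool."""
    return capability * adoption


# Hypothetical figures for illustration only.
strong_tool = effective_output(capability=1.0, adoption=0.15)  # full capability, 15% adoption
weaker_tool = effective_output(capability=0.5, adoption=0.30)  # half capability, twice the adoption

print(f"strong tool: {strong_tool:.2f}")
print(f"weaker tool: {weaker_tool:.2f}")
```

Under this simple model both tools land on the same effective output, which is the sense in which doubling adoption offsets halving capability.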

What Droids actually sell

Factory wins deals by tying outcomes to deliverables, not to developer-productivity proxies. Metrics like "20% more lines of code" or "tests shipped 10% faster" are too abstract for the C-suite. Telling a client that a migration scoped for four months shipped in two weeks — that lands. The framing is direct competition with system integrators: a project that Accenture would price at $10M becomes a Factory conversation, with the margin story following from there.

The behavior-change argument

Matan's broader bet is that the IDE itself is the wrong unit of abstraction for a world where humans write 0% of their own code. Copilot and Cursor improved the IDE without changing the fundamental workflow; Factory is building toward a model where developers natively delegate to agents, set constraints through tests rather than code review, and ship when the test suite passes rather than when they've read every line. Testing becomes the control surface — the way developers express intent to agents they can't supervise line by line.