OpenAI deep research impresses Box CEO but highlights the challenge of routing users to the right model
Feb 4, 2025
Key Points
- Box CEO Aaron Levie says OpenAI's deep research product performs research tasks in minutes that previously took a dozen hours, but works best on broad topics with abundant online sources.
- Users must manually choose between deep research, o1 pro, and GPT-4 for different queries, a friction point that should be solved by intelligent routing logic at the model layer rather than requiring user cognition.
- OpenAI's competitive advantage is shifting from model innovation to product-layer orchestration: task-specific agents that sit atop foundation models and intelligently route queries to the right tool.
Summary
Aaron Levie, CEO of Box, endorses OpenAI's newly launched deep research product, calling it "quite strong" and noting it completed research tasks in minutes that previously took a dozen hours. The implications for knowledge work, he suggests, are profound: once AI agents can perform full tasks autonomously, they return finished results rather than raw assistance.
But the product surfaces a friction point that may define the near-term competitive landscape: users still have to decide which model to use for which task, and that routing decision isn't automated.
Levie's team tested deep research on two projects this week. For Mary Meeker research—aggregating high-quality sources from across the internet and organizing them into a linear narrative—the tool performed well. For Luke Farritor background research, it performed much worse, primarily because less high-quality source material about him exists online. The experience highlights a constraint: deep research works best on broad topics with abundant indexed information, and degrades sharply when source density thins.
The routing problem cuts deeper. Users currently maintain multiple tabs and have to manually decide whether deep research, o1 pro, or GPT-4 is appropriate for a given prompt. The ideal flow, from a user perspective, would be a single text box that intelligently routes based on the query itself—directing you to deep research for open-ended research tasks, o3 for reasoning-heavy problems, or o3-mini for simpler queries. That logic could live in the model layer rather than in user cognition.
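The routing idea is straightforward to sketch. The snippet below is a hypothetical illustration of query-based dispatch—the cue lists, model names, and heuristics are assumptions for illustration, not OpenAI's actual routing logic, which would presumably use a classifier rather than keywords:

```python
def route_query(prompt: str) -> str:
    """Pick a model for a prompt (heuristic sketch, not a real router)."""
    text = prompt.lower()
    # Open-ended, source-gathering tasks -> the research agent
    research_cues = ("research", "survey", "sources", "background on")
    # Multi-step, reasoning-heavy problems -> the large reasoning model
    reasoning_cues = ("prove", "derive", "step by step", "debug")

    if any(cue in text for cue in research_cues):
        return "deep-research"
    if any(cue in text for cue in reasoning_cues):
        return "o3"
    # Everything else defaults to the cheaper, faster model
    return "o3-mini"
```

In a production system the keyword check would be replaced by a lightweight classifier, but the shape is the same: one entry point, with the model choice made behind it instead of by the user.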
This mirrors a pattern already visible in open-source infrastructure. Platforms like OpenRouter let developers specify a desired outcome (lowest cost, fastest latency) rather than a specific model, and load balancers route requests dynamically. The abstraction exists; it simply hasn't been exposed in OpenAI's consumer-facing products yet.
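Outcome-based selection of this kind can be sketched in a few lines. The model table and figures below are invented for illustration and the function is not OpenRouter's API—it just shows the caller naming a goal instead of a model:

```python
# Hypothetical catalog: names, costs, and latencies are made-up examples.
MODELS = [
    {"name": "o3",            "cost_per_1k": 0.040, "latency_ms": 1800},
    {"name": "o3-mini",       "cost_per_1k": 0.004, "latency_ms": 400},
    {"name": "deep-research", "cost_per_1k": 0.120, "latency_ms": 60000},
]

def pick_model(objective: str) -> str:
    """Select a model by objective (cost or speed) rather than by name."""
    if objective == "lowest_cost":
        return min(MODELS, key=lambda m: m["cost_per_1k"])["name"]
    if objective == "fastest":
        return min(MODELS, key=lambda m: m["latency_ms"])["name"]
    raise ValueError(f"unknown objective: {objective}")
```

The design point is that the caller's contract is the objective, not the model list, so new models can be added to the catalog without changing any calling code.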
Levie and others note that the problem feels obvious enough to assume it's already in development. Fast-moving founders with strong operators—the profile that describes Sam Altman and OpenAI's product team—typically move quickly once a pattern becomes a meme. The current friction is likely a temporary gap between product launch and the abstraction layer that will follow.
The deeper signal: OpenAI is innovating at the product layer, not just the model layer. With general-purpose models showing signs of commoditization and open-source alternatives gaining ground, the company's leverage increasingly sits in routing, task orchestration, and application design. Deep research is an early example—the first of what will likely be many task-specific agents sitting atop a foundation model.