Jeff Huber on context rot, RAG vs. long-context windows, and AI adoption curves
Jul 14, 2025 with Jeff Huber
Key Points
- Chroma CEO Jeff Huber argues large context windows degrade in quality well before advertised limits, with reasoning performance dropping as early as 10,000 tokens, making retrieval-augmented generation durable for production workloads.
- Huber frames context engineering as a two-loop system: real-time retrieval decisions paired with feedback loops that route user signals and operational data back into embeddings and model weights.
- Open source models will proliferate as Chinese labs ship competitive weights, compressing margins and making frontier proprietary models unnecessary for most business and consumer applications.
Summary
Jeff Huber, CEO of Chroma, makes the case that large context windows are oversold for production AI applications. Chroma published a technical report on what Huber calls 'context rot,' finding that model reasoning performance can degrade significantly before the advertised context limit is reached, with drop-offs appearing as early as 10,000 tokens. His preference as a builder is a model with a reliable 60,000-token window over one claiming 10 million tokens that 'kind of maybe sort of works sometimes.'
RAG vs. Long Context
The report directly challenges the narrative that retrieval-augmented generation (RAG) would be made obsolete by expanding context windows. Huber argues that for now, retrieval remains a superior pattern for most production workloads. Three approaches are gaining traction among developers building at scale:
- Retrieval with reranking to surface only the most relevant content
- Breaking large context into many parallel small LLM calls, exploiting the inverse relationship between context size and reasoning quality
- Chroma's own vector database as the retrieval backbone
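The first two patterns above can be sketched together: retrieve candidates, rerank down to a small high-signal set, then fan that set out as parallel small-context LLM calls instead of one giant prompt. This is a minimal illustration, not Chroma's implementation; the retriever, reranker, and LLM call are all toy stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(query, corpus, k=20):
    # Toy lexical scorer standing in for a vector search (e.g. a Chroma
    # collection): rank documents by query-term overlap.
    scored = sorted(corpus,
                    key=lambda d: -sum(w in d.lower() for w in query.lower().split()))
    return scored[:k]

def rerank(query, docs, top_n=3):
    # Stand-in for a cross-encoder reranker: keep only the most relevant
    # documents so the model sees a small, high-signal context.
    return docs[:top_n]

def small_llm_call(query, chunk):
    # Placeholder for one small-context model call; a real system would
    # send `chunk` plus `query` to an LLM here.
    return f"summary of {chunk!r} for {query!r}"

def answer(query, corpus):
    candidates = retrieve(query, corpus)
    context = rerank(query, candidates)
    # Fan out one small call per chunk in parallel, then combine the
    # partial answers, rather than stuffing everything into one window.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: small_llm_call(query, c), context))
    return partials
```

The key design choice the pattern encodes: each model call stays well inside the token range where reasoning quality holds up, and relevance filtering happens before the model ever sees the text.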
The quadratic compute cost of scaling attention means reaching a billion-token context window is not a 1,000x engineering challenge over one million tokens; because attention cost grows with the square of sequence length, it is closer to a 1,000,000x one, making retrieval-based architectures a durable pattern rather than a transitional one.
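The arithmetic behind that claim is simple to check. Treating attention compute as proportional to the square of sequence length (ignoring constants and any sub-quadratic tricks), a 1,000x jump in context length costs 1,000,000x in attention compute:

```python
def attention_cost(tokens):
    # Self-attention compares every token with every other token,
    # so compute grows with the square of sequence length.
    return tokens ** 2

# Going from 1M to 1B tokens is a 1,000x length increase...
ratio = attention_cost(1_000_000_000) / attention_cost(1_000_000)
# ...but a 1,000,000x compute increase.
print(ratio)  # 1000000.0
```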
Context Engineering as a Discipline
Huber frames the emerging practice around a two-loop model. The inner loop is the real-time problem of deciding what belongs in the context window for a given task. The outer loop is designing a system that improves at the inner loop over time by capturing user signals, agent feedback, and operational data, then routing that learning back into the retrieval layer, the embedding model, rerankers, or model weights directly. Chroma introduced this framing publicly at a conference held at Ramp's New York office roughly a week before this conversation.
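The two-loop model can be sketched in a few lines. This is a hypothetical illustration of the framing, not Chroma's code: `inner_loop` makes the per-task decision about what enters the context window, and `outer_loop` folds accumulated feedback signals back into the retrieval layer so future inner-loop decisions improve.

```python
def inner_loop(task, retriever):
    # Inner loop: decide, in real time, what belongs in the context
    # window for this one task.
    docs = retriever(task)
    return {"task": task, "context": docs}

def outer_loop(retriever, feedback_log):
    # Outer loop: route user signals and operational data back into the
    # retrieval layer. Here the "learning" is just a per-document score
    # boost derived from logged feedback; in a real system it could be
    # re-training embeddings, a reranker, or the model weights.
    boosts = {}
    for doc, signal in feedback_log:
        boosts[doc] = boosts.get(doc, 0) + signal

    def improved_retriever(task):
        docs = retriever(task)
        return sorted(docs, key=lambda d: -boosts.get(d, 0))

    return improved_retriever
```

The point of the separation is that the inner loop must be fast and cheap on every call, while the outer loop can run offline on accumulated data.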
This connects to Andrej Karpathy's recent writing on reinforcement learning as a mechanism for baking hard-won operational lessons into model behavior rather than re-stuffing them into context on every call. Huber views the RAG layer and RL-driven weight updates as complementary rather than competing paths.
AI Adoption and Market Structure
On enterprise AI spend, Huber describes the current buyer base as heavily weighted toward process automation, with conservative procurement behavior that mirrors legacy IT buying patterns. His framing is direct: enterprises default to whoever becomes the IBM of AI, the vendor no one gets fired for choosing. He sees that position as still contested, with OpenAI holding the brand frontier, Google Cloud and Gemini offering scalable infrastructure, and Microsoft benefiting from model-agnostic positioning.
On the coding tool layer, Huber argues that owning the top of the funnel with a direct end-user relationship is durable across market structures, which explains why model labs are pursuing IDE plays even when they are confident in their models. He characterizes Cognition's acquisition of Windsurf as a logical move: buying a functioning distribution channel and user base, the 'DVD business,' while building toward a longer-term agentic coding platform.
Open Source Trajectory
Huber expects open source models to proliferate, driven partly by Chinese labs continuing to ship competitive weights. He cites the Kimi model released over the weekend as generating genuine developer interest on benchmarks. His view is that margin compression is structurally inevitable and that for a large share of business and consumer use cases, the intelligence bar is simply not high enough to require frontier proprietary models. The capability already embedded in existing weights, in his words, represents 'a couple billion dollars' of unlocked value waiting to be applied.