Jeff Huber on context rot, RAG vs. long-context windows, and AI adoption curves
Jul 14, 2025 with Jeff Huber
Key Points
- Chroma CEO Jeff Huber argues large context windows degrade in quality well before advertised limits, with reasoning performance dropping as early as 10,000 tokens, making retrieval-augmented generation durable for production workloads.
- Huber frames context engineering as a two-loop system: real-time retrieval decisions paired with feedback loops that route user signals and operational data back into embeddings and model weights.
- Open source models will proliferate as Chinese labs ship competitive weights, compressing margins and making frontier proprietary models unnecessary for most business and consumer applications.
Summary
Jeff Huber, CEO of Chroma, makes the case that large context windows are oversold for production AI applications. Chroma published a technical report on what Huber calls 'context rot,' finding that model reasoning performance can degrade significantly before the advertised context limit is reached, with drop-offs appearing as early as 10,000 tokens. His preference as a builder is a model with a reliable 60,000-token window over one claiming 10 million tokens that 'kind of maybe sort of works sometimes.'
RAG vs. Long Context
The report directly challenges the narrative that retrieval-augmented generation (RAG) would be made obsolete by expanding context windows. Huber argues that for now, retrieval remains a superior pattern for most production workloads. Three approaches are gaining traction among developers building at scale:
- Retrieval with reranking to surface only the most relevant content
- Breaking large context into many parallel small LLM calls, exploiting the inverse relationship between context size and reasoning quality
- Chroma's own vector database as the retrieval backbone
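The first two patterns above can be sketched together: retrieve candidates, rerank down to a small high-signal set, then fan that set out as parallel small-context LLM calls instead of one giant prompt. This is a minimal illustration, not Chroma's implementation; the retriever, reranker, and LLM call are all toy stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(query, corpus, k=20):
    # Toy lexical scorer standing in for a vector search (e.g. a Chroma
    # collection): rank documents by query-term overlap.
    scored = sorted(corpus,
                    key=lambda d: -sum(w in d.lower() for w in query.lower().split()))
    return scored[:k]

def rerank(query, docs, top_n=3):
    # Stand-in for a cross-encoder reranker: keep only the most relevant
    # documents so the model sees a small, high-signal context.
    return docs[:top_n]

def small_llm_call(query, chunk):
    # Placeholder for one small-context model call; a real system would
    # send `chunk` plus `query` to an LLM here.
    return f"summary of {chunk!r} for {query!r}"

def answer(query, corpus):
    candidates = retrieve(query, corpus)
    context = rerank(query, candidates)
    # Fan out one small call per chunk in parallel, then combine the
    # partial answers, rather than stuffing everything into one window.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: small_llm_call(query, c), context))
    return partials
```

The key design choice the pattern encodes: each model call stays well inside the token range where reasoning quality holds up, and relevance filtering happens before the model ever sees the text.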
The quadratic compute cost of scaling attention means reaching a billion-token context window is not a 1,000x engineering challenge over one million tokens; because attention cost grows with the square of sequence length, it is closer to a 1,000,000x one, making retrieval-based architectures a durable pattern rather than a transitional one.
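The arithmetic behind that claim is simple to check. Treating attention compute as proportional to the square of sequence length (ignoring constants and any sub-quadratic tricks), a 1,000x jump in context length costs 1,000,000x in attention compute:

```python
def attention_cost(tokens):
    # Self-attention compares every token with every other token,
    # so compute grows with the square of sequence length.
    return tokens ** 2

# Going from 1M to 1B tokens is a 1,000x length increase...
ratio = attention_cost(1_000_000_000) / attention_cost(1_000_000)
# ...but a 1,000,000x compute increase.
print(ratio)  # 1000000.0
```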
Context Engineering as a Discipline
Huber frames the emerging practice around a two-loop model. The inner loop is the real-time problem of deciding what belongs in the context window for a given task. The outer loop is designing a system that improves at the inner loop over time by capturing user signals, agent feedback, and operational data, then routing that learning back into the retrieval layer, the embedding model, rerankers, or model weights directly. Chroma introduced this framing publicly at a conference held at Ramp's New York office roughly a week before this conversation.
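The two-loop model can be sketched in a few lines. This is a hypothetical illustration of the framing, not Chroma's code: `inner_loop` makes the per-task decision about what enters the context window, and `outer_loop` folds accumulated feedback signals back into the retrieval layer so future inner-loop decisions improve.

```python
def inner_loop(task, retriever):
    # Inner loop: decide, in real time, what belongs in the context
    # window for this one task.
    docs = retriever(task)
    return {"task": task, "context": docs}

def outer_loop(retriever, feedback_log):
    # Outer loop: route user signals and operational data back into the
    # retrieval layer. Here the "learning" is just a per-document score
    # boost derived from logged feedback; in a real system it could be
    # re-training embeddings, a reranker, or the model weights.
    boosts = {}
    for doc, signal in feedback_log:
        boosts[doc] = boosts.get(doc, 0) + signal

    def improved_retriever(task):
        docs = retriever(task)
        return sorted(docs, key=lambda d: -boosts.get(d, 0))

    return improved_retriever
```

The point of the separation is that the inner loop must be fast and cheap on every call, while the outer loop can run offline on accumulated data.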
This connects to Andrej Karpathy's recent writing on reinforcement learning as a mechanism for baking hard-won operational lessons into model behavior rather than re-stuffing them into context on every call. Huber views the RAG layer and RL-driven weight updates as complementary rather than competing paths.
AI Adoption and Market Structure
On enterprise AI spend, Huber describes the current buyer base as heavily weighted toward process automation, with conservative procurement behavior that mirrors legacy IT buying patterns. His framing is direct: enterprises default to whoever becomes the IBM of AI, the vendor no one gets fired for choosing. He sees that position as still contested, with OpenAI holding the brand frontier, Google Cloud and Gemini offering scalable infrastructure, and Microsoft benefiting from model-agnostic positioning.
On the coding tool layer, Huber argues that owning the top of the funnel with a direct end-user relationship is durable across market structures, which explains why model labs are pursuing IDE plays even when they are confident in their models. He characterizes Cognition's acquisition of Windsurf as a logical move: buying a functioning distribution channel and user base, the 'DVD business,' while building toward a longer-term agentic coding platform.
Open Source Trajectory
Huber expects open source models to proliferate, driven partly by Chinese labs continuing to ship competitive weights. He cites the Kimi model released over the weekend as generating genuine developer interest on benchmarks. His view is that margin compression is structurally inevitable and that for a large share of business and consumer use cases, the intelligence bar is simply not high enough to require frontier proprietary models. The capability already embedded in existing weights, in his words, represents 'a couple billion dollars' of unlocked value waiting to be applied.