Interview

Chonkie: Open-source document chunking for LLM RAG pipelines — 180K downloads, 200 projects, and LlamaIndex dependency

Jun 11, 2025 with Trey

Key Points

  • Chonkie, a two-person YC S25 startup, has reached 180,000 downloads and has 200+ dependent projects. It targets a specific problem: retrieval accuracy on multi-document corpora collapses without structured chunking.
  • In an eval, o3 achieved 75% retrieval accuracy on dense documents without chunking and 100% after the same documents were processed through Chonkie.
  • LlamaIndex uses Chonkie as a core dependency, and 10 to 12 companies in the current YC batch already use it as the ingestion layer for LLM RAG pipelines.

Summary

Chonkie, a two-person YC S25 company founded by Trey and a childhood friend, builds open-source document chunking infrastructure for LLM retrieval-augmented generation (RAG) pipelines. Putting a single document into a large context window works, but the same approach breaks down when applied to thousands of PDFs. Without structured chunking, even frontier models struggle with retrieval accuracy on dense, multi-format documents.

Trey ran an eval the day before Demo Day, after OpenAI's o3 price drop. He fed classic literature into o3 and asked pointed retrieval questions. Without chunking, o3 hit 75% retrieval accuracy. After running the same documents through Chonkie, accuracy reached 100%.
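To make the metric concrete, here is a minimal sketch of how a retrieval accuracy score of this kind could be computed. The interview does not say how answers were graded, so the substring match, the function name `retrieval_accuracy`, and the sample data are all assumptions for illustration, not Trey's actual eval.

```python
def retrieval_accuracy(results: list[tuple[str, str]]) -> float:
    """Fraction of questions whose model answer contains the expected answer.

    `results` holds (expected_answer, model_answer) pairs. Grading by
    case-insensitive substring match is an assumption of this sketch.
    """
    if not results:
        raise ValueError("need at least one graded question")
    hits = sum(
        1
        for expected, answer in results
        if expected.strip().lower() in answer.strip().lower()
    )
    return hits / len(results)


# Made-up sample data, not taken from the eval described above.
sample = [
    ("Ishmael", "The narrator introduces himself as Ishmael."),
    ("Pequod", "The ship is called the Rachel."),
]
score = retrieval_accuracy(sample)
```

Running the same question set once against raw documents and once against chunked documents gives two scores that can be compared directly.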

The product sits at the ingestion layer of an LLM stack. Developers feed documents in, get embeddings out, and can route those into their own vector database or let Chonkie wrap around one. Integration takes two to five lines of code. For static corpora, chunking runs asynchronously at ingest. For live use cases like code generation, it runs in real time.
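The flow described above (documents in, chunks and embeddings out, then retrieval) can be sketched in plain Python. This is a toy illustration and does not use Chonkie's API: `chunk_words`, `embed`, `ingest`, and `retrieve` are hypothetical names, the chunker is a simple fixed-size word window with overlap, and the "embedding" is a bag-of-words count standing in for a real embedding model and vector database.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int  # index of the first word in the source document
    end: int    # index one past the last word


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into fixed-size word windows that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        end = min(start + chunk_size, len(words))
        chunks.append(Chunk(" ".join(words[start:end]), start, end))
        if end == len(words):  # last window reached the end of the document
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(w.strip(".,;:!?").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(docs: dict[str, str], chunk_size: int = 50, overlap: int = 10):
    """Chunk and embed every document; returns (doc_id, chunk, vector) entries."""
    return [
        (doc_id, chunk, embed(chunk.text))
        for doc_id, text in docs.items()
        for chunk in chunk_words(text, chunk_size, overlap)
    ]


def retrieve(index, query: str, k: int = 1):
    """Return the k index entries most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)[:k]


docs = {
    "whale": "The white whale sank the ship near the equator",
    "garden": "She planted roses and tulips in the garden behind the house",
}
index = ingest(docs, chunk_size=20, overlap=0)
best_doc_id = retrieve(index, "Who sank the ship?")[0][0]
```

In this sketch, `ingest` plays the role of the offline step for a static corpus, while calling `chunk_words` and `embed` per request would correspond to the real-time case.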

Chonkie started as a side project in February 2025 before entering YC. It has reached 180,000 downloads, and 200 projects depend on it. The project is a core dependency of LlamaIndex. Ten to twelve companies from the current YC batch are already using it. Trey expected to close a funding round by Friday after Demo Day.