Interview

Hebbia founder processed all 80,000 JFK files overnight — here's what the AI found

Mar 19, 2025 with George Sivulka

Key Points

  • Hebbia founder George Sivulka processed all 80,000 JFK files overnight using his document intelligence platform, finding the CIA, Fidel Castro, and the FBI to be the most frequently mentioned entities after Kennedy himself.
  • Hebbia's core differentiation is an orchestration layer that deploys multiple smaller models with full attention per document rather than chasing longer context windows, which Sivulka argues makes it more resistant to data poisoning attacks.
  • Sivulka argues consumer AI agents will be solved by OpenAI within a year and startups cannot compete because they lack the partnerships and negotiating power to access airline and hotel APIs.

Summary

George Sivulka, founder of Hebbia AI, processed all 80,000 pages of the newly released JFK files overnight using his company's document intelligence platform — the same system it normally deploys on virtual data rooms and financial due diligence.

The findings are more suggestive than conclusive. Extracting named entities across every document, Sivulka found that the third most-mentioned entity after JFK himself is Central Intelligence (the CIA), followed by Fidel Castro, with at least 18 documents devoted substantially to Castro, and then the FBI. There are multiple mentions of additional shooters. Martin Luther King Jr. appears frequently as well. A cluster of apparent UFO references turned out to be OCR errors on degraded scans. Gary Underhill, a CIA officer whom some conspiracy researchers believe was killed to suppress his claim that the agency was behind the assassination, also surfaces in the files.

How Hebbia processed the corpus

The files arrived as messy, decades-old scanned documents — not clean text. Hebbia ran them through a tiered preprocessing pipeline that mixes multimodal models for the hardest images, standard OCR, and layout detection before feeding output to language models. The core architecture is not a single large context window but what Sivulka calls an infinite effective context window: a god model orchestrating smaller sub-models that read every file with full attention individually, then synthesize across them. He argues this makes the system more resistant to data poisoning than a conventional RAG search engine, which can be gamed by flooding a corpus with off-topic documents to inflate entity frequencies.
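
To make the read-every-file-then-synthesize idea concrete, here is a minimal Python sketch of that general pattern. It is not Hebbia's code. Every name in it (read_document, synthesize, process_corpus, KNOWN_ENTITIES) is hypothetical, and a plain substring count stands in for the sub-models, which in a real system would be language models or a named-entity recognizer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class DocReport:
    doc_id: str
    entities: Counter  # mentions found in this one document


# Hypothetical watch list, used only for illustration.
KNOWN_ENTITIES = ("CIA", "FBI", "Castro")


def read_document(doc_id: str, text: str) -> DocReport:
    """Stand-in for a sub-model that reads a single document in full."""
    counts = Counter(
        {name: text.count(name) for name in KNOWN_ENTITIES if name in text}
    )
    return DocReport(doc_id, counts)


def synthesize(reports):
    """Stand-in for the orchestrator: merge the per-document reports."""
    mentions = Counter()       # total mentions across the corpus
    doc_frequency = Counter()  # number of documents naming each entity
    for report in reports:
        mentions.update(report.entities)
        doc_frequency.update(report.entities.keys())
    return mentions, doc_frequency


def process_corpus(corpus: dict, workers: int = 4):
    """Read every document independently, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reports = list(
            pool.map(lambda item: read_document(*item), corpus.items())
        )
    return synthesize(reports)


if __name__ == "__main__":
    corpus = {
        "a": "CIA memo on Castro. CIA again.",
        "b": "FBI note mentioning Castro.",
        "c": "Unrelated page.",
    }
    mentions, doc_frequency = process_corpus(corpus)
    print(mentions.most_common())
    print(doc_frequency.most_common())
```

The sketch keeps two tallies on purpose. A single document that repeats a name many times raises the raw mention count but adds only one to the per-document count, so comparing the two is one simple way to see whether a frequency is driven by a handful of files.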

Sivulka is clear that Hebbia's product differentiation lives in that orchestration layer, not in racing for the longest context window. The goal is building the agent orchestration platform that a hypothetical AGI with infinite resources would choose to use — one that delegates structured computation to appropriate tools rather than trying to jam everything into a single model.

Consumer agents and why startups shouldn't bother

Asked about the consumer AI agent moment — the long-promised flow of booking flights, hotels, and itineraries automatically — Sivulka is blunt: it will be solved by OpenAI, probably within the next year, and it is already the worst space for a startup. The bottleneck is not technology but the anti-bot infrastructure airlines and platforms have built, and the partnership agreements needed to get around it. A startup cannot walk into Delta and negotiate API access the way OpenAI can. Founders still rushing to build B2B AI agents face the same problem from the other direction — by the time the whole market is trying to do something, Sivulka argues, it is already too late.

Company background

Sivulka left a Stanford PhD in meta-learning in 2020 after OpenAI's GPT-3 paper effectively rendered his research direction obsolete; he read the paper as a signal that large language models would dominate. He spent the next two years building Hebbia with almost no customers before the AI boom created the market the product was designed for. The company is at hebbia.com.