Interview

Fastino raises $17.5M from Khosla to train sub-billion-parameter AI models on consumer GPUs

May 9, 2025 with George Maloney

Key Points

  • Fastino raises $17.5M from Khosla Ventures to commercialize task-specific language models trained on consumer GPUs for under $100K, targeting enterprises drowning in large-model API costs.
  • The startup's sub-billion-parameter models sacrifice generality for precision on narrow tasks like text-to-SQL and JSON parsing, with inference fast enough for latency-sensitive workloads.
  • Founders George and Ash built Fastino after watching AI API bills balloon past headcount costs, first at Ash's own startup and then across George's investment portfolio, betting that frontier models won't solve the cost-latency-accuracy tradeoff for enterprises.

Summary

Fastino, a startup training sub-billion-parameter AI models on consumer gaming GPUs, has raised $17.5 million from Khosla Ventures. The company was founded by George and his co-founder Ash, whose origin story runs directly through the problem Fastino is trying to solve: while running a Cursor-style developer agent in 2023, Ash found that LLM API costs exceeded his headcount costs. George, then investing as a GP after selling a previous company, saw the same pattern across his portfolio.

The core thesis is that GPT-4 and Gemini are consumer products trained on trillions of data points to handle open-ended queries, and that makes them expensive and imprecise for the narrow, high-volume tasks enterprises actually run at scale. Fastino's answer is a family of models it calls Task-Specific Language Models (TLMs) — purpose-built, not fine-tuned or distilled from existing transformer models. George says the architecture is proprietary and undisclosed in detail, but all models sit below one billion parameters, were trained for under $100,000 in GPU costs, and run inference in milliseconds.

Accuracy, George argues, improves as the task becomes more tightly scoped. The trade-off is explicit: these are not generalist models. The initial TLM lineup targets developer infrastructure use cases — text to JSON, text to SQL, agentic function calling, document parsing, and PII redaction. That last category is aimed at banks and insurance companies. There's also a profanity-detection model, which George describes as the team's internal favorite and the most entertaining red-teaming exercise.
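To make the contrast with open-ended prompting concrete, here is a minimal sketch of what a call to a narrowly scoped model of this kind tends to look like. The endpoint, credential, and payload fields are hypothetical, since Fastino has not published its API; the point is only that the interface is a fixed task with structured parameters rather than a free-form prompt.

```python
import requests

# Hypothetical endpoint and payload shape -- not Fastino's published API.
# A task-specific call exposes knobs for one narrow job (here, PII redaction)
# instead of an open-ended prompt.
API_URL = "https://api.example.com/v1/tlm/pii-redaction"  # placeholder URL
API_KEY = "YOUR_API_KEY"  # placeholder credential

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "text": "Contact Jane Doe at jane.doe@example.com or 555-0142.",
        "entities": ["PERSON", "EMAIL", "PHONE"],  # task parameters, not prompt text
        "replacement": "[REDACTED]",
    },
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"text": "Contact [REDACTED] at [REDACTED] or [REDACTED]."}
```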

Deployment is API-first, but the small parameter footprint makes on-premises deployment and CPU-level inference realistic options for customers who want to avoid round-trips to an external API. George frames this as a meaningful differentiator for latency-sensitive enterprise workloads.
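As an illustration of why the small footprint matters, the sketch below runs an open sub-billion-parameter model (Qwen2.5-0.5B-Instruct, not a Fastino model) entirely on CPU using the Hugging Face transformers pipeline. Fastino's own architecture and runtime are proprietary, so this is only an analogue for the deployment pattern George describes.

```python
from transformers import pipeline

# Illustrative analogue only: an open ~0.5B-parameter model standing in for a
# proprietary TLM, to show that this size class runs on a plain CPU with no
# external API round-trip.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    device=-1,  # -1 selects CPU in the transformers pipeline API
)

result = generator(
    "Convert to SQL: list all customers created in the last 30 days.",
    max_new_tokens=64,
    do_sample=False,
)
print(result[0]["generated_text"])
```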

The commercial argument rests on three pressure points that large LLMs don't resolve well for enterprises: cost, latency, and task accuracy. Whether Fastino's architecture can hold that edge as frontier models get cheaper and smaller remains the open question, but the $17.5 million from Khosla suggests at least one major investor thinks the wedge is real.