Rhoda AI emerges from stealth with $450M Series A and a robot foundation model trained on hundreds of millions of videos
Mar 10, 2026 with Jagdeep Singh
Key Points
- Rhoda AI raises a $450M Series A led by Khosla Ventures on the day it exits stealth, an unusually large debut for a company that had never spoken publicly.
- Its foundation model trains on hundreds of millions of internet videos, cutting the teleoperation data needed to fine-tune a new task from tens of thousands of hours to roughly ten.
- Rhoda builds its own hardware alongside the AI, targeting industrial and logistics customers where labor costs make ROI easy to demonstrate.
Summary
Rhoda AI emerged from stealth with a $450M Series A led by Khosla Ventures, an unusually large debut round for a company that had previously said nothing publicly.
Cofounder Jagdeep Singh describes Rhoda as a full-stack robotics company targeting manufacturing and logistics, not consumer hardware.
Training data
Singh's argument is that every major robotics approach so far fails in production for the same underlying reason: the training data is too narrow. Conventional robots follow pre-programmed trajectories and cannot adapt. The newer generation of neural-network-driven robots, typically built on vision-language-action (VLA) models, learns from teleoperation data, with researchers puppeteering robots through tasks using headsets and joysticks. Those datasets are intentionally collected, which makes them small and low-diversity by construction. When the model encounters real-world variability such as different lighting, unfamiliar objects, or unexpected configurations, it fails. Simulation carries the same weakness: no matter how detailed the physics engine, intentionally generated data still misses the long tail of real-world edge cases.
Foundation model
Singh's team, which comes from generative AI and computer vision, treats robotics pre-training the way language and image models are trained: on internet-scale data. Rhoda trained its foundation model on hundreds of millions of video clips, including footage with no obvious connection to manipulation, such as waves on a beach, on the theory that the model is learning generalizable physics from all of it. ChatGPT was not trained only on Shakespeare, and Rhoda's model was not trained only on robot manipulation footage.
Teleoperation still appears in the process, but only at the fine-tuning stage. Singh says Rhoda can align its model to specific tasks with roughly ten hours of teleoperation data, compared to the tens of thousands of hours required by VLA-based approaches.
The model is called the Direct Video Action model. It generates a video prediction of what the robot should do, converts that prediction into a physical action, and runs in a closed loop, continuously incorporating new sensor feedback.
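The described cycle (predict a video of the intended behavior, derive an action from it, execute, re-observe) can be sketched as a simple control loop. This is a toy illustration only; Rhoda has published no code or API, so every class, method, and field below is a hypothetical stand-in for the components the article names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Placeholder for the robot's latest sensor reading (e.g. a camera frame)."""
    frame: list


class DirectVideoActionSketch:
    """Toy stand-in for the described model: predict a short video of the
    desired behavior, then map that prediction to a motor command."""

    def predict_video(self, obs: Observation) -> dict:
        # In the real system this would be a generated video clip of what
        # the robot should do next; here it just echoes the observation.
        return {"predicted_frames": [obs.frame]}

    def video_to_action(self, video: dict) -> dict:
        # Convert the predicted video into a low-level command (placeholder).
        return {"command": "move", "frames_used": len(video["predicted_frames"])}


def control_loop(model, get_observation, execute, steps=3):
    """Closed loop: each step re-observes the world before predicting again."""
    log = []
    for _ in range(steps):
        obs = get_observation()               # fresh feedback every iteration
        video = model.predict_video(obs)      # video prediction of intended behavior
        action = model.video_to_action(video) # prediction -> physical action
        execute(action)
        log.append(action)
    return log
```

The key structural point the article makes is the feedback path: the observation is re-read at the top of every iteration rather than planning open-loop from a single initial state.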
Hardware
Rhoda is building its own robot hardware, not just the AI layer. Singh says no existing hardware met the company's requirements: continuous lifting of 25 kilograms across a full operational day, a three-year reliability target, and linear kinematics that an AI model can accurately simulate. Nonlinear mechanical systems with elasticity or compliance are harder to model, which pushed Rhoda toward what Singh describes as an Apple-like integration of OS and hardware. The model will also be available as an API for customers running third-party hardware, including existing KUKA arms.
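For the model-as-API path on third-party hardware, a request might bundle an observation with a task description. Rhoda has not published its API, so the endpoint shape, field names, and `build_action_request` helper below are entirely invented for illustration.

```python
import json


def build_action_request(camera_frame_id: str, task: str) -> str:
    """Package an observation reference and a task prompt for a hosted
    action model. Hypothetical schema; not a real Rhoda API."""
    payload = {
        "observation": {"frame_id": camera_frame_id},
        "task": task,
        "hardware": "third_party_arm",  # e.g. an existing KUKA arm
    }
    return json.dumps(payload)


request_body = build_action_request("frame-001", "pick up the box")
```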
Outlook
Singh does not disclose a valuation or customer names. The near-term commercial focus is industrial and logistics settings where manual labor costs are already quantifiable, which Singh views as the lowest-friction path to demonstrating ROI before moving into harder environments.