Hugging Face CEO Clément Delangue on hitting 10M developers, open-source AI's strategic importance, and why video datasets are the fastest-growing category
Jun 24, 2025 with Clément Delangue
Key Points
- Hugging Face crossed 10 million registered developers as of June 24, 2025, with a new model, dataset, or app created every 10 seconds on the platform.
- Video datasets are the fastest-growing category on Hugging Face, driven by demand for video model training and by synthetic video whose physics is now credible enough to serve as training data for robotics.
- US AI labs each shoulder expensive, near-identical training runs independently, while Chinese labs like DeepSeek mutualize those costs through open release, creating a structural competitive disadvantage for US companies.
Summary
Hugging Face crossed 10 million registered AI developers as of June 24, 2025, a milestone CEO Clément Delangue announced on the day of the interview. The platform now sees a new model, dataset, or app created every 10 seconds, and hosts nearly half a million open datasets with 1,000 new ones added daily. Monetization follows a freemium structure, with enterprise clients such as Google and Nvidia paying for user management, security tooling, and premium compute access.
Video Datasets Are the Fastest-Growing Category on the Platform
Video datasets are outpacing all other categories on Hugging Face, driven by two forces: growing demand for video model training and an accelerating volume of synthetic video data whose physics simulations are now credible enough to be useful. Delangue flags robotics as a second-order beneficiary, with synthetic video increasingly used as a physics-grounded training environment for robot systems.
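As a rough illustration of how that growth surfaces in practice, the sketch below uses the huggingface_hub and datasets libraries to rank Hub datasets matching a "video" search by downloads, then streams one of them. The repo id in the second step is a placeholder, not a dataset named in the interview.

```python
from datasets import load_dataset
from huggingface_hub import HfApi

# Rank Hub datasets matching a "video" search by download count.
api = HfApi()
for info in api.list_datasets(search="video", sort="downloads", direction=-1, limit=5):
    print(info.id, info.downloads)

# Stream a dataset sample-by-sample instead of downloading it in full.
# The repo id is a placeholder -- substitute any id from the listing above.
ds = load_dataset("some-org/some-video-dataset", split="train", streaming=True)
print(next(iter(ds)))
```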
On what makes a strong video dataset today, Delangue is direct: size matters most at this stage of the cycle. Quality optimization follows later, mirroring the arc seen in text, where frontier labs have quietly shifted toward smaller, more specialized models even as they have stopped publicly disclosing parameter counts.
Open Source as a Strategic Lever, Not Just an Ideology
Delangue argues that US AI leadership between 2016 and 2022 was built on open science, citing Google's release of the transformer architecture as the foundation that eventually became ChatGPT. That openness has since contracted among large US labs, creating what he describes as a structural inefficiency: Anthropic, OpenAI, and xAI each conduct near-identical, expensive training runs independently, duplicating compute and energy spend. Chinese labs, led by DeepSeek, mutualize those costs through open release. Delangue frames this not as an ideological gap but as a competitive one with national security implications.
Delangue reads the Anthropic fair-use ruling on training with approximately 7 million books as a positive signal, particularly for open-source releases, which he believes are almost categorically fair use given their educational and non-commercial nature. He hopes the ruling removes a legal excuse some US companies have used to avoid open-sourcing models and datasets.
MCP Integration Targets AI Model Building, Not Just App Development
Hugging Face released an MCP (Model Context Protocol) server last week that integrates with OpenAI's Codex, Cursor, and related coding tools. The explicit aim is to let developers use those interfaces to train and optimize AI models directly, not just build conventional software applications. Delangue sees this as a potential flywheel in which agentic coding tools lower the barrier to model creation, broadening the base of AI builders beyond the current pool of several hundred elite researchers.
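For a concrete sense of what that integration looks like from the client side, here is a minimal sketch using the official mcp Python SDK to connect to Hugging Face's MCP server and list the tools it exposes. The https://huggingface.co/mcp endpoint and the optional HF_TOKEN authentication are assumptions based on Hugging Face's published setup, not details from the interview.

```python
import asyncio
import os

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

# Assumed endpoint for Hugging Face's hosted MCP server.
HF_MCP_URL = "https://huggingface.co/mcp"

async def main() -> None:
    # Send a Hugging Face token if one is configured; some tools may also
    # work anonymously.
    headers = {}
    if token := os.environ.get("HF_TOKEN"):
        headers["Authorization"] = f"Bearer {token}"

    async with streamablehttp_client(HF_MCP_URL, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Enumerate the tools the server exposes to coding agents.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name)

asyncio.run(main())
```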
On-Device Inference and Robotics as the Next Inflection Points
Delangue expects on-device AI inference to eventually capture a larger share of total compute than on-device software did historically, citing speed, privacy, and near-zero marginal cost to users as the drivers. He notes that the current base is effectively zero percent on-device for AI inference, so any meaningful shift would be a substantial structural change in the market.
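A minimal local-inference sketch makes the economics concrete: once a small open model lives on the device, each additional generation costs the user essentially nothing. The model id below is one small open checkpoint picked for illustration; any comparably sized model would do.

```python
from transformers import pipeline

# Runs entirely on local hardware (CPU is fine at this size): no API calls
# and no per-request fees, so the marginal cost per generation is near zero.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
out = generator("On-device inference matters because", max_new_tokens=40)
print(out[0]["generated_text"])
```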
In robotics, Hugging Face hosted what Delangue describes as the largest open-source robotics hackathon to date, drawing participants across more than 100 locations. He sees cheap hardware, open-source software, and improved AI capabilities converging toward a potential ChatGPT-equivalent moment for the category, though he is openly uncertain whether the winning form factor will be general-purpose humanoids or a proliferation of task-specific robots.
Where Delangue Sees the Real Opportunity
Delangue is candid that text and chatbot improvements have become incremental and, from a research perspective, boring, even as they remain commercially important. His attention has shifted to biology and chemistry. He specifically highlights Arc Institute's cell perturbation prediction model, released on Hugging Face and GitHub, as an example of AI with high-leverage real-world applications in drug design. When asked what Meta and Scale AI should prioritize producing and open-sourcing, his answer is biology and chemistry datasets, which he views as severely underbuilt relative to their potential impact.