Canopy Labs is training LLMs on movement tokens to build virtual humans indistinguishable from real ones
May 27, 2025 with Elias Fizesan
Key Points
- Canopy Labs trains LLMs on tokenized movement and speech to generate virtual humans capable of realistic micro-gestures and incidental behaviors without explicit programming.
- The startup targets B2B applications in language learning, AI therapy, and tutoring where realistic avatars could improve engagement over voice-only interfaces.
- Canopy's success at making virtual humans indistinguishable from real ones creates a verification problem: infrastructure for proving human identity in professional settings becomes critical, and it does not yet exist.
Summary
Elias, founder of Canopy Labs, is building virtual humans designed to be indistinguishable from real ones on a live video call — the benchmark being a Zoom conversation where you cannot tell whether you're talking to a person or a model.
The technical approach centers on extending the LLM architecture into new modalities rather than using diffusion models. Canopy tokenizes 3D representations of human movement and speech, feeds them through a language model, and has the model output those same token types. The result is a virtual human that learns incidental behaviors — rubbing its nose, picking up a glass of water — without being explicitly programmed to do them. Realistic facial geometry is treated as a solved problem, with Unreal Engine's MetaHumans as a usable substrate. The remaining hard problem is movement: lips, hair, hands, and the micro-gestures that push a rendering across the uncanny valley.
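The core idea — continuous movement discretized into tokens that share a vocabulary with text — can be sketched with vector quantization. This is an illustrative assumption about how such a pipeline might look, not Canopy's actual implementation; the codebook, vocabulary sizes, and pose dimensions below are all made up.

```python
import numpy as np

# Hypothetical sketch: discretize continuous 3D pose frames into discrete
# tokens via a codebook (vector quantization), so movement can live in the
# same token stream as text for an autoregressive model. All names and
# sizes are illustrative assumptions, not Canopy's API.

TEXT_VOCAB_SIZE = 32000          # assumed base text vocabulary size
CODEBOOK_SIZE = 1024             # assumed number of movement codes
POSE_DIM = 3 * 52                # e.g. 52 joints x (x, y, z) per frame

rng = np.random.default_rng(0)
# Stands in for a trained VQ codebook; in practice this would be learned.
codebook = rng.normal(size=(CODEBOOK_SIZE, POSE_DIM))

def tokenize_movement(frames: np.ndarray) -> list[int]:
    """Map each pose frame to its nearest codebook entry, then offset the
    index past the text vocabulary so the two token types never collide."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    codes = dists.argmin(axis=1)
    return [TEXT_VOCAB_SIZE + int(c) for c in codes]

def detokenize_movement(tokens: list[int]) -> np.ndarray:
    """Invert the mapping: token id -> codebook vector (a lossy decode)."""
    return codebook[[t - TEXT_VOCAB_SIZE for t in tokens]]

frames = rng.normal(size=(10, POSE_DIM))   # ten synthetic pose frames
tokens = tokenize_movement(frames)
recon = detokenize_movement(tokens)
```

Once movement is in this form, "learning incidental behaviors" reduces to ordinary next-token prediction over mixed text-and-movement sequences — which is why no gesture needs to be explicitly programmed.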
Canopy has already open-sourced one model — a voice system that takes text as input and outputs speech tokens, with a second end-to-end variant that processes both text and speech tokens. Movement tokenization is the next extension.
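The two voice variants differ only in how the token sequence is laid out. A minimal sketch of the two layouts, assuming offset-based speech tokens and invented delimiter ids (none of this is taken from Canopy's released model):

```python
# Hypothetical sketch of the two I/O layouts described above: a
# text-to-speech-token model versus an end-to-end variant whose context
# interleaves text and speech tokens. Token ids and special markers are
# illustrative assumptions.

BOS, EOS = 1, 2                  # assumed sequence delimiters
SPEECH_OFFSET = 32000            # speech tokens sit past the text vocab

def build_tts_sequence(text_ids, speech_ids):
    """Variant 1: text in, speech tokens out (training-time layout)."""
    return [BOS, *text_ids, EOS, *(SPEECH_OFFSET + s for s in speech_ids), EOS]

def build_interleaved_sequence(turns):
    """Variant 2: end-to-end, alternating (text_ids, speech_ids) turns."""
    seq = [BOS]
    for text_ids, speech_ids in turns:
        seq.extend(text_ids)
        seq.extend(SPEECH_OFFSET + s for s in speech_ids)
    seq.append(EOS)
    return seq

seq = build_tts_sequence([5, 9, 7], [3, 1])
# The offset keeps speech ids disjoint from text ids in one vocabulary.
```

Movement tokenization as "the next extension" then amounts to adding a third offset range to the same shared vocabulary.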
Go-to-market
The initial commercial target is B2B: partnering with LLM-native applications that want a human presence layer. Elias names language learning, AI therapy, and AI tutoring as the clearest fits — categories where a face and natural movement plausibly improve engagement and retention relative to voice-only interfaces like ChatGPT's current voice mode.
The identity problem
The harder downstream question is verification. If virtual humans become convincing enough to impersonate real people in professional settings, the need for infrastructure that distinguishes human from AI presence — tiered credentialing systems, something like Worldcoin's proof-of-personhood model — scales with Canopy's own success. The segment offers no resolution to that tension; it is raised as an open question.