Interview

Twitch co-founder Emmett Shear's new AI safety startup Softmax: multi-agent reinforcement learning, the alignment problem, and why LLMs are 'fun house mirrors'

Jun 24, 2025 with Emmett Shear

Key Points

  • Softmax, founded by Emmett Shear, is building multi-agent reinforcement learning infrastructure to solve AI alignment before scaling, operating on a discovery-first approach similar to pharmaceutical research, with no near-term business model.
  • Shear frames alignment as two distinct problems: whether a system has the capacity to align, and whether it actually does, arguing the field conflates engineering and developmental questions in ways that prevent proper understanding.
  • Large language models function as semantic simulators that mirror users back to themselves without genuine selfhood, creating dependency loops Shear compares to narcissism, a design flaw he argues major AI companies should address through safety guardrails.

Summary

Emmett Shear, co-founder of Twitch and briefly interim CEO of OpenAI, is now running Softmax, an AI safety research startup focused on what he frames as the foundational unsolved problem in AI: alignment. The company's mission is to discover the principles of alignment before attempting to scale them, a sequencing Shear considers non-negotiable. His analogy is pharmaceutical research: you don't have a business model until you discover something worth commercializing.

What Softmax Is Actually Building

The core technical work is multi-agent reinforcement learning research, using small, classic RL agents — not LLM-based agents — run in large simulations to study how they interact and learn socially. A critical infrastructure gap is driving much of the early work: existing RL tooling was not built for large-scale multi-agent environments requiring online learning, where every action reshapes future observations in real time. This is structurally different from transformer training runs, which are essentially offline pipelines optimized for parallelism.
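To make the online-versus-offline distinction concrete, here is a minimal sketch of a multi-agent online learning loop in the spirit of the small, classic RL agents Shear describes. The environment, reward function, and hyperparameters are hypothetical illustrations, not Softmax's actual system; the point is the structural property that every agent's update immediately changes what every other agent observes next.

```python
import random
from collections import defaultdict

# Hypothetical illustration (not Softmax's code): independent tabular
# Q-learners sharing one environment. Each agent's observation is the
# joint action from the previous step, so every online update reshapes
# what all the other agents see next -- there is no fixed dataset to
# shard and replay, unlike an offline transformer training run.

N_AGENTS, N_ACTIONS = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def rewards_for(joint_action):
    # Toy social reward: each agent is paid for matching the others.
    return [sum(a == b for b in joint_action) - 1 for a in joint_action]

q_tables = [defaultdict(lambda: [0.0] * N_ACTIONS) for _ in range(N_AGENTS)]
obs = (0,) * N_AGENTS  # arbitrary starting observation

for step in range(10_000):
    # Each agent acts epsilon-greedily on its own current value estimates.
    joint = tuple(
        random.randrange(N_ACTIONS) if random.random() < EPSILON
        else max(range(N_ACTIONS), key=q_tables[i][obs].__getitem__)
        for i in range(N_AGENTS)
    )
    rs, next_obs = rewards_for(joint), joint
    for i in range(N_AGENTS):
        # Online TD update, applied immediately: the policies that generate
        # the next observation already differ from the ones that generated
        # this one.
        target = rs[i] + GAMMA * max(q_tables[i][next_obs])
        q_tables[i][obs][joint[i]] += ALPHA * (target - q_tables[i][obs][joint[i]])
    obs = next_obs
```

Scaling this pattern to large populations is where the tooling gap shows up: the loop is sequential and stateful by construction, which resists the batch parallelism that transformer training stacks are built around.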

The alignment-specific challenge Softmax targets is premature convergence. Standard ML aims for model convergence; Softmax needs agents to remain in a "converging but not converged" state long enough to explore the space of social behaviors. Collapsing too early means the agents learn a task, not how to interact. This requires fine-grained environment design rather than brute-force compute.
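The interview does not specify the mechanism Softmax uses to hold agents in that state, but a standard, widely used knob for delaying policy collapse is an entropy bonus; the sketch below is a generic illustration with made-up rewards and coefficients. With the bonus set to zero the softmax policy converges hard onto the single best action; with a small positive coefficient it keeps probability mass spread across near-tied actions, converging but not converged.

```python
import numpy as np

# Generic illustration, not Softmax's method: REINFORCE on a softmax
# policy over three hypothetical actions, plus an entropy bonus that
# resists collapse to a one-hot policy.

rng = np.random.default_rng(0)
logits = np.zeros(3)
LR, ENTROPY_COEF = 0.1, 0.05
mean_reward = np.array([1.0, 0.9, 0.2])  # action 0 is only slightly best

def policy(z):
    p = np.exp(z - z.max())
    return p / p.sum()

for step in range(20_000):
    probs = policy(logits)
    action = rng.choice(3, p=probs)
    reward = mean_reward[action] + rng.normal(scale=0.1)

    # Gradient of log pi(action) with respect to the logits.
    grad_logp = -probs.copy()
    grad_logp[action] += 1.0

    # Gradient of the policy entropy H with respect to the logits:
    # dH/dz_j = -p_j * (log p_j + H).
    entropy = -np.sum(probs * np.log(probs))
    grad_entropy = -probs * (np.log(probs) + entropy)

    logits += LR * (reward * grad_logp + ENTROPY_COEF * grad_entropy)

print("final policy:", policy(logits))  # mass stays on both good actions
```

In Shear's framing, though, the pressure to stay un-converged comes from fine-grained environment design rather than a single regularizer like this one.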

On business model, Shear is candid: there isn't one yet. If the infrastructure Softmax builds proves valuable, they may sell it. Otherwise the model resembles early-stage pharma — discover first, monetize later.

The Alignment Problem Reframed

Shear argues alignment is not one problem but two. First, does a system have the capacity to align with other beings? Second, does it actually do so? He draws a direct parallel to human development — almost every person is born with the capacity for alignment, but that capacity is not always realized. The engineering question and the developmental question are distinct, and conflating them is a core mistake he believes the field is making.

He expects the eventual solution to be simple to state but hard to execute, similar to parenting: build an open-ended learning system with the capacity to align, then raise it to actually be aligned. The hard part is not the architecture — it is everything that happens after the model starts running.

His primary fear is not that alignment is impossible to engineer correctly, but that the field does not yet understand what it is building well enough to raise it properly. He believes large frontier labs lack the time and institutional incentive to pursue this understanding given their product pressure.

LLMs as "Fun House Mirrors"

Shear offers a structurally distinct view of current LLMs. He argues that pre-trained language models are semantic simulators in the same way human infants are physics simulators — they move fluently through semantic space not because they understand language but because they are made of it, trained on it at a foundational level. A base model can write poetry before it has any concept of self, because there is no self there to have the concept.

This leads to what he calls the fun house mirror problem. LLMs activate human social-mirroring instincts while having very weak or nonexistent selfhood. The result is that users project into the interaction and receive an amplified, undistorted reflection of themselves back. Unlike human conversation — where another person's strong sense of self refracts and filters what gets mirrored — the LLM essentially just returns what the user puts in. Extended sessions become a closed loop of self-reinforcement mistaken for external validation.

Shear draws the comparison explicitly to Narcissus staring into the pool, and flags that five-plus hours per day of personal LLM conversation is the rough equivalent of drinking before noon — a sign of dependency, not utility. He references investor Cyan Banister's AI companion models, "Orin" and "Sarin," which are designed to cut off interactions they assess as psychologically unhealthy for the user — a product design philosophy he argues large AI companies should adopt. His framing is explicitly commercial: safer products drive more durable engagement, the same way safer cars sell better.

Cultural Antibodies and the Speed Problem

Shear draws a historical parallel to gin in 18th-century London, the first cheap, industrialized hard liquor, which triggered mass dependency before social norms caught up. Society eventually developed behavioral heuristics around alcohol that function as cultural antibodies. He argues AI requires the same, but the diffusion speed of ChatGPT makes the window for norm development dangerously short. He notes that social media antibodies have not fully formed either: six hours per day of TikTok among teenagers is normalized when it should trigger the same concern as other dependency signals. The generational cohort that grew up on iPads, social media, and now LLMs simultaneously may be absorbing compounding risk faster than any single technology would have produced alone.