Interview

OpenAI CPO Kevin Weil on AI solving open math problems and launching Prism to accelerate science

Jan 29, 2026 with Kevin Weil

Key Points

  • GPT-5.2 now solves open mathematical problems humans have not yet answered, marking the first time AI models are pushing beyond the frontier of human knowledge across multiple scientific domains.
  • OpenAI launched Prism, a scientific collaboration tool integrated into existing ChatGPT accounts, embedding AI directly into researcher workflows rather than requiring manual copy-pasting between applications.
  • Weil envisions fully automated robotic labs where AI generates hypotheses and designs experiments while hardware executes them in parallel, compressing decades of scientific progress into years.

Summary

OpenAI's chief product officer Kevin Weil is repositioning the company toward making AI a force multiplier for frontier science. Rather than selling general-purpose models, OpenAI is building tooling and workflow integration tailored to how scientists actually work.

GPT-5.2 now solves open mathematical problems that humans have not yet answered. Weil traces the progression from models performing well on the SAT, to graduate-level problems, to gold-medal performance at the International Mathematical Olympiad, to now contributing novel solutions at the frontier of mathematics. The capability is narrower than it sounds: not every open problem yields, and the results are still being validated across physics, biology, chemistry, and materials science. But the principle is clear. AI models are beginning to push beyond where human knowledge has gone, which Weil frames as the condition for accelerating entire fields, potentially compressing thirty years of scientific progress into five.

The strategy has two pillars. First, improve the core models through domain-specific training and reinforcement learning on frontier problems themselves, paired with integrations of specialized scientific tools such as computer algebra systems for differential equations and protein databases for biology. Second, bring AI directly into the environments scientists use daily rather than requiring them to copy-paste between ChatGPT and their work. On Tuesday, OpenAI launched Prism, a scientific writing and collaboration tool accessed through existing ChatGPT accounts. One researcher described GPT-5.2 as a metal detector for hypotheses, using it as a thought partner to prioritize which experiments to run among the dozens he could conceivably pursue.

Weil does not expect a single ChatGPT moment for Prism. Instead, he describes two compounding exponentials. Models become incrementally more useful for any scientist who adopts them, and the pool of scientists actually using these tools expands as word spreads and capabilities cross a credibility threshold. Organic adoption is already happening on Twitter, where researchers show daily examples of solving problems with GPT-5.2, often after years of skepticism about earlier models.

Weil says the robotic lab vision is 100% likely. He imagines the scientist in a lab coat becoming obsolete, replaced by someone in a normal office managing hypothesis generation and experiment design while hardware handles execution at scale and speed. The reinforcement learning loop runs through the physical world. OpenAI does not plan to build robotic labs itself but will partner broadly with existing labs and scientists, acknowledging that science has too much surface area for any one company to cover.

On monetization, Weil clarifies that an earlier mention of IP monetization referred specifically to pharma partnerships where OpenAI might share royalties on discoveries, not a new consumer model. Prism users log in with existing ChatGPT accounts and consume API quota or subscription fees as normal. This is not a billion-person consumer play but a mission-focused acceleration effort for a much smaller population of active researchers.

Latency matters for scientific work, though not in the way consumer applications might suggest. Hard frontier problems require models to think for hours, sometimes days, not five-minute ChatGPT interactions. Faster inference, if it compresses a two-day reasoning cycle to six hours or ten minutes, multiplies the number of hypotheses a scientist can test in a fixed period. That acceleration compounds across the field.
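The compounding effect of faster inference can be made concrete with a quick sketch. The cycle lengths (two days, six hours, ten minutes) are the ones mentioned above; the weekly framing and sequential-cycle assumption are illustrative, not from the interview:

```python
# Illustrative arithmetic: how compressing a model's reasoning cycle
# multiplies the number of sequential hypothesis-test cycles per week.
HOURS_PER_WEEK = 7 * 24  # 168 hours

def cycles_per_week(cycle_hours: float) -> float:
    """Sequential reasoning cycles that fit into one week."""
    return HOURS_PER_WEEK / cycle_hours

for label, hours in [("two days", 48), ("six hours", 6), ("ten minutes", 10 / 60)]:
    print(f"{label:>11}: {cycles_per_week(hours):6.1f} cycles/week")
```

A two-day cycle allows only 3.5 runs per week; six hours allows 28; ten minutes allows over a thousand. The point is that each step down in latency is a multiplicative, not additive, gain in experimental throughput.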