News

Human teenagers outscored Google DeepMind and OpenAI at the International Math Olympiad — but probably for the last time

Jul 29, 2025

Key Points

  • Google DeepMind's Gemini 2 Deep Think and OpenAI's reasoning model both achieved gold-medal performance at the International Math Olympiad for the first time, solving five of six problems in under 4.5 hours without human intervention.
  • Human teenagers still outscored both AI systems overall, but the margin is collapsing—DeepMind improved from silver last year to gold this year, and an IMO gold medalist estimates AI will dominate the competition within a decade.
  • Both AI systems failed problem six, a combinatorics puzzle that stumped 569 of 630 student contestants, suggesting reasoning on truly novel problems remains a human advantage that may not persist much longer.

Summary

Human teenagers beat Google DeepMind and OpenAI at the International Math Olympiad—probably for the last time. Twenty-six students, including US team members Tiger Jeang (a two-time gold medalist) and Alexander Wang (a three-time gold medalist), scored higher than the AI systems. Both DeepMind and OpenAI achieved gold-medal performance at this year's IMO, marking the first time AI models have cleared that bar.

Google DeepMind's breakthrough

Gemini 2 Deep Think solved five of six problems flawlessly, earning 35 of 42 points, the gold-medal threshold. The system completed the exam in under 4.5 hours, the official IMO time limit. That is a dramatic acceleration from last year, when DeepMind's system needed several days of compute to solve four problems and earn silver (28 points). A critical difference this year: DeepMind's model operated entirely in natural language, without human intervention to translate problems into formal mathematical notation.

OpenAI's matching result

OpenAI's latest reasoning model matched DeepMind's performance, solving five of six problems for 35 points. The company ran the six problems immediately after the official exam closed and enlisted former IMO medalists to grade the results. OpenAI was not part of the official IMO event, and its announcement drew criticism from the math community; Terence Tao, the Fields Medalist and former IMO gold medalist, called it "rude and inappropriate." The distinction matters because DeepMind's results were officially graded by IMO coordinators under a formal program the organization created for tech companies, while OpenAI's grading, though credible, lacked that institutional stamp.

The human edge

Both AI systems failed problem six, a combinatorics puzzle about optimally tiling a 2025×2025 grid. Of 630 student contestants, 569 received zero points on that question, and only six solved it perfectly. Tiger Jeang earned four points for a partial solution, more than nearly all of his peers. Alexander Wang, a high school senior competing in his fourth IMO, said he was "50-50" on whether AI would achieve a perfect 42-point score next year.

Thang Luong, the senior staff research scientist who led DeepMind's effort, had told executives to expect bronze or silver; the gold result exceeded his own forecast. Luong, who narrowly missed qualifying for the IMO as a high school student, helped make the closing human-machine gap visible in both speed and method.

Why the IMO matters

The IMO tests reasoning on problems explicitly designed to be novel and unconventional; the organization discards any candidate problem that resembles something already published. That undercuts AI's usual advantage, because models trained on vast data excel at pattern-matching, not at working through entirely new problems from scratch. Both systems still foundered on problem six, a sign that genuinely unfamiliar problems remain a weak spot. But the improvement trajectory is steep. In 2021, Alexander Wei, then a PhD student, predicted AI wouldn't beat the IMO by July 2025. He later led OpenAI's IMO project and proved his own forecast wildly wrong.

The human-machine timeline

Sholto Douglas, a researcher at Anthropic, and others in the math community suggest this may be humanity's last IMO gold. The parallel to chess is instructive. Deep Blue beat Kasparov in 1997, and a period of "centaur chess," in which human-plus-computer teams outperformed either alone, persisted roughly 15 to 20 years before pure computational dominance made the hybrid irrelevant. Go fell faster after AlphaGo's 2016 victory over Lee Sedol, a match remembered for "move 37," when the model played a strategy no human expected.

One outlier view comes from an IMO gold medalist now at Brown University and DeepMind, who argues problem types like problem six will stump AI for "at least another decade." That's a contrarian bet given the pace of improvement.

Talent and economics

The broader dynamic is already reshaping talent flows. IMO gold medalists historically became professors or entered government. Today, algorithmic trading firms like Jane Street and XTX Markets, both IMO sponsors this year, actively recruit medalists for quantitative roles. By the time next year's IMO begins, the gold medalists may face a different equation: their scarcity premium eroding as AI systems improve, their economic value shifting to jobs AI cannot yet do.