Source URL: https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything
Source: Simon Willison’s Weblog
Title: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Feedly Summary: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year’s International Mathematical Olympiad, scoring a gold medal performance with their custom research model.
(I saw an unconfirmed rumor that the Gemini team had to wait until Monday for approval from Google PR.)
It’s interesting that Gemini achieved the exact same score as OpenAI, 35/42, and was able to solve the same set of questions – 1 through 5 – failing only on question 6, which is designed to be the hardest.
Each question is worth seven points, so 35/42 corresponds to full marks on five out of the six problems.
Only 6 of the 630 human contestants scored all 7 points for question 6 this year, and just 55 more scored greater than 0 points on it.
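The arithmetic, spelled out for anyone skimming:

```python
# Scoring sanity check: six problems, seven points each.
problems, points_each = 6, 7
max_score = problems * points_each    # 42
five_solved = 5 * points_each         # 35 – full marks on five of the six
print(f"{five_solved}/{max_score}")   # 35/42
```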
OpenAI claimed their model had not been optimized for IMO questions. Gemini’s model was different – emphasis mine:
We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.
To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.
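Google hasn’t published implementation details for Deep Think, but “parallel thinking” as described above maps loosely onto a sample-many-then-select pattern. Here’s a minimal sketch of that general idea – the generate(), score() and parallel_think() names are hypothetical placeholders of my own, not Google’s API, and this is not the actual Deep Think mechanism:

```python
# Illustrative sketch only: sample several independent reasoning paths
# concurrently, then select (or combine) the strongest candidate.
# generate() and score() are hypothetical stand-ins, not a real model API.
from concurrent.futures import ThreadPoolExecutor


def generate(problem: str, seed: int) -> str:
    """Hypothetical call producing one candidate solution / reasoning chain."""
    return f"candidate solution {seed} for: {problem}"


def score(candidate: str) -> float:
    """Hypothetical verifier rating a candidate; placeholder logic only."""
    return float(len(candidate) % 7)


def parallel_think(problem: str, n_paths: int = 8) -> str:
    # Explore several solution paths at once rather than one linear chain...
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(lambda s: generate(problem, s), range(n_paths)))
    # ...then keep the highest-scoring candidate as the final answer.
    return max(candidates, key=score)


if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```

The interesting engineering is presumably in the selection and combination step – a real system would lean on a trained verifier or consensus across candidates rather than a toy scorer like this one.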
The Gemini team, like the OpenAI team, achieved this result with no tool use or internet access for the model.
Gemini’s solutions are listed in this PDF. If you are mathematically inclined you can compare them with OpenAI’s solutions on GitHub.
Last year Google DeepMind achieved a silver medal at the IMO, solving four of the six problems using custom models called AlphaProof and AlphaGeometry 2:
First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.
This year’s result, scoring gold with a single model, within the allotted time and with no manual step to translate the problems first, is much more impressive.
Via Hacker News
Tags: mathematics, ai, openai, generative-ai, llms, gemini, llm-reasoning
AI Summary and Description: Yes
Summary: The text highlights Google Gemini’s impressive performance at the International Mathematical Olympiad, achieving a gold medal status. It emphasizes advancements in AI reasoning capabilities, specifically through the use of a refined model called Gemini Deep Think, which utilizes novel techniques for handling complex mathematical problems without internet access.
Detailed Description: The article presents a noteworthy milestone in AI development, particularly in the context of mathematical reasoning and problem-solving.
– Google Gemini achieved a gold medal at the International Mathematical Olympiad (IMO), matching the 35 out of 42 points OpenAI had announced earlier.
– The success of Gemini involved an advanced version known as Gemini Deep Think, emphasizing enhancements in its reasoning capabilities, particularly:
– Integration of parallel thinking to explore multiple solutions simultaneously.
– Usage of advanced reinforcement learning techniques to improve multi-step reasoning and problem-solving capabilities.
– The model was also given access to a curated corpus of high-quality solutions to mathematics problems, along with general hints and tips on how to approach IMO problems in its instructions.
– Both Gemini and OpenAI’s models performed without any external tool usage or internet access, highlighting their capabilities in isolation.
– The article contrasts this year’s result with last year’s performance by Google DeepMind, which achieved silver but required the problems to be manually translated into formal language and took up to three days on some of them.
– It concludes that the absence of a manual translation step and the use of a single model within the allotted time demonstrate significant progress in AI’s ability to tackle complex problems effectively.
This information is crucial for professionals in AI, as it illustrates the evolving capabilities of generative AI and large language models in handling specialized tasks, underlining emergent trends in AI reasoning and potential applications in educational and competitive environments.