Source URL: https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything
Source: Simon Willison’s Weblog
Title: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Feedly Summary: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year’s International Mathematical Olympiad, scoring a gold medal performance with their custom research model.
(I saw an unconfirmed rumor that the Gemini team had to wait until Monday for approval from Google PR.)
It’s interesting that Gemini achieved the exact same score as OpenAI, 35/42, and was able to solve the same set of questions – 1 through 5 – failing only on question 6, which is designed to be the hardest.
Each question is worth seven points, so 35/42 corresponds to full marks on five out of the six problems.
Only 6 of the 630 human contestants scored all 7 points for question 6 this year, and just 55 more scored greater than 0 points on it.
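The arithmetic, spelled out for anyone skimming:

```python
# Scoring sanity check: six problems, seven points each.
problems, points_each = 6, 7
max_score = problems * points_each    # 42
five_solved = 5 * points_each         # 35 – full marks on five of the six
print(f"{five_solved}/{max_score}")   # 35/42
```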
OpenAI claimed their model had not been optimized for IMO questions. Gemini’s model was different – emphasis mine:
We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.
To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.
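Google hasn’t published implementation details for Deep Think, but “parallel thinking” as described above maps loosely onto a sample-many-then-select pattern. Here’s a minimal sketch of that general idea – the generate(), score() and parallel_think() names are hypothetical placeholders of my own, not Google’s API, and this is not the actual Deep Think mechanism:

```python
# Illustrative sketch only: sample several independent reasoning paths
# concurrently, then select (or combine) the strongest candidate.
# generate() and score() are hypothetical stand-ins, not a real model API.
from concurrent.futures import ThreadPoolExecutor


def generate(problem: str, seed: int) -> str:
    """Hypothetical call producing one candidate solution / reasoning chain."""
    return f"candidate solution {seed} for: {problem}"


def score(candidate: str) -> float:
    """Hypothetical verifier rating a candidate; placeholder logic only."""
    return float(len(candidate) % 7)


def parallel_think(problem: str, n_paths: int = 8) -> str:
    # Explore several solution paths at once rather than one linear chain...
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(lambda s: generate(problem, s), range(n_paths)))
    # ...then keep the highest-scoring candidate as the final answer.
    return max(candidates, key=score)


if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```

The interesting engineering is presumably in the selection and combination step – a real system would lean on a trained verifier or consensus across candidates rather than a toy scorer like this one.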
The Gemini team, like the OpenAI team, achieved this result with no tool use or internet access for the model.
Gemini’s solutions are listed in this PDF. If you are mathematically inclined you can compare them with OpenAI’s solutions on GitHub.
Last year Google DeepMind achieved a silver medal at the IMO, solving four of the six problems using custom models called AlphaProof and AlphaGeometry 2:
First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.
This year’s result, scoring gold with a single model, within the allotted time and with no manual step to translate the problems first, is much more impressive.
Via Hacker News
Tags: mathematics, ai, openai, generative-ai, llms, gemini, llm-reasoning
AI Summary and Description: Yes
Summary: The text highlights Google Gemini’s impressive performance at the International Mathematical Olympiad, achieving a gold medal status. It emphasizes advancements in AI reasoning capabilities, specifically through the use of a refined model called Gemini Deep Think, which utilizes novel techniques for handling complex mathematical problems without internet access.
Detailed Description: The article presents a noteworthy milestone in AI development, particularly in the context of mathematical reasoning and problem-solving.
– Google Gemini achieved a gold medal at the International Mathematical Olympiad (IMO), matching the 35 out of 42 points OpenAI had announced earlier.
– The success of Gemini involved an advanced version known as Gemini Deep Think, emphasizing enhancements in its reasoning capabilities, particularly:
– Integration of parallel thinking to explore multiple solutions simultaneously.
– Usage of advanced reinforcement learning techniques to improve multi-step reasoning and problem-solving capabilities.
– The model was also given access to a curated corpus of high-quality solutions to mathematics problems, along with general hints and tips on how to approach IMO problems in its instructions.
– Both Gemini and OpenAI’s models performed without any external tool usage or internet access, highlighting their capabilities in isolation.
– The article contrasts this year’s result with last year’s performance by Google DeepMind, which achieved silver but required the problems to be manually translated into formal language and took up to three days on some of them.
– It concludes that the absence of a manual translation step and the use of a single model within the allotted time demonstrate significant progress in AI’s ability to tackle complex problems effectively.
This information is crucial for professionals in AI, as it illustrates the evolving capabilities of generative AI and large language models in handling specialized tasks, underlining emergent trends in AI reasoning and potential applications in educational and competitive environments.