Source URL: https://simonwillison.net/2025/Sep/17/icpc/#atom-everything
Source: Simon Willison’s Weblog
Title: ICPC medals for OpenAI and Gemini
Feedly Summary: In July it was the International Math Olympiad (OpenAI, Gemini), today it’s the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved Gold medal performance.
OpenAI’s Mostafa Rohaninejad:
We received the problems in the exact same PDF form, and the reasoning system selected which answers to submit with no bespoke test-time harness whatsoever. For 11 of the 12 problems, the system’s first answer was correct. For the hardest problem, it succeeded on the 9th submission. Notably, the best human team achieved 11/12.
We competed with an ensemble of general-purpose reasoning models; we did not train any model specifically for the ICPC. We had both GPT-5 and an experimental reasoning model generating solutions, and the experimental reasoning model selecting which solutions to submit. GPT-5 answered 11 correctly, and the last (and most difficult) problem was solved by the experimental reasoning model.
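The generate-then-select pattern described in that quote is straightforward to sketch in outline. The snippet below is a hypothetical illustration only, not OpenAI's actual pipeline: `best_submission`, `generate_fn`, and `score_fn` are invented names standing in for whatever internal model calls the real system made.

```python
# Hypothetical sketch of "generate with an ensemble, let one reasoning model
# choose what to submit". It does not reflect OpenAI's real system; the
# callables are placeholders for actual model invocations.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    model: str  # which ensemble member produced this solution
    code: str   # proposed solution source for one ICPC problem

def best_submission(
    statement: str,
    models: list[str],
    generate_fn: Callable[[str, str], str],       # (model, statement) -> code
    score_fn: Callable[[str, Candidate], float],  # (selector, candidate) -> confidence
    selector: str,
) -> Candidate:
    """Generate one candidate per ensemble member, then let the selector
    model pick the candidate it is most confident in."""
    candidates = [Candidate(model=m, code=generate_fn(m, statement)) for m in models]
    return max(candidates, key=lambda c: score_fn(selector, c))

if __name__ == "__main__":
    # Stubbed model calls, purely for illustration.
    fake_generate = lambda model, stmt: f"// {model} solution to: {stmt[:20]}"
    fake_score = lambda selector, cand: float(len(cand.code))  # stand-in confidence
    pick = best_submission(
        "Problem A: shortest path with constraints ...",
        models=["gpt-5", "experimental-reasoner"],
        generate_fn=fake_generate,
        score_fn=fake_score,
        selector="experimental-reasoner",
    )
    print(pick.model)
```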
And here’s the blog post by Google DeepMind’s Hanzhao (Maggie) Lin and Heng-Tze Cheng:
An advanced version of Gemini 2.5 Deep Think competed live in a remote online environment following ICPC rules, under the guidance of the competition organizers. It started 10 minutes after the human contestants and correctly solved 10 out of 12 problems, achieving gold-medal level performance under the same five-hour time constraint. See our solutions here.
I’m still trying to confirm if the models had access to tools in order to execute the code they were writing. The IMO results in July were both achieved without tools.
Tags: gemini, llm-reasoning, google, generative-ai, openai, ai, llms
AI Summary and Description: Yes
Summary: The text discusses the gold-medal performance of AI models from OpenAI and Google DeepMind (Gemini) at a prestigious programming contest, showcasing their problem-solving and reasoning capabilities. This highlights advances in generative AI that are relevant to software security and information security professionals.
Detailed Description: The provided text primarily describes the competitive performance of AI models, specifically from OpenAI and Google DeepMind, in two significant contests — the International Math Olympiad and the International Collegiate Programming Contest (ICPC). Here are the notable points:
– **AI Competitiveness**: Both OpenAI and Google DeepMind’s models achieved gold-medal level results at the ICPC, demonstrating that AI can effectively tackle complex algorithmic problems.
– **Performance Details**:
– OpenAI’s system received the problems in the same PDF form as human contestants and used a reasoning system to select answers, eventually solving all 12 problems (its first answer was correct on 11 of them; the hardest required nine submissions).
– GPT-5 answered 11 problems correctly; an experimental reasoning model solved the hardest problem and decided which solutions to submit.
– Google DeepMind’s Gemini 2.5 Deep Think model also performed remarkably, solving 10 out of 12 problems under competition constraints.
– **Operational Context**:
– Both models competed under ICPC rules and the same five-hour time limit as the human teams; Gemini’s run started 10 minutes after the human contestants began.
– The ability of these models to tackle the problems without any training tailored to the ICPC indicates a high degree of generalization.
– **Technical Insights**:
– It remains unconfirmed whether the models had access to tools to execute the code they were writing, especially since the IMO results in July were achieved without tools.
Key implications for professionals in AI, cloud, and infrastructure security include:
– **Understanding AI Capabilities**: The high-performance levels of these AI systems can aid in recognizing their potential and limitations regarding software security challenges.
– **Integration Potential**: The applications of such generative AI models could extend to automating complex tasks usually performed by developers, raising questions of trust and the need for robust security measures around their deployment.
– **Future Innovations**: The competitive success illustrates a growing trend in leveraging AI for high-stakes problem-solving, necessitating ongoing discussions around security, compliance, and ethical considerations in AI usage.
In conclusion, the results from these programming contests reflect not only the state of AI advancement but also the need for professionals in the field to stay abreast of how these models can impact software and information security strategies.