Simon Willison’s Weblog: GPT‑5-Codex and upgrades to Codex

Sep 15, 2025

—

Source URL: https://simonwillison.net/2025/Sep/15/gpt-5-codex/#atom-everything
Source: Simon Willison’s Weblog
Title: GPT‑5-Codex and upgrades to Codex

Feedly Summary: GPT‑5-Codex and upgrades to Codex
OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools.
I say half-released because it’s not yet available via their API, but they “plan to make GPT‑5-Codex available in the API soon”.
I wrote about the confusing array of OpenAI products that share the name Codex a few months ago. This new model adds yet another, though at least "GPT-5-Codex" (using two hyphens) is unambiguous enough not to add too much more to the confusion.
At this point it’s best to think of Codex as OpenAI’s brand name for their coding family of models and tools.
The new model is already integrated into their VS Code extension, the Codex CLI and their Codex Cloud asynchronous coding agent. I’d been calling that last one "Codex Web" but I think Codex Cloud is a better name since it can also be accessed directly from their iPhone app.
Codex Cloud also has a new feature: you can configure it to automatically run code review against specific GitHub repositories (I found that option on chatgpt.com/codex/settings/code-review) and it will create a temporary container to use as part of those reviews. Here’s the relevant documentation.
Some documented features of the new GPT-5-Codex model:

– Specifically trained for code review, which directly supports their new code review feature.
– "GPT‑5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task." Simple tasks (like "list files in this directory") should run faster. Large, complex tasks should run for much longer – OpenAI report Codex crunching for seven hours in some cases!
– Increased score on their proprietary "code refactoring evaluation" from 33.9% for GPT-5 (high) to 51.3% for GPT-5-Codex (high). It’s hard to evaluate this without seeing the details of the eval but it does at least illustrate that refactoring performance is something they’ve focused on here.
– "GPT‑5-Codex also shows significant improvements in human preference evaluations when creating mobile websites" – in the past I’ve habitually prompted models to "make it mobile-friendly", maybe I don’t need to do that any more.
– "We find that comments by GPT‑5-Codex are less likely to be incorrect or unimportant" – fewer unimportant comments in code is definitely an improvement!
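
To put the refactoring scores above in perspective, here is a minimal sketch of the arithmetic, using only the two figures quoted from OpenAI. The helper name `improvement` is purely illustrative, and nothing is assumed about how the eval itself is scored.

```python
# Scores quoted in the post for OpenAI's proprietary code refactoring evaluation.
gpt5_high = 33.9        # GPT-5 (high), percent
gpt5_codex_high = 51.3  # GPT-5-Codex (high), percent


def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative


absolute_gain, relative_gain = improvement(gpt5_high, gpt5_codex_high)
print(f"{absolute_gain:.1f} percentage points, {relative_gain:.1f}% relative")
```

That works out to 17.4 percentage points, or roughly a 51% relative gain over GPT-5 (high). The relative figure landing so close to the 51.3% score itself is a coincidence of the numbers.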

Theo Browne has a video review of the model and accompanying features. He was generally impressed but noted that it was surprisingly bad at using the Codex CLI search tool to navigate code. Hopefully that’s something that can be fixed with a system prompt update.
Finally, can it draw a pelican riding a bicycle? Without API access I instead got Codex Cloud to have a go by prompting:

Generate an SVG of a pelican riding a bicycle, save as pelican.svg

Here’s the result:
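
For anyone repeating the experiment, a quick sanity check that a saved file such as pelican.svg is at least well-formed SVG might look like the sketch below. The helper `is_svg` and the sample markup are hypothetical stand-ins, not the model's actual output, and the check says nothing about whether the drawing resembles a pelican.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_svg(text: str) -> bool:
    """Return True if text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace}svg".
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")


# Hypothetical stand-in markup; in practice you would read pelican.svg instead.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<circle cx="5" cy="5" r="4"/></svg>'
)
print(is_svg(sample))
print(is_svg("<svg><circle></svg>"))  # unclosed <circle>, so not well-formed
```

To check a real file, pass its contents instead, for example `is_svg(open("pelican.svg", encoding="utf-8").read())`.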

Tags: ai, openai, generative-ai, llms, ai-assisted-programming, pelican-riding-a-bicycle, llm-release, coding-agents, gpt-5, codex-cli

AI Summary and Description: Yes

Summary: The introduction of GPT-5-Codex represents a significant step forward in AI-assisted programming, particularly with enhanced coding and code review capabilities. This fine-tuned model aims to streamline workflow for developers through integration with various tools and improved performance metrics.

Detailed Description: GPT-5-Codex, a fine-tuned variant of OpenAI’s GPT-5, is designed for OpenAI’s AI-assisted programming tools and is the latest addition to the Codex family of models and tools. It is not yet accessible via the API, though OpenAI says it plans to make it available there soon. Here are the key aspects of the new model:

– **Model Purpose**: GPT-5-Codex is a fine-tuned GPT-5 variant for AI-assisted programming and is specifically trained for code review.
– **Integration**: It is already integrated into:
  – the VS Code extension
  – the Codex CLI
  – Codex Cloud, the asynchronous coding agent
– **Code Review Feature**: Codex Cloud can be configured to automatically run code review against specific GitHub repositories, creating a temporary container for use during each review.
– **Performance Improvements**:
  – The model adapts how long it spends thinking to the complexity of the task: simple tasks should finish faster, while large, complex tasks can run much longer (OpenAI reports runs of seven hours in some cases).
  – Its score on OpenAI’s proprietary code refactoring evaluation rose from 33.9% for GPT-5 (high) to 51.3% for GPT-5-Codex (high).
  – It shows significant improvements in human preference evaluations when creating mobile websites, so explicitly prompting for mobile-friendly output may no longer be necessary.
  – Its comments are less likely to be incorrect or unimportant.

– **User Feedback**: Theo Browne’s video review was generally positive but noted that the model was surprisingly bad at using the Codex CLI search tool to navigate code, which might be fixable with a system prompt update.

– **Accessibility and Future Updates**: The model is not yet available via the API; OpenAI says it plans to make it available there soon.

This development is relevant to professionals in security, compliance, and infrastructure because automated code review and stronger refactoring support can help teams maintain secure coding practices and meet regulatory standards in software development.
