Simon Willison’s Weblog: OpenAI o3-mini, now available in LLM

Source URL: https://simonwillison.net/2025/Jan/31/o3-mini/#atom-everything
Source: Simon Willison’s Weblog
Title: OpenAI o3-mini, now available in LLM

Feedly Summary: o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate – we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini or (if we have access) o1 Pro.
Confusing matters further, the benchmarks in the o3-mini system card (PDF) aren’t a universal win for o3-mini across all categories. It generally benchmarks higher than GPT-4o and o1 but not across everything.
The biggest win for o3-mini is on the Codeforces ELO competitive programming benchmark, which I think is described by this 2nd January 2025 paper, with the following scores:

o3-mini (high) 2130
o3-mini (medium) 2036
o1 1891
o3-mini (low) 1831
o1-mini 1650
o1-preview 1258
GPT-4o 900

Weirdly, that GPT-4o score was in an older copy of the System Card PDF which has been replaced by an updated document that doesn’t mention Codeforces ELO scores at all.
One note from the System Card that stood out for me concerning intended applications of o3-mini for OpenAI themselves:

We also plan to allow users to use o3-mini to search the internet and summarize the results in ChatGPT. We expect o3-mini to be a useful and safe model for doing this, especially given its performance on the jailbreak and instruction hierarchy evals detailed in Section 4 below.

This is notable because the existing o1 models on ChatGPT have not yet had access to their web search tool – despite the mixture of search and “reasoning” models having very clear benefits.
I released LLM 0.21 with support for the new model, plus its -o reasoning_effort high (or medium or low) option for tweaking the reasoning effort – details in this issue.
o3-mini is priced at $1.10/million input tokens, $4.40/million output tokens – less than half the price of GPT-4o (currently $2.50/$10) and massively cheaper than o1 ($15/$60).
I tried using it to summarize this conversation about o3-mini on Hacker News, using my hn-summary.sh script. Here’s the result – it used 18,936 input tokens and 2,905 output tokens for a total cost of 3.3612 cents.
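The quoted cost can be sanity-checked against the listed prices. A quick sketch (token counts and per-million prices taken from the post above):

```python
# Sanity-check the cost of the Hacker News summary run using the
# o3-mini prices quoted above ($1.10/M input, $4.40/M output tokens).
INPUT_PRICE_PER_M = 1.10   # dollars per million input tokens
OUTPUT_PRICE_PER_M = 4.40  # dollars per million output tokens

input_tokens = 18_936
output_tokens = 2_905

cost_dollars = (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
print(f"{cost_dollars * 100:.4f} cents")  # → 3.3612 cents
```

The result matches the 3.3612 cents reported for the run.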
Tags: projects, ai, openai, generative-ai, llm, inference-scaling, o3

AI Summary and Description: Yes

Summary: OpenAI’s release of the o3-mini model adds a new option to the generative AI landscape, with benchmarks against other models and intended applications that include internet search and summarization in ChatGPT. The model is priced to be economical while delivering competitive performance.

Detailed Description:

The release of o3-mini by OpenAI represents an important development in the field of generative AI. This model offers a compelling option for users by balancing affordability and performance in comparison to its predecessors and counterparts. Here are the key points about o3-mini:

– **Benchmark Performance**:
– o3-mini generally outperforms GPT-4o and o1 in many benchmarks but is not universally superior in all categories.
– Particularly notable is its performance on the Codeforces ELO competitive programming benchmark, where it achieved:
– o3-mini (high): 2130
– o3-mini (medium): 2036
– o1: 1891
– o3-mini (low): 1831
– o1-mini: 1650
– o1-preview: 1258
– GPT-4o: 900
– The performance metrics indicate its strength in areas relevant to programming and reasoning tasks.

– **Intended Use Cases**:
– OpenAI plans to integrate o3-mini’s capabilities into search functionalities, enabling internet searches and result summarization directly in ChatGPT. This would allow for more dynamic, real-time information processing.
– The model is expected to be safe and effective for this use, based on its performance on the jailbreak and instruction hierarchy evaluations in the system card, suggesting reasonable robustness for real-world applications.

– **Pricing Structure**:
– Competitive pricing of o3-mini is a strategic advantage, offering:
– $1.10 per million input tokens
– $4.40 per million output tokens
– This pricing is significantly lower than both GPT-4o ($2.50/$10) and o1 ($15/$60), making it accessible for broader usage scenarios.
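To make the price gap concrete, the same hypothetical workload can be priced across all three models using the per-million rates listed above (the 1M-input/100k-output workload is an illustrative assumption, not from the post):

```python
# Rough cost comparison for a hypothetical workload of 1M input tokens
# and 100k output tokens, using the per-million prices listed above.
prices = {             # (input $/M tokens, output $/M tokens)
    "o3-mini": (1.10, 4.40),
    "GPT-4o":  (2.50, 10.00),
    "o1":      (15.00, 60.00),
}

input_tokens, output_tokens = 1_000_000, 100_000
for model, (p_in, p_out) in prices.items():
    cost = (input_tokens * p_in + output_tokens * p_out) / 1_000_000
    print(f"{model}: ${cost:.2f}")
# o3-mini: $1.54
# GPT-4o: $3.50
# o1: $21.00
```

On this workload o3-mini comes in at less than half the cost of GPT-4o and well under a tenth of o1, consistent with the pricing claims above.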

– **Development and Updates**:
– The author released LLM 0.21 with support for the new model, including a reasoning-effort option, reflecting a continuous-improvement approach to model deployment tooling.

– **Economic Implications**:
– The cost analysis from a practical usage scenario shows that o3-mini can summarize lengthy discussions efficiently, costing a mere 3.36 cents for a substantial amount of data processing.

Overall, the advancements presented with o3-mini not only contribute to the generative AI space but also highlight significant considerations for professionals focused on AI security, compliance, and application efficiency in cloud environments.