Simon Willison’s Weblog: GPT-4.1: Three new million token input models from OpenAI, including their cheapest model yet

Source URL: https://simonwillison.net/2025/Apr/14/gpt-4-1/
Source: Simon Willison’s Weblog
Title: GPT-4.1: Three new million token input models from OpenAI, including their cheapest model yet

Feedly Summary: OpenAI introduced three new models this morning: GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. These are API-only models right now, not available through the ChatGPT interface (though you can try them out in OpenAI’s API playground). All three models can handle 1,047,576 tokens of input and 32,768 tokens of output, and all three have a May 31, 2024 cut-off date (their previous models were mostly September 2023).
The models score higher than GPT-4o and GPT-4.5 on coding benchmarks, and do very well on long context benchmarks as well. They also claim improvements in instruction following – following requested formats, obeying negative instructions, sorting output and obeying instructions to say “I don’t know”.
I released a new version of my llm-openai plugin supporting the new models. This is a new thing for the LLM ecosystem: previously OpenAI models were only supported in core, which meant I had to ship a full LLM release to add support for them.
You can run the new models like this:
llm install llm-openai-plugin -U
llm -m openai/gpt-4.1 "Generate an SVG of a pelican riding a bicycle"
The other model IDs are openai/gpt-4.1-mini and openai/gpt-4.1-nano.
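For completeness, the same models can also be called from Python via LLM’s programmatic API. Here’s a minimal sketch, assuming the plugin is installed and an OpenAI API key is available (for example via the OPENAI_API_KEY environment variable):

import llm

# Assumes `llm install llm-openai-plugin` has been run and OPENAI_API_KEY is set.
model = llm.get_model("openai/gpt-4.1-nano")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")
print(response.text())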
Here’s the pelican riding a bicycle I got from full-sized GPT-4.1:

I’m particularly excited by GPT-4.1 nano, which handles image and text input up to a million tokens and is priced lower than any previous OpenAI model: $0.10/million for input and $0.40/million for output, less than the previous cheapest OpenAI model, GPT-4o mini ($0.15/$0.60). I’ve updated my LLM pricing table to include the new models.
They’re not the cheapest overall though: Gemini 2.0 Flash Lite, Gemini 1.5 Flash 8B, Amazon Nova Lite and Nova Micro, and Mistral’s 3B, 8B and Small 3.1 hosted models remain less expensive.
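For a rough sense of what those list prices mean in practice, here’s a back-of-the-envelope sketch comparing a single maximal call on GPT-4.1 nano and GPT-4o mini (illustrative only – real bills depend on things like prompt caching):

# Cost of one call with 1,000,000 input tokens and 32,768 output tokens,
# using the per-million-token list prices quoted above.
def call_cost(input_tokens, output_tokens, input_price, output_price):
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

print(f"GPT-4.1 nano: ${call_cost(1_000_000, 32_768, 0.10, 0.40):.4f}")  # ~$0.1131
print(f"GPT-4o mini:  ${call_cost(1_000_000, 32_768, 0.15, 0.60):.4f}")  # ~$0.1697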

A few closing thoughts on these new models:

The 1 million input token context thing is a really big deal. The huge token context has been a major competitive advantage for the Google Gemini models for a full year at this point – it’s reassuring to see other vendors start to catch up. I’d like to see the same from Anthropic – Claude was the first model to hit 200,000 but hasn’t shipped more than that yet (aside from a 500,000 token model that was restricted to their big enterprise partners).

OpenAI really emphasized code performance for this model. They called out the Aider benchmark in their announcement post.

As expected, GPT-4.5 turned out to be not long for this world:

We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months, on July 14, 2025, to allow time for developers to transition.

In the livestream announcement Michelle Pokrass let slip that the codename for the model was Quasar – that’s the name of the stealth model that’s been previewing on OpenRouter for the past two weeks.

OpenAI shared a GPT 4.1 Prompting Guide, which includes this tip about long context prompting:

Especially in long context usage, placement of instructions and context can impact performance. If you have long context in your prompt, ideally place your instructions at both the beginning and end of the provided context, as we found this to perform better than only above or below. If you’d prefer to only have your instructions once, then above the provided context works better than below.
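One simple way to apply that advice is to sandwich the long context between two copies of the instructions. This helper is my own illustration of the pattern, not code from the guide:

def build_long_context_prompt(instructions, long_context):
    # Repeat the instructions both before and after the long context,
    # which the guide says works better than placing them only once.
    return f"{instructions}\n\n{long_context}\n\n{instructions}"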

They also recommend XML-style delimiters over JSON for long context, suggesting this format (complete with the XML-invalid unquoted attribute) that’s similar to the format recommended by Anthropic for Claude:
<doc id=1 title="The Fox">The quick brown fox jumps over the lazy dog</doc>
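Here’s a quick sketch of assembling documents in that style – my own helper, mirroring the example above (including the deliberately unquoted id attribute):

def format_docs(docs):
    # docs is a list of {"title": ..., "text": ...} dicts; each is wrapped in
    # XML-style <doc> tags matching the format shown above (id left unquoted).
    return "\n".join(
        f'<doc id={i} title="{doc["title"]}">{doc["text"]}</doc>'
        for i, doc in enumerate(docs, start=1)
    )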
There’s an extensive section at the end describing their recommended approach to applying file diffs: "we open-source here one recommended diff format, on which the model has been extensively trained".

Tags: ai, openai, generative-ai, llms, llm, vision-llms, llm-pricing, long-context

AI Summary and Description: Yes

Summary: OpenAI has launched three new models—GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano—designed for API use with significant enhancements in token handling and performance metrics, particularly for coding tasks. These models offer competitive pricing and context capabilities, marking a notable shift in the generative AI landscape as they challenge existing models from other vendors.

Detailed Description:

– **New Model Introductions**: OpenAI has introduced three API-only models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models are currently not available through the ChatGPT interface but can be tested in OpenAI’s API playground.

– **Token Handling**:
  – All three models can accept input of up to **1,047,576 tokens** and generate output of **32,768 tokens**, which is a notable enhancement in handling extensive data inputs, crucial for applications requiring long-context processing.

– **Performance Benchmarking**:
  – The new models outperform their predecessors, GPT-4o and GPT-4.5, particularly in coding benchmarks.
  – They have improved instruction-following abilities, which allow for better adherence to user requests, such as formatting and content restrictions.

– **Pricing Innovations**:
  – The pricing for GPT-4.1 nano is attractive at **$0.10/million for input and $0.40/million for output**, positioning it as a cost-effective option among OpenAI models.
  – Despite this, other models from competitors remain cheaper, demonstrating a competitive landscape in model pricing.

– **Strategic Context**:
  – The introduction of a **1 million input token context** is significant, matching a capability that had previously been a competitive advantage for Google’s Gemini models.
  – OpenAI’s focus on enhancing coding performance could serve as a catalyst for further competition in AI development among various organizations.

– **Deprecation of Older Models**:
  – GPT-4.5 Preview will be deprecated as GPT-4.1 provides improved capabilities at a lower cost; this keeps the ecosystem evolving rapidly.

– **Additional Insights**:
  – OpenAI has shared best practices for long-context prompting, emphasizing the importance of instruction placement for optimal model performance.
  – OpenAI has also provided guidelines for data formatting, preferring XML-style delimiters for better context handling.

– **Industry Relevance**:
  – This announcement is crucial for professionals in AI and cloud security as it could impact the security frameworks around model deployment and API management.
  – The evolving landscape of generative AI models necessitates updated security measures and compliance considerations for organizations leveraging these tools.

This development signals a noteworthy shift in the generative AI ecosystem, challenging existing paradigms of model capabilities and pricing, making the analysis and adaptation of security frameworks vital for stakeholders in technology and compliance.