Source URL: https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/
Source: Simon Willison’s Weblog
Title: OpenAI reasoning models: Advice on prompting
Feedly Summary: OpenAI reasoning models: Advice on prompting
OpenAI’s documentation for their o1 and o3 “reasoning models” includes some interesting tips on how to best prompt them:
Developer messages are the new system messages: Starting with o1-2024-12-17, reasoning models support developer messages rather than system messages, to align with the chain of command behavior described in the model spec.
This appears to be a purely aesthetic change made for consistency with their instruction hierarchy concept. As far as I can tell the old system prompts continue to work exactly as before – you’re encouraged to use the new developer message type but it has no impact on what actually happens.
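To make the "purely aesthetic" claim concrete, here's a minimal Python sketch of the two request shapes side by side. It just builds the payload dictionaries without calling the API; the user message content is a placeholder:

```python
# Sketch: the same instruction expressed as a legacy "system" message and as
# the newer "developer" message type supported from o1-2024-12-17 onwards.
# No API call is made here - these are just the request payloads.

INSTRUCTION = "Write a detailed README with extensive usage examples"

legacy_request = {
    "model": "o3-mini",
    "messages": [
        {"role": "system", "content": INSTRUCTION},
        {"role": "user", "content": "Here is the code..."},
    ],
}

# Identical apart from the role name - per OpenAI's docs the old form
# continues to work as before.
new_request = {
    "model": "o3-mini",
    "messages": [
        {"role": "developer", "content": INSTRUCTION},
        {"role": "user", "content": "Here is the code..."},
    ],
}

print(new_request["messages"][0]["role"])  # developer
```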
Since my LLM tool already bakes in a llm --system "system prompt" option which works across multiple different models from different providers I’m not going to rush to adopt this new language!
Use delimiters for clarity: Use delimiters like markdown, XML tags, and section titles to clearly indicate distinct parts of the input, helping the model interpret different sections appropriately.
Anthropic have been encouraging XML-ish delimiters for a while (I say -ish because there’s no requirement that the resulting prompt is valid XML). My files-to-prompt tool has a -c option which outputs Claude-style XML, and in my experiments this same option works great with o1 and o3 too:
git clone https://github.com/tursodatabase/limbo
cd limbo/bindings/python
files-to-prompt . -c | llm -m o3-mini \
-o reasoning_effort high \
--system 'Write a detailed README with extensive usage examples'
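For a sense of what those XML-ish delimiters look like, here's a small Python sketch that wraps a set of files in per-document tags, approximating the shape of output files-to-prompt -c produces (the exact tag names here are illustrative, not a spec):

```python
# Sketch: Claude-style XML-ish delimiters. Each input file gets wrapped in
# tags so the model can tell the documents apart. Note the result is not
# required to be valid XML - file contents are not escaped.

def wrap_documents(files: dict[str, str]) -> str:
    parts = ["<documents>"]
    for index, (path, contents) in enumerate(files.items(), start=1):
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{path}</source>")
        parts.append("<document_contents>")
        parts.append(contents)
        parts.append("</document_contents>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)

prompt = wrap_documents({
    "limbo/bindings/python/README.md": "# limbo\n...",
    "limbo/bindings/python/setup.py": "from setuptools import setup\n...",
})
print(prompt)
```

The resulting string gets piped (or passed) to the model as the prompt body, with the instruction carried separately in the system/developer message.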
Limit additional context in retrieval-augmented generation (RAG): When providing additional context or documents, include only the most relevant information to prevent the model from overcomplicating its response.
This makes me think that o1/o3 are not good models to implement RAG on at all – with RAG I like to be able to dump as much extra context into the prompt as possible and leave it to the model to figure out what’s relevant.
Try zero shot first, then few shot if needed: Reasoning models often don’t need few-shot examples to produce good results, so try to write prompts without examples first. If you have more complex requirements for your desired output, it may help to include a few examples of inputs and desired outputs in your prompt. Just ensure that the examples align very closely with your prompt instructions, as discrepancies between the two may produce poor results.
Providing examples remains the single most powerful prompting tip I know, so it’s interesting to see advice here to only switch to examples if zero-shot doesn’t work out.
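The zero-shot-to-few-shot escalation amounts to prepending aligned user/assistant example pairs ahead of the real input. A minimal sketch (the classification task and labels are made up for illustration):

```python
# Sketch: escalating from zero-shot to few-shot. Few-shot examples go in as
# alternating user/assistant pairs before the real input, and should align
# exactly with the instruction (here: one-word answers, same three labels).

instruction = (
    "Classify each bug report as 'crash', 'performance', or 'other'. "
    "Reply with one word."
)

zero_shot = [
    {"role": "developer", "content": instruction},
    {"role": "user", "content": "The app segfaults when I open a large file."},
]

# Only if zero-shot results are inconsistent: add examples that follow the
# instruction to the letter, then repeat the real input last.
few_shot = [
    {"role": "developer", "content": instruction},
    {"role": "user", "content": "Scrolling is noticeably laggy on long pages."},
    {"role": "assistant", "content": "performance"},
    {"role": "user", "content": "The settings dialog shows the wrong font."},
    {"role": "assistant", "content": "other"},
    {"role": "user", "content": "The app segfaults when I open a large file."},
]

print(len(few_shot))  # 6
```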
Be very specific about your end goal: In your instructions, try to give very specific parameters for a successful response, and encourage the model to keep reasoning and iterating until it matches your success criteria.
This makes sense: reasoning models "think" until they reach a conclusion, so making the goal as unambiguous as possible leads to better results.
Markdown formatting: Starting with o1-2024-12-17, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you do want markdown formatting in the response, include the string Formatting re-enabled on the first line of your developer message.
This one was a real shock to me! I noticed that o3-mini was outputting • characters instead of Markdown * bullets and initially thought that was a bug.
I first saw this while running this prompt against limbo/bindings/python using files-to-prompt:
git clone https://github.com/tursodatabase/limbo
cd limbo/bindings/python
files-to-prompt . -c | llm -m o3-mini \
-o reasoning_effort high \
--system 'Write a detailed README with extensive usage examples'
Here’s the full result, which includes text like this (note the weird bullets):
Features
--------
• High‑performance, in‑process database engine written in Rust
• SQLite‑compatible SQL interface
• Standard Python DB‑API 2.0–style connection and cursor objects
I ran it again with this modified prompt:
Formatting re-enabled. Write a detailed README with extensive usage examples.
And this time got back proper Markdown, rendered in this Gist. That did a really good job, and included bulleted lists using this valid Markdown syntax instead:
- **`make test`**: Run tests using pytest.
- **`make lint`**: Run linters (via [ruff](https://github.com/astral-sh/ruff)).
- **`make check-requirements`**: Validate that the `requirements.txt` files are in sync with `pyproject.toml`.
- **`make compile-requirements`**: Compile the `requirements.txt` files using pip-tools.
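Since the magic string has to sit on the first line of the developer message, it's easy to wrap in a tiny helper. A sketch (the helper name is my own, not part of any API):

```python
# Sketch: opting back in to Markdown output. Per OpenAI's docs the string
# "Formatting re-enabled" must appear on the first line of the developer
# message for o1-2024-12-17 and later reasoning models.

def with_markdown(developer_prompt: str) -> str:
    """Prepend the opt-in string on its own first line."""
    return "Formatting re-enabled\n" + developer_prompt

message = {
    "role": "developer",
    "content": with_markdown(
        "Write a detailed README with extensive usage examples."
    ),
}

print(message["content"].splitlines()[0])  # Formatting re-enabled
```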
Via @harjotsgill
Tags: o1, openai, o3, markdown, ai, llms, prompt-engineering, generative-ai, inference-scaling, rag
AI Summary and Description: Yes
Summary: The text discusses recent updates and best practices for prompting OpenAI’s reasoning models, focusing on the transition from system messages to developer messages, the use of delimiters for better input clarity, and the approach of limiting context in retrieval-augmented generation (RAG). These insights are pertinent for AI professionals aiming to optimize interaction with language models, enhancing their effectiveness in various applications.
Detailed Description: This write-up emphasizes how users can effectively interact with OpenAI’s reasoning models (o1 and o3), following updates in prompting strategies. Here are the crucial points:
– **Shift to Developer Messages**:
– OpenAI has updated its models to support developer messages instead of system messages.
– This change is primarily aesthetic and aims to enhance consistency in instruction hierarchy but leaves functionality intact.
– **Use of Delimiters**:
– Clear indications of different parts of input via delimiters (like markdown or XML tags) help models interpret information better.
– The text mentions assistance from tools that output Claude-style XML compatible with OpenAI models.
– **Retrieval-Augmented Generation (RAG)**:
– Users are encouraged to limit additional context to only the most relevant information to promote coherence in model responses.
– It suggests skepticism regarding RAG with certain models if they struggle to maintain relevance.
– **Prompting Strategies**:
– A recommendation to initially use zero-shot prompting, switching to few-shot examples only if necessary, highlights the models’ ability to generate quality responses without prior examples.
– Being specific in prompts is encouraged; clear end goals lead to superior model output.
– **Markdown Formatting Changes**:
– There’s a noteworthy change where the reasoning models avoid generating markdown formatting responses unless explicitly signaled by the string “Formatting re-enabled” at the start of the developer message.
– This adjustment initially caused confusion among users, as the output formatting could appear unrefined without the right prompt signal.
Overall, these guidelines and updates are essential for AI practitioners to improve their utilization of OpenAI’s reasoning models, ensuring more effective and accurate AI interaction across various applications, especially in software development and data management. Using these insights can streamline workflows and enhance compliance with emerging standards in AI usage and prompting.