Source URL: https://simonwillison.net/2025/Apr/17/ted-sanders/
Source: Simon Willison’s Weblog
Title: Quoting Ted Sanders, OpenAI
Feedly Summary: Our hypothesis is that o4-mini is a much better model, but we’ll wait to hear feedback from developers. Evals only tell part of the story, and we wouldn’t want to prematurely deprecate a model that developers continue to find value in. Model behavior is extremely high dimensional, and it’s impossible to prevent regression on 100% of use cases/prompts, especially if those prompts were originally tuned to the quirks of the older model. But if the majority of developers migrate happily, then it may make sense to deprecate at some future point.
We generally want to give developers as stable an experience as possible, and not force them to swap models every few months whether they want to or not.
— Ted Sanders, OpenAI, on deprecating o3-mini
Tags: openai, llms, ai, generative-ai
AI Summary and Description: Yes
Summary: The text discusses the challenges of deprecating AI models, specifically OpenAI’s decision about when to retire o3-mini in favor of o4-mini. It highlights the importance of developer feedback and the need to ensure a stable experience when transitioning between model versions.
Detailed Description: The commentary by Ted Sanders from OpenAI addresses the complexities involved in evaluating and transitioning between AI models, particularly large language models (LLMs). The discussion is relevant to professionals working in AI development and deployment, especially concerning version control and the implications for usability and stability for developers.
– **Model Comparison**: The text suggests a hypothesis that the newer o4-mini model offers considerable improvements over the previous version (o3-mini). However, it stresses the necessity of waiting for feedback from actual developers before making definitive decisions regarding deprecation.
– **Limitations of Evaluations**: Sanders emphasizes that evaluation metrics (“evals”) capture only part of the performance picture, and suggests that a deeper understanding of model behavior and user needs is essential before deciding to deprecate.
– **Complexities of Model Behavior**: The statement highlights the high dimensionality of model behavior, indicating that different use cases and prompts can yield varying results. This underscores the challenge of ensuring consistent performance across all scenarios, particularly when users have become accustomed to the idiosyncrasies of older models.
– **Developer Experience**: There is a clear intent to prioritize the developer experience, advocating for stability and predictability. The reluctance to force developers to switch models frequently matters for maintaining productivity and minimizing disruption in development workflows.
– **Future Considerations**: The timing of any o3-mini deprecation would depend on whether the majority of developers migrate to o4-mini happily, showcasing a consultative approach to managing model transitions.
This commentary provides insights into the challenges of AI model management, emphasizing stability, user feedback, and the intricate nature of AI evaluation, all of which are critical considerations for AI security and infrastructure professionals.