Source URL: https://simonwillison.net/2025/Apr/17/ted-sanders/
Source: Simon Willison’s Weblog
Title: Quoting Ted Sanders, OpenAI
Feedly Summary: Our hypothesis is that o4-mini is a much better model, but we’ll wait to hear feedback from developers. Evals only tell part of the story, and we wouldn’t want to prematurely deprecate a model that developers continue to find value in. Model behavior is extremely high dimensional, and it’s impossible to prevent regression on 100% of use cases/prompts, especially if those prompts were originally tuned to the quirks of the older model. But if the majority of developers migrate happily, then it may make sense to deprecate at some future point.
We generally want to give developers as stable an experience as possible, and not force them to swap models every few months whether they want to or not.
— Ted Sanders, OpenAI, on deprecating o3-mini
Tags: openai, llms, ai, generative-ai
AI Summary and Description: Yes
Summary: The text discusses the challenges of deprecating AI models, specifically OpenAI’s decision about when to retire o3-mini in favor of o4-mini. It highlights the importance of developer feedback and the need to ensure a stable experience when transitioning between model versions.
Detailed Description: The commentary by Ted Sanders from OpenAI addresses the complexities involved in evaluating and transitioning between AI models, particularly large language models (LLMs). The discussion is relevant to professionals working in AI development and deployment, especially concerning version control and the implications for usability and stability for developers.
– **Model Comparison**: The text suggests a hypothesis that the newer o4-mini model offers considerable improvements over the previous version (o3-mini). However, it stresses the necessity of waiting for feedback from actual developers before making definitive decisions regarding deprecation.
– **Limitations of Evaluations**: Sanders emphasizes that evaluation metrics (“evals”) capture only part of the performance picture, and suggests that a deeper understanding of model behavior and user needs is essential before deciding to deprecate.
– **Complexities of Model Behavior**: The statement highlights the high dimensionality of model behavior, indicating that different use cases and prompts can yield varying results. This underscores the challenge of ensuring consistent performance across all scenarios, particularly when users have become accustomed to the idiosyncrasies of older models.
– **Developer Experience**: There is a clear intent to prioritize the developer experience, advocating for stability and predictability. The reluctance to force developers to switch models frequently matters for maintaining productivity and minimizing disruption in development workflows.
– **Future Considerations**: The timing of any o3-mini deprecation would depend on whether the majority of developers migrate to o4-mini happily, showcasing a consultative approach to managing model transitions.
This commentary provides insights into the challenges of AI model management, emphasizing stability, user feedback, and the intricate nature of AI evaluation, all of which are critical considerations for AI security and infrastructure professionals.