Simon Willison’s Weblog: Is AI progress slowing down?

Source URL: https://simonwillison.net/2024/Dec/19/is-ai-progress-slowing-down/#atom-everything
Source: Simon Willison’s Weblog
Title: Is AI progress slowing down?

Feedly Summary: Is AI progress slowing down?
This piece by Arvind Narayanan and Sayash Kapoor is the single most insightful essay about AI and LLMs I’ve seen in a long time. It’s long and worth reading every inch of it – it defies summarization, but I’ll try anyway.
The key question they address is the widely discussed issue of whether model scaling has stopped working. Last year it seemed like the secret to ever increasing model capabilities was to keep dumping in more data and parameters and training time, but the lack of a convincing leap forward in the two years since GPT-4 – from any of the big labs – suggests that’s no longer the case.

The new dominant narrative seems to be that model scaling is dead, and “inference scaling”, also known as “test-time compute scaling”, is the way forward for improving AI capabilities. The idea is to spend more and more computation when using models to perform a task, such as by having them “think” before responding.

Inference scaling is the trick introduced by OpenAI’s o1 and now explored by other models such as Qwen’s QwQ. It’s an increasingly practical approach as inference gets more efficient and cost per token continues to drop through the floor.
But how far can inference scaling take us, especially if it’s only effective for certain types of problem?

The straightforward, intuitive answer to the first question is that inference scaling is useful for problems that have clear correct answers, such as coding or mathematical problem solving. […] In contrast, for tasks such as writing or language translation, it is hard to see how inference scaling can make a big difference, especially if the limitations are due to the training data. For example, if a model works poorly in translating to a low-resource language because it isn’t aware of idiomatic phrases in that language, the model can’t reason its way out of this.
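To make that “clear correct answers” point concrete, here is a minimal sketch (mine, not from the essay) of test-time compute as repeated sampling plus a verifier. The `generate_candidate` and `passes_tests` callables are hypothetical placeholders standing in for an LLM call and an automatic checker such as a unit-test runner.

```python
# Minimal sketch of inference-time ("test-time compute") scaling via
# best-of-N sampling. Both callables are hypothetical placeholders:
# generate_candidate would call an LLM, passes_tests would run an automatic
# checker (e.g. unit tests for code, an exact-match check for a math answer).
from typing import Callable, Optional


def solve_with_test_time_compute(
    prompt: str,
    generate_candidate: Callable[[str], str],
    passes_tests: Callable[[str], bool],
    num_samples: int = 16,
) -> Optional[str]:
    """Spend more compute at inference time: sample up to num_samples
    candidate answers and return the first one the verifier accepts."""
    for _ in range(num_samples):
        candidate = generate_candidate(prompt)
        if passes_tests(candidate):
            return candidate
    return None  # no verified answer within this compute budget
```

The catch, as the quote above notes, is the verifier: coding and math problems come with cheap automatic checks, while writing and translation do not, so sampling more candidates offers no obvious way to pick a better one.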

There’s a delightfully spicy section about why it’s a bad idea to defer to the expertise of industry insiders:

In short, the reasons why one might give more weight to insiders’ views aren’t very important. On the other hand, there’s a huge and obvious reason why we should probably give less weight to their views, which is that they have an incentive to say things that are in their commercial interests, and have a track record of doing so.

I also enjoyed this note about how we are still potentially years behind in figuring out how to build usable applications that take full advantage of the capabilities we have today:

The furious debate about whether there is a capability slowdown is ironic, because the link between capability increases and the real-world usefulness of AI is extremely weak. The development of AI-based applications lags far behind the increase of AI capabilities, so even existing AI capabilities remain greatly underutilized. One reason is the capability-reliability gap — even when a certain capability exists, it may not work reliably enough that you can take the human out of the loop and actually automate the task (imagine a food delivery app that only works 80% of the time). And the methods for improving reliability are often application-dependent and distinct from methods for improving capability. That said, reasoning models also seem to exhibit reliability improvements, which is exciting.
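To illustrate what that capability-reliability gap can look like in practice, here is my own sketch (not something from the essay) of the application-level reliability work the authors are gesturing at: retries against a domain-specific validator, with a human fallback instead of full automation. The `extract_order` and `is_valid_order` functions are invented placeholders, in the spirit of the food delivery example.

```python
# Hypothetical sketch of an application-level reliability layer around an
# imperfect model. extract_order (the model call) and is_valid_order
# (a domain-specific validator) are made-up placeholders, not a real API.
from typing import Callable


def handle_order(
    message: str,
    extract_order: Callable[[str], dict],
    is_valid_order: Callable[[dict], bool],
    max_attempts: int = 3,
) -> dict:
    """Retry the model against a validator; if nothing validates,
    route to a human instead of automating a wrong answer."""
    for _ in range(max_attempts):
        order = extract_order(message)
        if is_valid_order(order):
            return {"status": "automated", "order": order}
    return {"status": "needs_human_review", "message": message}
```

Note that everything here (the validator, the retry budget, the fallback path) is application-specific, which is the essay’s point about reliability methods being distinct from raw capability gains.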

Via @randomwalker.bsky.social
Tags: o1, llms, ai, generative-ai, arvind-narayanan

AI Summary and Description: Yes

Short Summary with Insight: The text discusses the current state of AI progress, particularly focusing on the shift from model scaling to inference scaling as a means to enhance AI capabilities, following the limitations seen after GPT-4. It challenges the dependency on industry insiders and emphasizes the gap between AI capabilities and their practical applications, highlighting a critical area of focus for professionals in AI, cloud, and infrastructure security.

Detailed Description:
The essay by Arvind Narayanan and Sayash Kapoor dives into the evolving landscape of AI development, particularly concerning large language models (LLMs). Here are the key points of discussion:

– **Shift from Model Scaling to Inference Scaling**:
  – Previously, increasing model capabilities depended largely on adding more data, parameters, and training time.
  – Recent trends suggest this approach has produced diminishing returns since the release of GPT-4.
  – Inference scaling, which spends additional compute when the model is used rather than during training, is becoming the new focus for enhancing AI capabilities.

– **Efficiency and Cost**:
  – Inference scaling is increasingly viable as inference becomes more efficient and the cost per token continues to fall.
  – It is particularly effective for tasks with clear correct answers, such as coding or solving mathematical problems.

– **Limitations in Broader Applications**:
  – The effectiveness of inference scaling may be limited for complex tasks like language translation and writing, especially when training data lacks depth for certain languages or contexts.
  – Issues such as the model’s inability to grasp idiomatic uses in low-resource languages highlight potential shortfalls.

– **Critique of Industry Insiders**:
  – The authors critique the tendency to defer to industry experts, whose views may be shaped by their commercial interests.
  – They call for skepticism about insiders’ perspectives, given these incentives and a track record of making claims that serve those interests.

– **Capability-Reliability Gap**:
  – The essay notes the irony that the link between capability increases and real-world usefulness is weak: application development lags far behind capability gains, so even existing AI capabilities remain greatly underutilized.
  – Even when a capability exists, it may not work reliably enough to take the human out of the loop, and the methods for improving reliability are often application-dependent and distinct from methods for improving capability.

– **Future Outlook**:
  – The discussions point to a potentially long journey ahead in developing usable applications that fully leverage current AI capabilities, suggesting an ongoing area for innovation and research.

This analysis highlights significant implications for security and compliance professionals working with AI solutions, as understanding capability limitations, biases within industry insights, and the core reliability of AI systems is crucial for informed decision-making and strategic planning in AI deployment.