Simon Willison’s Weblog: Is the LLM response wrong, or have you just failed to iterate it?

Source URL: https://simonwillison.net/2025/Sep/7/is-the-llm-response-wrong-or-have-you-just-failed-to-iterate-it/#atom-everything
Source: Simon Willison’s Weblog
Title: Is the LLM response wrong, or have you just failed to iterate it?

Feedly Summary: Is the LLM response wrong, or have you just failed to iterate it?
More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google’s AI mode usually correctly handling a common piece of misinformation but occasionally falling for it (the curse of non-deterministic systems), then shows an example of what he calls a “sorting prompt” as a follow-up:

What is the evidence for and against this being a real photo of Shirley Slade?

The response starts with a non-committal "there is compelling evidence for and against…", then by the end has firmly convinced itself that the photo is indeed a fake. It reads like a fact-checking variant of "think step by step".
Mike neatly describes a problem I’ve also observed recently where "hallucination" is frequently mis-applied as meaning any time a model makes a mistake:

The term hallucination has become nearly worthless in the LLM discourse. It initially described a very weird, mostly non-humanlike behavior where LLMs would make up things out of whole cloth that did not seem to exist as claims referenced in any known source material or claims inferable from any known source material. Hallucinations as stuff made up out of nothing. Subsequently people began calling any error or imperfect summary a hallucination, rendering the term worthless.

In this example the initial incorrect answers were not hallucinations: they correctly summarized online content that contained misinformation. The trick then is to encourage the model to look further, using “sorting prompts” like these (a minimal sketch of this follow-up pattern appears after the list):

Facts and misconceptions and hype about what I posted
What is the evidence for and against the claim I posted
Look at the most recent information on this issue, summarize how it shifts the analysis (if at all), and provide link to the latest info
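To make the follow-up pattern concrete, here is a minimal sketch, not from the original post, of sending a sorting prompt as a second turn in the same conversation so the model re-examines its first answer rather than the user arguing with it. The OpenAI Python client and the gpt-4o model name are assumptions for illustration; the article itself discusses Google’s AI mode, and any chat-style LLM API would work the same way.

```python
# Minimal sketch, assuming the OpenAI Python SDK and "gpt-4o" (any chat-style LLM API works similarly).
# Idea from the post: after the first answer, send a "sorting prompt" as a follow-up turn so the
# model re-examines the claim instead of the user arguing with it.
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "user", "content": "Is this a real photo of Shirley Slade?"},
]

# First pass: the model may simply repeat whatever the online sources it draws on say,
# including any misinformation they contain.
first = client.chat.completions.create(model="gpt-4o", messages=conversation)
conversation.append({"role": "assistant", "content": first.choices[0].message.content})

# Sorting prompt: ask for evidence on both sides to nudge a structured re-examination.
conversation.append({
    "role": "user",
    "content": "What is the evidence for and against this being a real photo of Shirley Slade?",
})
second = client.chat.completions.create(model="gpt-4o", messages=conversation)

print(second.choices[0].message.content)
```

The point of the second call is that it carries the first answer in its context, so the sorting prompt pushes the model to weigh evidence for and against rather than defend its initial summary.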

I appreciated this closing footnote:

Should platforms have more features to nudge users to this sort of iteration? Yes. They should. Getting people to iterate investigation rather than argue with LLMs would be a good first step out of this mess that the chatbot model has created.

Via @mikecaulfield.bsky.social
Tags: ai, generative-ai, llms, ai-ethics, ai-assisted-search, hallucinations, digital-literacy

AI Summary and Description: Yes

Summary: The text discusses the concept of “hallucinations” in large language models (LLMs) and critiques the misuse of the term, illustrating the need for better prompts and interaction strategies to enhance the accuracy of AI responses. It highlights a specific technique called “sorting prompts” that helps in clarifying misinformation.

Detailed Description:
The content dives into the complexities of managing misinformation through AI responses, particularly in LLMs. It emphasizes a critical observation regarding the misapplication of the term “hallucination” in the discourse surrounding LLMs. The text advocates for prompting techniques to improve the reliability of information retrieval and analysis by AI systems.

– Key Points:
  – **Misuse of the Term “Hallucination”**:
    – Initially, the term described a specific behavior where LLMs fabricated claims with no basis in any known source material.
    – Now it is applied to any error or imperfect summary, diluting its original meaning.
  – **Examples of Misinformation**:
    – The text describes an instance where an LLM was asked to evaluate the authenticity of a photo and initially repeated online misinformation, reaching a better-supported conclusion only after follow-up prompting.
  – **Sorting Prompts**:
    – These are follow-up prompts designed to encourage deeper investigation and structured thinking, such as:
      – Weighing the evidence for and against a claim.
      – Summarizing the most recent information on an issue.
    – This approach steers the LLM’s output away from false conclusions without the user having to argue with it.
  – **Call for User Support Features**:
    – The text closes with a suggestion that platforms add features nudging users toward iterative investigation rather than combative engagement with AI responses.

Overall, this analysis highlights critical implications for AI security and ethics, especially in maintaining the integrity and reliability of information provided by AI systems. For professionals in AI and cloud computing, the insights underline the importance of establishing better frameworks for user interaction and organizing information retrieval processes to combat misinformation effectively.