Simon Willison’s Weblog: Is the LLM response wrong, or have you just failed to iterate it?

Source URL: https://simonwillison.net/2025/Sep/7/is-the-llm-response-wrong-or-have-you-just-failed-to-iterate-it/#atom-everything
Source: Simon Willison’s Weblog
Title: Is the LLM response wrong, or have you just failed to iterate it?

Feedly Summary: Is the LLM response wrong, or have you just failed to iterate it?
More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google’s AI mode usually correctly handling a common piece of misinformation but occasionally falling for it (the curse of non-deterministic systems), then shows an example of what he calls a “sorting prompt” as a follow-up:

What is the evidence for and against this being a real photo of Shirley Slade?

The response starts with a non-committal "there is compelling evidence for and against…", then by the end has firmly convinced itself that the photo is indeed a fake. It reads like a fact-checking variant of "think step by step".
Mike neatly describes a problem I’ve also observed recently where "hallucination" is frequently mis-applied as meaning any time a model makes a mistake:

The term hallucination has become nearly worthless in the LLM discourse. It initially described a very weird, mostly non-humanlike behavior where LLMs would make up things out of whole cloth that did not seem to exist as claims referenced in any known source material or claims inferable from any known source material. Hallucinations as stuff made up out of nothing. Subsequently people began calling any error or imperfect summary a hallucination, rendering the term worthless.

In this example the initial incorrect answers were not hallucinations: they correctly summarized online content that contained misinformation. The trick then is to encourage the model to look further, using “sorting prompts” like these (a minimal sketch of this follow-up pattern appears after the list):

Facts and misconceptions and hype about what I posted
What is the evidence for and against the claim I posted
Look at the most recent information on this issue, summarize how it shifts the analysis (if at all), and provide link to the latest info
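To make the follow-up pattern concrete, here is a minimal sketch, not from the original post, of sending a sorting prompt as a second turn in the same conversation so the model re-examines its first answer rather than the user arguing with it. The OpenAI Python client and the gpt-4o model name are assumptions for illustration; the article itself discusses Google’s AI mode, and any chat-style LLM API would work the same way.

```python
# Minimal sketch, assuming the OpenAI Python SDK and "gpt-4o" (any chat-style LLM API works similarly).
# Idea from the post: after the first answer, send a "sorting prompt" as a follow-up turn so the
# model re-examines the claim instead of the user arguing with it.
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "user", "content": "Is this a real photo of Shirley Slade?"},
]

# First pass: the model may simply repeat whatever the online sources it draws on say,
# including any misinformation they contain.
first = client.chat.completions.create(model="gpt-4o", messages=conversation)
conversation.append({"role": "assistant", "content": first.choices[0].message.content})

# Sorting prompt: ask for evidence on both sides to nudge a structured re-examination.
conversation.append({
    "role": "user",
    "content": "What is the evidence for and against this being a real photo of Shirley Slade?",
})
second = client.chat.completions.create(model="gpt-4o", messages=conversation)

print(second.choices[0].message.content)
```

The point of the second call is that it carries the first answer in its context, so the sorting prompt pushes the model to weigh evidence for and against rather than defend its initial summary.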

I appreciated this closing footnote:

Should platforms have more features to nudge users to this sort of iteration? Yes. They should. Getting people to iterate investigation rather than argue with LLMs would be a good first step out of this mess that the chatbot model has created.

Via @mikecaulfield.bsky.social
Tags: ai, generative-ai, llms, ai-ethics, ai-assisted-search, hallucinations, digital-literacy

AI Summary and Description: Yes

Summary: The text discusses the concept of “hallucinations” in large language models (LLMs) and critiques the misuse of the term, illustrating the need for better prompts and interaction strategies to enhance the accuracy of AI responses. It highlights a specific technique called “sorting prompts” that helps in clarifying misinformation.

Detailed Description:
The content dives into the complexities of managing misinformation through AI responses, particularly in LLMs. It emphasizes a critical observation regarding the misapplication of the term “hallucination” in the discourse surrounding LLMs. The text advocates for prompting techniques to improve the reliability of information retrieval and analysis by AI systems.

– Key Points:
  – **Misuse of the Term “Hallucination”**:
    – Initially, the term described a specific behavior where LLMs fabricated claims with no basis in any known source material.
    – Now it is applied to any error or imperfect summary, diluting its original meaning.
  – **Examples of Misinformation**:
    – The text describes an instance where an LLM was asked to evaluate the authenticity of a photo and initially repeated online misinformation, reaching a better-supported conclusion only after follow-up prompting.
  – **Sorting Prompts**:
    – These are follow-up prompts designed to encourage deeper investigation and structured thinking, such as:
      – Weighing the evidence for and against a claim.
      – Summarizing the most recent information on an issue.
    – This approach steers the LLM’s output away from false conclusions without the user having to argue with it.
  – **Call for User Support Features**:
    – The text closes with a suggestion that platforms add features nudging users toward iterative investigation rather than combative engagement with AI responses.

Overall, this analysis highlights critical implications for AI security and ethics, especially in maintaining the integrity and reliability of information provided by AI systems. For professionals in AI and cloud computing, the insights underline the importance of establishing better frameworks for user interaction and organizing information retrieval processes to combat misinformation effectively.