Simon Willison’s Weblog: Quoting Jason Liu

Source URL: https://simonwillison.net/2025/Sep/6/jason-liu/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Jason Liu

Feedly Summary: I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of a visual language model, than using CLIP embeddings themselves. If you tell the LLM that the summary is going to be embedded and used to do search downstream. I had one system go from 28% recall at 5 using CLIP to 75% recall at 5 using an LLM summary.
— Jason Liu
Tags: vision-llms, generative-ai, ai, embeddings, llms, jason-liu

AI Summary and Description: Yes

Summary: The text describes a significant improvement in image retrieval performance achieved by embedding highly opinionated summaries generated by a visual language model (VLM) rather than using CLIP image embeddings directly. The finding highlights the potential of language-model-generated text as a retrieval representation for enhancing search within AI systems.

Detailed Description: The text highlights a practical application of LLMs in improving the effectiveness of image retrieval systems. Key points include:

– **Performance Comparison**: The author reports a drastic increase in recall when switching from CLIP image embeddings to embeddings of VLM-generated summaries: one system's recall@5 (the fraction of queries whose correct image appears among the top five results) rose from 28% to 75%. A minimal sketch of how recall@k is computed appears after this list.

– **Opinionated Summaries**: The phrase “highly opinionated summaries” suggests the VLM is producing interpretive, detail-rich descriptions rather than neutral captions, foregrounding the aspects of an image that matter for search and likely steering retrieval more effectively.

– **Implications for AI/Generative AI**: The finding is particularly relevant for professionals working with generative AI, since it identifies a practical technique for enhancing search: moving from standard image-embedding methods to more sophisticated, LLM-driven text representations.

– **Search Improvement Strategy**: By informing the model that its summary will be embedded and used for downstream search, practitioners can steer summary generation toward retrieval objectives; a sketch of this prompt-and-embed pipeline follows below.
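Here is a minimal sketch of that prompt-and-embed pipeline. It assumes the OpenAI Python client with the `gpt-4o` vision model and the `text-embedding-3-small` embedding model; any vision-capable model plus text-embedding model would work, and the prompt wording is illustrative rather than Jason Liu's actual prompt.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt: tell the model up front that its summary will be
# embedded and used for search, as the quote recommends.
SUMMARY_PROMPT = (
    "Write an opinionated, information-dense summary of this image. "
    "The summary will be embedded and used for semantic search, so favor "
    "concrete, searchable details (objects, text, style, context) over "
    "neutral description."
)

def summarize_image(path: str) -> str:
    """Ask a vision-capable model for a retrieval-oriented summary."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": SUMMARY_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def embed_summary(text: str) -> list[float]:
    """Embed the summary text; this vector goes into the search index
    in place of a CLIP image embedding."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Index an image by embedding its summary rather than its pixels:
# vector = embed_summary(summarize_image("photo.jpg"))
```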
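To quantify the benefit, recall@k can be measured over whichever embeddings are in the index. The sketch below is plain NumPy and assumes one relevant document per query, a common simplification; computing it once for CLIP image embeddings and once for embeddings of the VLM summaries is the comparison behind the quoted 28% versus 75% figures.

```python
import numpy as np

def recall_at_k(query_embs, doc_embs, relevant_doc_ids, k=5):
    """Fraction of queries whose relevant document appears in the
    top-k cosine-similarity results (one relevant doc per query)."""
    # Normalize so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T                             # (num_queries, num_docs)
    top_k = np.argsort(-sims, axis=1)[:, :k]   # indices of the k best docs
    hits = [relevant_doc_ids[i] in top_k[i] for i in range(len(q))]
    return float(np.mean(hits))
```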

This insight has practical implications for anyone building retrieval systems: because retrieval accuracy is critical in many applications, routing images through a language model before embedding could yield markedly more effective search in fields such as computer vision and multimodal data retrieval.