Simon Willison’s Weblog: Quoting Jason Liu

Source URL: https://simonwillison.net/2025/Sep/6/jason-liu/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Jason Liu

Feedly Summary: I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of a visual language model, than using CLIP embeddings themselves. If you tell the LLM that the summary is going to be embedded and used to do search downstream. I had one system go from 28% recall at 5 using CLIP to 75% recall at 5 using an LLM summary.
— Jason Liu
Tags: vision-llms, generative-ai, ai, embeddings, llms, jason-liu

AI Summary and Description: Yes

Summary: The text describes a significant improvement in image retrieval performance achieved by embedding highly opinionated summaries generated by a visual language model (VLM) rather than using CLIP image embeddings directly. The finding highlights the potential of language-model-generated text as a retrieval representation for enhancing search within AI systems.

Detailed Description: The text highlights a practical application of LLMs in improving the effectiveness of image retrieval systems. Key points include:

– **Performance Comparison**: The author reports a drastic increase in recall when switching from CLIP image embeddings to embeddings of VLM-generated summaries: one system's recall@5 (the fraction of queries whose correct image appears among the top five results) rose from 28% to 75%. A minimal sketch of how recall@k is computed appears after this list.

– **Opinionated Summaries**: The phrase “highly opinionated summaries” suggests the VLM is producing interpretive, detail-rich descriptions rather than neutral captions, foregrounding the aspects of an image that matter for search and likely steering retrieval more effectively.

– **Implications for AI/Generative AI**: The finding is particularly relevant for professionals working with generative AI, since it identifies a practical technique for enhancing search: moving from standard image-embedding methods to more sophisticated, LLM-driven text representations.

– **Search Improvement Strategy**: By informing the model that its summary will be embedded and used for downstream search, practitioners can steer summary generation toward retrieval objectives; a sketch of this prompt-and-embed pipeline follows below.
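Here is a minimal sketch of that prompt-and-embed pipeline. It assumes the OpenAI Python client with the `gpt-4o` vision model and the `text-embedding-3-small` embedding model; any vision-capable model plus text-embedding model would work, and the prompt wording is illustrative rather than Jason Liu's actual prompt.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt: tell the model up front that its summary will be
# embedded and used for search, as the quote recommends.
SUMMARY_PROMPT = (
    "Write an opinionated, information-dense summary of this image. "
    "The summary will be embedded and used for semantic search, so favor "
    "concrete, searchable details (objects, text, style, context) over "
    "neutral description."
)

def summarize_image(path: str) -> str:
    """Ask a vision-capable model for a retrieval-oriented summary."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": SUMMARY_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def embed_summary(text: str) -> list[float]:
    """Embed the summary text; this vector goes into the search index
    in place of a CLIP image embedding."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Index an image by embedding its summary rather than its pixels:
# vector = embed_summary(summarize_image("photo.jpg"))
```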
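To quantify the benefit, recall@k can be measured over whichever embeddings are in the index. The sketch below is plain NumPy and assumes one relevant document per query, a common simplification; computing it once for CLIP image embeddings and once for embeddings of the VLM summaries is the comparison behind the quoted 28% versus 75% figures.

```python
import numpy as np

def recall_at_k(query_embs, doc_embs, relevant_doc_ids, k=5):
    """Fraction of queries whose relevant document appears in the
    top-k cosine-similarity results (one relevant doc per query)."""
    # Normalize so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T                             # (num_queries, num_docs)
    top_k = np.argsort(-sims, axis=1)[:, :k]   # indices of the k best docs
    hits = [relevant_doc_ids[i] in top_k[i] for i in range(len(q))]
    return float(np.mean(hits))
```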

This insight has practical implications for anyone building retrieval systems: because retrieval accuracy is critical in many applications, routing images through a language model before embedding could yield markedly more effective search in fields such as computer vision and multimodal data retrieval.