Source URL: https://simonwillison.net/2025/Apr/9/an-llm-query-understanding-service/#atom-everything
Source: Simon Willison’s Weblog
Title: An LLM Query Understanding Service
Feedly Summary: An LLM Query Understanding Service
Doug Turnbull recently wrote about how all search is structured now:
Many times, even a small open source LLM will be able to turn a search query into reasonable structure at relatively low cost.
In this follow-up tutorial he demonstrates Qwen 2-7B running in a GPU-enabled Google Kubernetes Engine container to turn user search queries like "red loveseat" into structured filters like {"item_type": "loveseat", "color": "red"}.
Here’s the prompt he uses.
Respond with a single line of JSON:
{"item_type": "sofa", "material": "wood", "color": "red"}
Omit any other information. Do not include any other text in your response. Omit a value if the user did not specify it. For example, if the user said "red sofa", you would respond with:
{"item_type": "sofa", "color": "red"}
Here is the search query: blue armchair
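To make the prompt reusable across queries, it can be turned into a template and the one-line JSON reply parsed defensively. A minimal Python sketch, assuming Doug's prompt verbatim as the template; the parse_filters helper and its allowed-key set are illustrative, not part of his tutorial:

```python
import json

# Doug's prompt with the query turned into a placeholder; double
# braces escape the literal JSON braces for str.format().
PROMPT_TEMPLATE = """Respond with a single line of JSON:
{{"item_type": "sofa", "material": "wood", "color": "red"}}
Omit any other information. Do not include any other text in your
response. Omit a value if the user did not specify it. For example,
if the user said "red sofa", you would respond with:
{{"item_type": "sofa", "color": "red"}}
Here is the search query: {query}"""

def parse_filters(raw: str) -> dict:
    """Parse the model's one-line JSON reply into search filters,
    dropping null values and keys outside the expected schema."""
    allowed = {"item_type", "material", "color"}
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {}  # fall back to an unstructured search
    if not isinstance(data, dict):
        return {}
    return {k: v for k, v in data.items() if k in allowed and v is not None}
```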
Out of curiosity, I tried running his prompt against some other models using LLM:
gemini-1.5-flash-8b, the cheapest of the Gemini models, handled it well and cost $0.000011 – or 0.0011 cents.
llama3.2:3b worked too – that’s a very small 2GB model which I ran using Ollama.
deepseek-r1:1.5b – a tiny 1.1GB model, again via Ollama, amusingly failed by interpreting "red loveseat" as {"item_type": "sofa", "material": null, "color": "red"} after thinking very hard about the problem!
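For reference, roughly the same comparison can be run through LLM's Python API rather than the CLI. This sketch reuses PROMPT_TEMPLATE and parse_filters from above, and assumes the llm-gemini and llm-ollama plugins are installed; the exact model IDs registered can vary by plugin version:

```python
import llm

query = "red loveseat"
prompt = PROMPT_TEMPLATE.format(query=query)  # template from the sketch above

# Model IDs assume the llm-gemini and llm-ollama plugins; for the
# Ollama models, the corresponding model must already be pulled.
for model_id in ("gemini-1.5-flash-8b", "llama3.2:3b", "deepseek-r1:1.5b"):
    model = llm.get_model(model_id)
    response = model.prompt(prompt)
    print(model_id, "->", parse_filters(response.text()))
```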
Via lobste.rs
Tags: prompt-engineering, llm, generative-ai, search, ai, llms
AI Summary and Description: Yes
Summary: The text discusses the application of a small open-source LLM (Large Language Model) for turning search queries into structured data formats. It highlights the affordability of different models, from hosted APIs to small local models, alongside a deployment on Google Kubernetes Engine. This insight is valuable for security and compliance professionals looking at the implications and effectiveness of AI in information retrieval and data structuring.
Detailed Description:
The text elaborates on how advancements in LLM technologies are enabling more efficient data structuring from search queries, making it relevant for multiple categories like Generative AI, LLM Security, and Cloud Computing.
Key Points:
– **Query Structuring**: A small open-source LLM can effectively understand and structure search queries, presenting potential efficiencies in data handling.
– **Practical Implementation**: The service utilizes Qwen 2-7B running on Google Kubernetes Engine to convert search queries into structured JSON, showcasing a practical AI application; a rough sketch of the service shape follows this list.
– **Model Performance Comparison**: The text provides insights into the performance and cost-effectiveness of different models (gemini-1.5-flash-8b at $0.000011 per query, llama3.2:3b, and deepseek-r1:1.5b) in processing search queries.
– **Cost Efficiency**: The reference to cost highlights the accessibility of sophisticated AI tools for even small-scale users or organizations.
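As a loose sketch of what such a query understanding endpoint might look like: FastAPI and the stubbed call_model() here are assumptions for illustration, not Doug Turnbull's actual GKE deployment code.

```python
# Illustrative only: FastAPI and call_model() are assumptions,
# not the tutorial's actual code.
from fastapi import FastAPI

app = FastAPI()

def call_model(prompt: str) -> str:
    """Stand-in for a request to the hosted model (Qwen 2-7B in the tutorial)."""
    raise NotImplementedError

@app.get("/parse")
def parse(q: str) -> dict:
    # Reuses PROMPT_TEMPLATE and parse_filters from the sketches above.
    return parse_filters(call_model(PROMPT_TEMPLATE.format(query=q)))
```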
Implications for Security and Compliance Professionals:
– **Data Handling Security**: Understanding how AI structures data can inform compliance with data protection regulations.
– **Model Selection**: Knowledge of model performance and costs allows for better budgeting and resource allocation within the cloud environment.
– **Innovative Processes**: These advancements may necessitate new security protocols or compliance measures as organizations increasingly adopt AI-driven solutions for data structuring and retrieval.
The discussion encapsulates the growing relevance of AI capabilities in everyday search queries and the resulting implications for cloud computing and AI security.