Source URL: https://simonwillison.net/2025/Oct/4/drew-on-dspy/#atom-everything
Source: Simon Willison’s Weblog
Title: Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
Feedly Summary: Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
I’ve had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation I’ve seen yet of the kinds of problems it can help solve.
Drew works on Overture Maps, which combines Point Of Interest data from numerous providers to create a single unified POI database. This is an example of conflation, a notoriously difficult task in GIS where multiple datasets are deduped and merged together.
Drew uses an inexpensive local model, Qwen3-0.6B, to compare 70 million addresses and identify matches, for example between Place(address="3359 FOOTHILL BLVD", name="RESTAURANT LOS ARCOS") and Place(address="3359 FOOTHILL BLVD", name="Los Arcos Taqueria").
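To see why a pair like this is awkward for plain string matching, here is a minimal sketch that is not taken from the talk. The `Place` dataclass mirrors the two records quoted above; `tokens` and `name_overlap` are hypothetical helpers invented for this illustration, and token-level Jaccard similarity is a stand-in baseline, not Drew's method.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    address: str
    name: str


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def name_overlap(a: Place, b: Place) -> float:
    """Jaccard similarity of the two names' word sets (hypothetical baseline)."""
    ta, tb = tokens(a.name), tokens(b.name)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


a = Place(address="3359 FOOTHILL BLVD", name="RESTAURANT LOS ARCOS")
b = Place(address="3359 FOOTHILL BLVD", name="Los Arcos Taqueria")

same_address = tokens(a.address) == tokens(b.address)
overlap = name_overlap(a, b)
print(same_address, overlap)
```

The addresses agree exactly, but the names share only two of their four distinct words, a Jaccard score of 0.5. A fixed threshold sits right on the fence for records like these, which is the kind of judgment call being handed to the model.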
DSPy’s role is to optimize the prompt used for that smaller model. Drew used GPT-4.1 and the dspy.MIPROv2 optimizer, producing a 700 token prompt that increased the score from 60.7% to 82%.
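For a sense of scale, the quoted scores can be unpacked with a little arithmetic. This is a sketch under one assumption: that the score is the percentage of pairs judged correctly, so that 100 minus the score can be read as an error rate. The variable names are illustrative.

```python
before, after = 60.7, 82.0  # scores quoted in the post, in percent

# Absolute improvement, in percentage points.
absolute_gain = round(after - before, 1)

# Improvement relative to the starting score, in percent.
relative_gain = round((after - before) / before * 100, 1)

# Assumption: score == accuracy, so (100 - score) is the error rate.
# This is the share of the original errors that went away, in percent.
error_cut = round((after - before) / (100 - before) * 100, 1)

print(absolute_gain, relative_gain, error_cut)
```

That works out to 21.3 percentage points, about a 35% relative improvement, and, if the accuracy assumption holds, roughly 54% fewer wrong answers.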
Why bother? Drew points out that having a prompt optimization pipeline makes it trivial to evaluate and switch to other models if they can score higher with a custom optimized prompt – without needing that trial-and-error optimization to be executed by hand.
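The shape of that argument can be sketched in a few lines of plain Python. Everything here is hypothetical: `model_a` and `model_b` are toy functions standing in for real models, `PROMPTS` and `DATA` are invented, and picking the highest-scoring prompt from a fixed list is a deliberately crude stand-in for what an optimizer such as dspy.MIPROv2 does. None of it uses DSPy's actual API.

```python
from typing import Callable

Pair = tuple[str, str]
Model = Callable[[str, Pair], bool]  # (prompt, pair) -> predicted match

# Invented labeled pairs: ((name_one, name_two), is_same_place)
DATA: list[tuple[Pair, bool]] = [
    (("RESTAURANT LOS ARCOS", "Los Arcos Taqueria"), True),
    (("Corner Bakery", "CORNER BAKERY"), True),
    (("Foothill Dental", "Foothill Pharmacy"), False),
    (("Corner Bakery", "Corner Bakery"), True),
]
PROMPTS = [
    "Same place?",
    "Same place? Ignore case.",
    "Same place? Compare shared words.",
]


def model_a(prompt: str, pair: Pair) -> bool:
    """Toy model that only reacts to the 'ignore case' hint."""
    a, b = pair
    if "ignore case" in prompt.lower():
        a, b = a.lower(), b.lower()
    return a == b


def model_b(prompt: str, pair: Pair) -> bool:
    """Toy model that only reacts to the 'shared words' hint."""
    a, b = pair
    if "shared words" in prompt.lower():
        return len(set(a.lower().split()) & set(b.lower().split())) >= 2
    return a == b


def accuracy(model: Model, prompt: str) -> float:
    """Fraction of labeled pairs the model gets right with this prompt."""
    return sum(model(prompt, pair) == label for pair, label in DATA) / len(DATA)


def optimize(model: Model) -> tuple[str, float]:
    """Crude 'optimizer': keep whichever candidate prompt scores best."""
    best = max(PROMPTS, key=lambda p: accuracy(model, p))
    return best, accuracy(model, best)


# Re-run the same pipeline for every candidate model, then compare.
results = {name: optimize(m) for name, m in {"model_a": model_a, "model_b": model_b}.items()}
winner = max(results, key=lambda name: results[name][1])
print(results, winner)
```

Each toy model ends up with a different best prompt, and the comparison between them is only fair once each has been given its own. Because the metric and the labeled data stay fixed, trying another model is a matter of adding one entry to the dictionary and re-running.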
Tags: geospatial, gis, ai, prompt-engineering, generative-ai, llms, drew-breunig, overture, dspy
AI Summary and Description: Yes
Summary: The text discusses a presentation by Drew Breunig on DSPy, a tool designed to optimize prompts in AI models, particularly in the context of processing geospatial data. It illustrates the practical application of DSPy in improving the performance of smaller models through prompt engineering, showcasing its relevance in generative AI and LLMs.
Detailed Description: The text highlights significant insights from a presentation at the Databricks Data + AI Summit regarding DSPy, which is used for prompt optimization in AI-driven geospatial data processing. Here are the key points:
– **DSPy Overview**: DSPy is presented as a tool for optimizing the prompts given to AI models. In the talk it is used to raise a small model's score on a large-scale matching task.
– **Speaker and Context**: Drew Breunig, who works on the Overture Maps project, focuses on merging Point Of Interest (POI) data from various sources, a common challenge in the Geographic Information Systems (GIS) domain.
– **Example of Conflation**: The talk centers on a practical GIS problem: merging multiple datasets and removing duplicates while maintaining data integrity. The example given is a pair of records that share an address but write the business name differently.
– **Application of Models**: Breunig uses an inexpensive local model (Qwen3-0.6B) to compare 70 million addresses and identify matches.
– **Role of DSPy in Optimization**: DSPy optimizes the prompt used for the smaller model. Breunig used GPT-4.1 and DSPy’s dspy.MIPROv2 optimizer to produce a 700-token prompt, which raised the score from 60.7% to 82%.
– **Significance of Prompt Optimization**: The presentation argues that a prompt optimization pipeline makes it easy to evaluate other models and switch to one that scores higher with its own optimized prompt. This removes the trial-and-error prompt tuning that would otherwise be done by hand.
– **Tags Indicating Trends**: Tags such as “geospatial,” “gis,” “ai,” “prompt-engineering,” and “generative-ai” place this discussion at the intersection of geospatial data processing and prompt engineering for LLMs.
Overall, this content is relevant for professionals in AI and machine learning, especially those focused on improving model efficiency and accuracy through innovative prompt engineering techniques.