Source URL: https://simonwillison.net/2025/Aug/11/ai-for-data-engineers/#atom-everything
Source: Simon Willison’s Weblog
Title: AI for data engineers with Simon Willison
Feedly Summary: AI for data engineers with Simon Willison
I recorded an episode last week with Claire Giordano for the Talking Postgres podcast. The topic was “AI for data engineers” but we ended up covering an enjoyable range of different topics.
How I got started programming with a Commodore 64 – the tape drive for which inspired the name Datasette
Selfish motivations for TILs (force me to write up my notes) and open source (help me never have to solve the same problem twice)
LLMs have been good at SQL for a couple of years now. Here’s how I used them for a complex PostgreSQL query that extracted alt text from my blog’s images using regular expressions
Structured data extraction as the most economically valuable application of LLMs for data work
2025 has been the year of tool calling in a loop ("agentic" if you like)
Thoughts on running MCPs securely – read-only database access, think about sandboxes, use PostgreSQL permissions, watch out for the lethal trifecta
Jargon guide: Agents, MCP, RAG, Tokens
How to get started learning to prompt: play with the models and "bring AI to the table" even for tasks that you don’t think it can handle
"It’s always a good day if you see a pelican"
Tags: postgresql, ai, generative-ai, llms, podcast-appearances
AI Summary and Description: Yes
Summary: The text discusses using AI in data engineering, focusing on LLM-assisted SQL, structured data extraction with large language models (LLMs), and running MCP (Model Context Protocol) servers securely. It highlights how LLMs can assist with data extraction tasks, covering both practical applications and security considerations in handling data.
Detailed Description: The content centers around the intersection of AI and data engineering, particularly the use of LLMs for querying and data extraction with PostgreSQL. Here are the major points elaborated upon:
– **Introduction to the Podcast Episode**: Simon Willison recorded an episode of the Talking Postgres podcast with Claire Giordano. The topic was “AI for data engineers,” though the conversation ranged across several other subjects.
– **Programming Journey**: A brief history of the speaker’s introduction to programming on a Commodore 64, whose tape drive inspired the name Datasette.
– **Use of LLMs for SQL Queries**:
– LLMs (Large Language Models) have been good at writing SQL for a couple of years now.
– The speaker provided an example of using LLMs to extract alt text from images in their blog through a complex PostgreSQL query utilizing regular expressions.
– **Valuable Applications of LLMs**:
– **Structured Data Extraction**: The text identifies structured data extraction as the most economically valuable application of LLMs for data work.
– **Secure Practices for MCP**:
– The discussion touches on security practices when running MCP (Model Context Protocol) servers against databases:
– Emphasis on read-only database access to prevent unauthorized data changes.
– Thinking about sandboxes to contain what a tool can reach.
– Use of PostgreSQL permissions to limit access to what is actually needed.
– Watching out for the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to communicate externally.
– **Jargon and Terminology**: A jargon guide covering Agents, MCP (Model Context Protocol), RAG (Retrieval-Augmented Generation), and Tokens.
– **Learning and Engagement**: Advice on learning to prompt LLMs, encouraging experimentation and interaction with AI tools even for less conventional tasks.
– **Closing Thoughts**: A light-hearted conclusion with a personal touch, indicating the speaker’s positive outlook on daily experiences.
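The alt-text example above was done with a PostgreSQL query in the original post, which is not reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch that pulls alt text out of HTML with a regular expression. The pattern, the function name `extract_alt_texts`, and the sample HTML are all assumptions made for illustration, and the pattern only handles double-quoted `alt` attributes.

```python
import re

# Hypothetical pattern: find <img ...> tags and capture a double-quoted alt value.
# [^>]*? keeps the match inside a single tag; \s avoids matching e.g. data-alt="...".
IMG_ALT_RE = re.compile(r'<img\b[^>]*?\salt="([^"]*)"', re.IGNORECASE)


def extract_alt_texts(html: str) -> list[str]:
    """Return the alt text of every <img> tag that has a double-quoted alt attribute."""
    return IMG_ALT_RE.findall(html)


# Made-up sample content, not taken from the blog.
sample = (
    '<p>Intro</p>'
    '<img src="a.png" alt="A pelican on a pier">'
    '<img src="b.png">'
    '<img alt="Chart of query times" src="c.png">'
)
alts = extract_alt_texts(sample)
```

Regular expressions are a blunt tool for HTML, so a sketch like this suits a quick audit of known content rather than arbitrary markup.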
Overall, the discussion merges technological insights with practical advice for data engineers looking to leverage AI securely and effectively, making it relevant for professionals in the fields of AI security, information security, and data management.
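The read-only access and PostgreSQL permissions advice can be sketched as a small helper that builds the statements for a SELECT-only role. This is a minimal sketch under stated assumptions: the function name `read_only_role_sql`, the role name `llm_reader`, and the database name `blog` are hypothetical, and authentication setup is left out. Note that `default_transaction_read_only` is only a session default that the role can override, so the SELECT-only grants are what actually restrict writes.

```python
import re

# Only allow simple lowercase identifiers, since names are interpolated into SQL.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def read_only_role_sql(role: str, database: str, schema: str = "public") -> list[str]:
    """Build PostgreSQL statements for a role that can only read one schema."""
    for name in (role, database, schema):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"CREATE ROLE {role} LOGIN",
        f"GRANT CONNECT ON DATABASE {database} TO {role}",
        f"GRANT USAGE ON SCHEMA {schema} TO {role}",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role}",
        # A default only: the session can still switch it off, so rely on the grants.
        f"ALTER ROLE {role} SET default_transaction_read_only = on",
    ]


# Hypothetical names, for illustration only.
statements = read_only_role_sql("llm_reader", "blog")
```

An MCP server or other tool would then connect as that role, so a prompt-injected query could read the granted tables but would have no privileges to modify them.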