Simon Willison’s Weblog: W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October

Source URL: https://simonwillison.net/2024/Oct/30/monthnotes/#atom-everything
Source: Simon Willison’s Weblog
Title: W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October

Feedly Summary: I try to publish weeknotes at least once every two weeks. It’s been four weeks since the last entry, so I guess this one counts as monthnotes instead.
In my defense, the reason I’ve fallen behind on weeknotes is that I’ve been publishing a lot of long-form blog entries this month.
Plentiful LLM vendor news
A lot of LLM stuff happened. OpenAI had their DevDay, which I used as an opportunity to try out live blogging for the first time. I figured out video scraping with Google Gemini and generally got excited about how incredibly inexpensive the Gemini models are. Anthropic launched Computer Use and JavaScript analysis, and the month ended with GitHub Universe.
My LLM tool goes multi-modal
My big achievement of the month was finally shipping multi-modal support for my LLM tool. This has been almost a year in the making: GPT-4 vision kicked off the new era of vision LLMs at OpenAI DevDay last November and I’ve been watching the space with keen interest ever since.
I had a couple of false starts at the feature. It was difficult at first because LLM acts as a cross-model abstraction layer, and it’s hard to design an abstraction like that effectively without plenty of examples of different models to generalize over.
Initially I thought the feature would just be for images, but then Google Gemini launched the ability to feed in PDFs, audio files and videos as well. That’s why I renamed it from -i/--image to -a/--attachment – I’m glad I hadn’t committed to the image UI before realizing that file attachments could be so much more.
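To show what that looks like in practice, here’s a minimal sketch using the Python API that shipped alongside the new CLI option. The model ID and file name are placeholders, and the pattern follows the LLM 0.17 documentation rather than anything quoted in this post:

```python
# Equivalent CLI usage: llm "Describe this image" -a photo.jpg
# (the model ID and file name below are placeholder assumptions)
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe this image",
    # Attachments can also be built from a URL via llm.Attachment(url=...)
    attachments=[llm.Attachment(path="photo.jpg")],
)
print(response.text())
```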
I’m really happy with how the feature turned out. The one missing piece at the moment is local models: I prototyped some incomplete local model plugins to verify the API design would work, but I’ve not yet pushed any of them to a state where I think they’re ready to release. My research into mistral.rs was part of that process.
Now that attachments have landed I’m free to start thinking about the next major LLM feature. I’m leaning towards tool usage: enough models have tool use / structured output capabilities now that I think I can design an abstraction layer that works across all of them. The combination of tool use with LLM’s plugin system is really fun to think about.
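None of this exists in LLM yet, but as a thought experiment, here’s a hypothetical sketch of what a vendor-neutral tool definition might look like. Every name here (Tool, lookup_weather and so on) is invented for illustration and is not part of any real API:

```python
# Hypothetical sketch only: this is not a real LLM API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A single vendor-neutral tool definition."""
    name: str
    description: str
    parameters: dict  # JSON Schema describing the arguments
    implementation: Callable

def lookup_weather(city: str) -> str:
    # Toy implementation; a real tool would call out to a weather API.
    return f"Sunny in {city}"

weather_tool = Tool(
    name="lookup_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    implementation=lookup_weather,
)

# An abstraction layer would translate this one definition into each
# vendor's wire format (OpenAI function calling, Anthropic tool use,
# Gemini function declarations) and dispatch the model's tool-call
# responses back to implementation().
```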
Blog entries

You can now run prompts against images, audio and video in your terminal using LLM
Run a prompt to generate and execute jq programs using llm-jq
Notes on the new Claude analysis JavaScript code execution tool
Initial explorations of Anthropic’s new Computer Use capability
Everything I built with Claude Artifacts this week
Running Llama 3.2 Vision and Phi-3.5 Vision on a Mac with mistral.rs
Experimenting with audio input and output for the OpenAI Chat Completion API
Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent
ChatGPT will happily write you a thinly disguised horoscope
OpenAI DevDay: Let’s build developer tools, not digital God
OpenAI DevDay 2024 live blog

Releases

llm-mistral 0.7 – 2024-10-29: LLM plugin providing access to Mistral models using the Mistral API

llm-claude-3 0.6 – 2024-10-29: LLM plugin for interacting with the Claude 3 family of models

llm-gemini 0.3 – 2024-10-29: LLM plugin to access Google’s Gemini family of models

llm 0.17 – 2024-10-29: Access large language models from the command-line

llm-whisper-api 0.1.1 – 2024-10-27: Run transcriptions using the OpenAI Whisper API

llm-jq 0.1.1 – 2024-10-27: Write and execute jq programs with the help of LLM

claude-to-sqlite 0.2 – 2024-10-21: Convert a Claude.ai export to SQLite

files-to-prompt 0.4 – 2024-10-16: Concatenate a directory full of files into a single prompt for use with LLMs

datasette-examples 0.1a0 – 2024-10-08: Load example SQL scripts into Datasette on startup

datasette 0.65 – 2024-10-07: An open source multi-tool for exploring and publishing data

TILs

Installing flash-attn without compiling it – 2024-10-25

Using uv to develop Python command-line applications – 2024-10-24

Setting cache-control: max-age=31536000 with a Cloudflare Transform Rule – 2024-10-24

Running prompts against images, PDFs, audio and video with Google Gemini – 2024-10-23

The most basic possible Hugo site – 2024-10-23

Livestreaming a community election event on YouTube – 2024-10-10

Upgrading Homebrew and avoiding the failed to verify attestation error – 2024-10-09

Collecting replies to tweets using JavaScript – 2024-10-09

Compiling and running sqlite3-rsync – 2024-10-04

Building an automatically updating live blog in Django – 2024-10-02

Tags: weeknotes, llms, llm

AI Summary and Description: Yes

Summary: The text provides insights into recent developments in the large language model (LLM) landscape, highlighting advancements in multi-modal support and new capabilities introduced by major LLM vendors like OpenAI and Anthropic. This information is particularly relevant for professionals in AI security and cloud computing, as it touches on the emerging functionalities of LLM technologies that could impact data security and infrastructure design.

Detailed Description:

The content outlines several key developments in large language models (LLMs) over the last month, emphasizing new features, launches, and personal achievements related to LLM tooling. Here’s a breakdown of the major points:

– **Publication Frequency**: The author has fallen behind on weeknotes because the month was spent publishing long-form blog entries instead.

– **Event Highlights**:
  – OpenAI held its “DevDay” event, which the author covered with a live blog.
  – Anthropic launched its Computer Use capability and a JavaScript code analysis tool for Claude.
  – GitHub Universe closed out the month with further developments in the LLM space.

– **LLM Tool Development**: The author successfully launched multi-modal support for their LLM tool:
– Originally envisioned to operate solely with images, the tool was upgraded to handle various media types, including PDFs, audio, and video, thanks to advancements in underlying models.
– This multi-modal capability opens up new avenues for integrating diverse data types into AI models, enhancing their functionality and applicability.

– **Research and Development**: The author mentions exploring local models and the challenges involved in designing a cross-model abstraction layer that effectively connects various AI models.

– **Future Directions**: There’s a strong inclination towards integrating tool usage within the LLM architecture, leveraging the emerging structured output capabilities across multiple models.

– **Community Engagement**: The author actively experiments with new features and encourages community engagement by sharing blog entries, which provide real-world applications and learning experiences concerning LLM functionalities.

**Bullet Points of Major Highlights:**
– OpenAI’s DevDay introduced significant updates to LLM functionalities.
– Anthropic launched new tools enhancing their LLM capabilities.
– Successful development of multi-modal support covering images, PDFs, audio, and video.
– Future plans to design a structured interaction framework for various AI tools.
– Blog entries document developments, facilitating knowledge sharing within the community.

Overall, this text is a rich resource for professionals interested in the evolving landscape of AI and LLM technologies, particularly concerning security, compliance, and infrastructure considerations. As LLMs continue to incorporate diverse inputs and outputs, understanding their capabilities and potential security implications will be paramount for tech governance and compliance frameworks.