Simon Willison’s Weblog: Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes

Source URL: https://simonwillison.net/2024/Sep/6/weeknotes/
Source: Simon Willison’s Weblog
Title: Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes

Feedly Summary: I’ve been having a bunch of fun taking advantage of CORS-enabled LLM APIs to build client-side JavaScript applications that access LLMs directly. I also spun up a new Datasette plugin for advanced permission management.

LLMs from client-side JavaScript
Converting PDFs to HTML and Markdown
Adding some class to Datasette forms
On the blog
Releases
TILs

LLMs from client-side JavaScript
Anthropic recently added CORS support to their Claude APIs. It’s a little hard to use – you have to add anthropic-dangerous-direct-browser-access: true to your request headers to enable it – but once you know the trick you can start building web applications that talk to Anthropic’s LLMs directly, without any additional server-side code.
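As a rough sketch, a browser request using that header might look like the following. The model name and max_tokens value are illustrative assumptions, and buildClaudeRequest and askClaude are hypothetical helper names rather than anything from the tools described here.

```javascript
// Sketch: build a fetch() request for Anthropic's Messages API from the browser.
// The model name and max_tokens value are illustrative assumptions.
function buildClaudeRequest(apiKey, promptText) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // The opt-in header that enables CORS for direct browser access
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model: "claude-3-haiku-20240307",
        max_tokens: 1024,
        messages: [{ role: "user", content: promptText }],
      }),
    },
  };
}

// Send the request and return the text of the first content block.
async function askClaude(apiKey, promptText) {
  const { url, options } = buildClaudeRequest(apiKey, promptText);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Anthropic API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.content[0].text;
}
```

Keeping the request construction separate from the network call makes it possible to inspect the headers without sending anything.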
I later found out that both OpenAI and Google Gemini have this capability too, without needing the special header.
The problem with this approach is security: it’s very important not to embed an API key attached to your billing account in client-side HTML and JavaScript for anyone to see!
For my purposes though that doesn’t matter. I’ve been building tools which prompt() a user for their own API key (sadly restricting their usage to the tiny portion of people who both understand API keys and have created API accounts with one of the big providers) – then I stash that key in localStorage and start using it to make requests.
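The prompt-then-stash pattern can be sketched in a few lines. STORAGE_KEY and getApiKey are hypothetical names, and the storage and ask parameters exist only so the function can be exercised outside a browser; in a page they default to localStorage and prompt().

```javascript
// Sketch: return a stored API key, or ask the user for one and remember it.
// STORAGE_KEY is a hypothetical name. `storage` and `ask` default to the
// browser's localStorage and prompt(), and can be swapped out for testing.
const STORAGE_KEY = "ANTHROPIC_API_KEY";

function getApiKey(
  storage = globalThis.localStorage,
  ask = (message) => globalThis.prompt(message)
) {
  let key = storage.getItem(STORAGE_KEY);
  if (!key) {
    key = ask("Please enter your API key:");
    if (key) {
      // Only persist a non-empty answer; a cancelled prompt returns null
      storage.setItem(STORAGE_KEY, key);
    }
  }
  return key;
}
```

Anything in localStorage is readable by any script running on the same origin, so this only makes sense for a key the user has chosen to enter themselves.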
My simonw/tools repository is home to a growing collection of pure HTML+JavaScript tools, hosted at tools.simonwillison.net using GitHub Pages. I love not having to even think about hosting server-side code for these tools.
I’ve published three tools there that talk to LLMs directly so far:

haiku is a fun demo that requests access to the user’s camera and then writes a Haiku about what it sees. It uses Anthropic’s Claude 3 Haiku model for this – the whole project is one terrible pun. Haiku source code here.

gemini-bbox uses the Gemini 1.5 Pro (or Flash) API to prompt those models to return bounding boxes for objects in an image, then renders those bounding boxes. Gemini Pro is the only one of the vision LLMs I’ve tried that has reliable support for bounding boxes. I wrote about this in Building a tool showing how Gemini Pro can return bounding boxes for objects in images.

Gemini Chat App is a more traditional LLM chat interface that again talks to Gemini models (including the new super-speedy gemini-1.5-flash-8b-exp-0827). I built this partly to try out those new models and partly to experiment with implementing a streaming chat interface against the Gemini API directly in a browser. I wrote more about how that works in this post.
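One way to stream a Gemini reply in the browser, not necessarily how the Gemini Chat App does it, is to call the REST endpoint with alt=sse and read the response body incrementally. This sketch assumes that endpoint shape and response format; extractStreamedText and streamGemini are hypothetical names, and the default model is an assumption.

```javascript
// Sketch: pull text fragments out of a buffer of server-sent events.
// Returns the extracted text pieces plus any trailing partial line, which
// should be prepended to the next network chunk.
function extractStreamedText(buffer) {
  const lines = buffer.split("\n");
  const remainder = lines.pop(); // the last item may be an incomplete line
  const pieces = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = JSON.parse(trimmed.slice("data:".length));
    const parts = payload.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.text) pieces.push(part.text);
    }
  }
  return { pieces, remainder };
}

// Stream a reply, calling onText(fragment) as each fragment arrives.
// The default model name is an assumption.
async function streamGemini(apiKey, promptText, onText, model = "gemini-1.5-flash") {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${model}:streamGenerateContent?alt=sse&key=${encodeURIComponent(apiKey)}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: promptText }] }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Gemini API returned HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { pieces, remainder } = extractStreamedText(buffer);
    buffer = remainder;
    for (const piece of pieces) onText(piece);
  }
}
```

A chat page could pass an onText callback that appends each fragment to the current message element, so the reply appears as it is generated.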

Here’s that Gemini Bounding Box visualization tool:

All three of these tools made heavy use of AI-assisted development: Claude 3.5 Sonnet wrote almost every line of the last two, and the Haiku one was put together a few months ago using Claude 3 Opus.
My personal style of HTML and JavaScript apps turns out to be highly compatible with LLMs: I like using vanilla HTML and JavaScript and keeping everything in the same file, which makes it easy to paste the entire thing into the model and ask it to make some changes for me. This approach also works really well with Claude Artifacts, though I have to tell it “no React” to make sure I get an artifact I can hack on without needing to configure a React build step.
Converting PDFs to HTML and Markdown
I have a long-standing vendetta against PDFs for sharing information. They’re painful to read on a mobile phone, they have poor accessibility, and even things like copying and pasting text from them can be a pain.
Complaining without doing something about it isn’t really my style. Twice in the past few weeks I’ve taken matters into my own hands:

Google Research released a PDF paper describing their new pipe syntax for SQL. I ran it through Gemini 1.5 Pro to convert it to HTML (prompts here) and got this – a pretty great initial result for the first prompt I tried!
Nous Research released a preliminary report PDF about their DisTro technology for distributed training of LLMs over low-bandwidth connections. I ran a prompt to use Gemini 1.5 Pro to convert that to this Markdown version, which even handled tables.
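A conversion like the two above could be requested from client-side code by sending the PDF inline as base64. This is a minimal sketch assuming the Gemini REST generateContent endpoint; the prompt wording and default model name are illustrative and are not the prompts used for these conversions, and buildPdfConversionRequest is a hypothetical name.

```javascript
// Sketch: build a generateContent request that sends a PDF inline.
// The prompt wording and default model name are illustrative assumptions,
// not the prompts used in the article.
function buildPdfConversionRequest(apiKey, pdfBase64, model = "gemini-1.5-pro") {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${model}:generateContent?key=${encodeURIComponent(apiKey)}`;
  const body = {
    contents: [
      {
        role: "user",
        parts: [
          // The PDF bytes, base64-encoded, with their MIME type
          { inline_data: { mime_type: "application/pdf", data: pdfBase64 } },
          { text: "Convert this PDF to semantic HTML. Preserve headings and tables." },
        ],
      },
    ],
  };
  return {
    url,
    options: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to fetch(url, options); in the response JSON, the generated HTML would be expected under candidates[0].content.parts[0].text.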

Within six hours of posting it my Pipe Syntax in SQL conversion was ranked third on Google for the title of the paper, at which point I set it to noindex to try and keep the unverified clone out of search. Yet more evidence that HTML is better than PDF!
I’ve spent less than ten minutes in total using Gemini to convert PDFs in this way and the results have been very impressive. If I were to spend more time on this I’d target figures: I have a hunch that getting Gemini to return bounding boxes for figures on the PDF pages could be the key here, since then each figure could be automatically extracted as an image.
I bet you could build that whole thing as a client-side app against the Gemini Pro API, too…
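The figure-extraction idea comes down to turning a returned box into a crop rectangle. This sketch assumes boxes arrive as [ymin, xmin, ymax, xmax] on a 0 to 1000 scale, which is how Gemini's documentation describes them but is not stated in this post; boxToPixels is a hypothetical helper name.

```javascript
// Sketch: convert a [ymin, xmin, ymax, xmax] box on a 0-1000 scale (an
// assumed convention) into a pixel rectangle for an image or rendered page
// of the given size.
function boxToPixels([ymin, xmin, ymax, xmax], width, height) {
  const left = Math.round((xmin / 1000) * width);
  const top = Math.round((ymin / 1000) * height);
  const right = Math.round((xmax / 1000) * width);
  const bottom = Math.round((ymax / 1000) * height);
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

With a page rendered to an image, the rectangle could then be cropped onto a canvas with ctx.drawImage(img, x, y, width, height, 0, 0, width, height).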
Adding some class to Datasette forms
I’ve been working on a new Datasette plugin for permissions management, datasette-acl, which I’ll write about separately soon.
I wanted to integrate Choices.js with it, to provide a nicer interface for adding permissions to a user or group.
My first attempt at integrating Choices ended up looking like this:

The weird visual glitches are caused by Datasette’s core CSS, which included the following rule:
form input[type=submit], form button[type=button] {
    font-weight: 400;
    cursor: pointer;
    text-align: center;
    vertical-align: middle;
    border-width: 1px;
    border-style: solid;
    padding: .5em 0.8em;
    font-size: 0.9rem;
    line-height: 1;
    border-radius: .25rem;
}
These style rules apply to any submit button or button-button that occurs inside a form!
I’m glad I caught this before Datasette 1.0. I’ve now started the process of fixing that, by ensuring these rules only apply to elements with class="core" (or that class on a wrapping element). This ensures plugins can style these elements without being caught out by Datasette’s defaults.
The problem is… there are a whole bunch of existing plugins that currently rely on that behaviour. I have a tracking issue about that, which identified 28 plugins that need updating. I’ve worked my way through 8 of those so far, hence the flurry of releases listed at the bottom of this post.
This is also an excuse to revisit a bunch of older plugins, some of which had partially complete features that I’ve been finishing up.
datasette-write for example now has a neat row action menu item for updating a selected row using a pre-canned UPDATE query. Here’s an animated demo of my first prototype of that feature:

On the blog
anthropic

Claude’s API now supports CORS requests, enabling client-side applications – 2024-08-23

Explain ACLs by showing me a SQLite table schema for implementing them – 2024-08-23

Musing about OAuth and LLMs on Mastodon – 2024-08-24

Building a tool showing how Gemini Pro can return bounding boxes for objects in images – 2024-08-26

Long context prompting tips – 2024-08-26

Anthropic Release Notes: System Prompts – 2024-08-26

Alex Albert: We’ve read and heard that you’d appreciate more t… – 2024-08-26

Gemini Chat App – 2024-08-27

System prompt for val.town/townie – 2024-08-28

How Anthropic built Artifacts – 2024-08-28

Anthropic’s Prompt Engineering Interactive Tutorial – 2024-08-30

llm-claude-3 0.4.1 – 2024-08-30

ai-assisted-programming

Andy Jassy, Amazon CEO: […] here’s what we found when we integrated [Am… – 2024-08-24

AI-powered Git Commit Function – 2024-08-26

OpenAI: Improve file search result relevance with chunk ranking – 2024-08-30

Forrest Brazeal: I think that AI has killed, or is about to kill, … – 2024-08-31

gemini

SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL – 2024-08-24

NousResearch/DisTrO – 2024-08-27

python

uvtrick – 2024-09-01

Anatomy of a Textual User Interface – 2024-09-02

Why I Still Use Python Virtual Environments in Docker – 2024-09-02

Python Developers Survey 2023 Results – 2024-09-03

security

Top companies ground Microsoft Copilot over data governance concerns – 2024-08-23

Frederik Braun: In 2021 we [the Mozilla engineering team] found “… – 2024-08-26

OAuth from First Principles – 2024-09-05

projects

My @covidsewage bot now includes useful alt text – 2024-08-25

armin-ronacher

MiniJinja: Learnings from Building a Template Engine in Rust – 2024-08-27

ethics

John Gruber: Everyone alive today has grown up in a world wher… – 2024-08-27

open-source

Debate over “open source AI” term brings new push to formalize definition – 2024-08-27

Elasticsearch is open source, again – 2024-08-29

performance

Cerebras Inference: AI at Instant Speed – 2024-08-28

sqlite

D. Richard Hipp: My goal is to keep SQLite relevant and viable thr… – 2024-08-28

aws

Leader Election With S3 Conditional Writes – 2024-08-30

javascript

Andreas Giammarchi: whenever you do this: `el.innerHTML += HTML` … – 2024-08-31

openai

OpenAI says ChatGPT usage has doubled since last year – 2024-08-31

art

Ted Chiang: Art is notoriously hard to define, and so are the… – 2024-08-31

llm

anjor: `history | tail -n 2000 | llm -s "Write aliases f… – 2024-09-03

vision-llms

Qwen2-VL: To See the World More Clearly – 2024-09-04

Releases

datasette-import 0.1a5 – 2024-09-04: Tools for importing data into Datasette

datasette-search-all 1.1.3 – 2024-09-04: Datasette plugin for searching all searchable tables at once

datasette-write 0.4 – 2024-09-04: Datasette plugin providing a UI for executing SQL writes against the database

datasette-debug-events 0.1a0 – 2024-09-03: Print Datasette events to standard error

datasette-auth-passwords 1.1.1 – 2024-09-03: Datasette plugin for authentication using passwords

datasette-enrichments 0.4.3 – 2024-09-03: Tools for running enrichments against data stored in Datasette

datasette-configure-fts 1.1.4 – 2024-09-03: Datasette plugin for enabling full-text search against selected table columns

datasette-auth-tokens 0.4a10 – 2024-09-03: Datasette plugin for authenticating access using API tokens

datasette-edit-schema 0.8a3 – 2024-09-03: Datasette plugin for modifying table schemas

datasette-pins 0.1a4 – 2024-09-01: Pin databases, tables, and other items to the Datasette homepage

datasette-acl 0.4a2 – 2024-09-01: Advanced permission management for Datasette

llm-claude-3 0.4.1 – 2024-08-30: LLM plugin for interacting with the Claude 3 family of models

TILs

Testing HTML tables with Playwright Python – 2024-09-04

Using namedtuple for pytest parameterized tests – 2024-08-31

Tags: css, javascript, pdf, projects, ai, datasette, weeknotes, generative-ai, llms, anthropic, claude, gemini, claude-3-5-sonnet, cors

AI Summary and Description: Yes

**Summary:** The text discusses the innovative use of CORS-enabled LLM APIs in client-side JavaScript applications, highlighting security concerns related to API key management. It showcases specific tools built using LLMs, such as the Gemini Chat App, and emphasizes the ongoing development of a Datasette plugin for managing permissions more effectively. This content is particularly relevant for professionals in AI security and cloud computing, as it addresses both practical application development and inherent security challenges.

**Detailed Description:**
The text elaborates on multiple aspects of leveraging LLMs through client-side applications while addressing significant security implications regarding API key exposure. Here are the major points:

– **Client-Side Applications with LLMs:**
  – The author exploits CORS (Cross-Origin Resource Sharing) support to access LLMs directly from client-side JavaScript, simplifying web application development without needing server-side code.
  – Specifically mentions Anthropic’s Claude APIs and notes that OpenAI and Google Gemini also support similar capabilities without extra headers, facilitating easier access to LLMs.

– **Security Considerations:**
  – Stresses the risk of embedding sensitive API keys in client-side code, which can be viewed easily by anyone inspecting the webpage.
  – Proposes a workaround by prompting users for their API key, which is then stored in localStorage, limiting exposure but also narrowing the user base to those familiar with API usage.

– **Developed Tools:**
  – The author has created several tools utilizing these APIs:
    – **Haiku:** A demo that accesses the user’s camera to generate haikus about what it sees.
    – **Gemini-bbox:** A tool that prompts the Gemini API to return bounding boxes for objects in images, leveraging LLM capabilities in image processing.
    – **Gemini Chat App:** A chat interface implemented directly in the browser, exploring new model functionalities and streaming capabilities.

– **Conversion of PDFs:**
  – The author expresses frustration with PDFs and demonstrates the potential for improved accessibility and usability by converting them to HTML and Markdown using LLMs.
  – Highlights successful conversions of research papers into more user-friendly formats, showcasing the practical utility of LLMs in handling text-heavy formats.

– **Datasette Plugin Development:**
  – Mentions working on a permissions management plugin for Datasette, integrating it with Choices.js for a better user experience.
  – Identifies CSS issues caused by existing rules and works on solutions to avoid conflicts with community plugins, demonstrating ongoing contributions to the open-source ecosystem.

Overall, this text serves as both a practical guide and a warning for developers working with LLMs in client-side environments, making it vital for professionals in AI security, software development, and privacy to consider the implications of their implementations.