Simon Willison’s Weblog: llm-pdf-to-images

Source URL: https://simonwillison.net/2025/May/18/llm-pdf-to-images/#atom-everything
Source: Simon Willison’s Weblog
Title: llm-pdf-to-images

Feedly Summary: llm-pdf-to-images
Inspired by my previous llm-video-frames plugin, I thought it would be neat to have a plugin for LLM that can take a PDF and turn that into an image-per-page so you can feed PDFs into models that support image inputs but don’t yet support PDFs.
This should now do exactly that:
llm install llm-pdf-to-images
llm -f pdf-to-images:path/to/document.pdf 'Summarize this document'
Under the hood it’s using the PyMuPDF library. The key code to convert a PDF into images looks like this:
import fitz
doc = fitz.open("input.pdf")
for page in doc:
    pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)
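The `300/72` factor in that snippet comes from PDF page sizes being measured in points (72 per inch), so scaling by `300/72` renders at roughly 300 pixels per inch. Here is a minimal sketch of that arithmetic, plus a wrapper around the same PyMuPDF calls. The names `zoom_for_dpi`, `rendered_size` and `pdf_to_jpegs` are hypothetical helpers made up for illustration; they are not taken from the plugin.

```python
from typing import Iterator, Tuple

PDF_POINTS_PER_INCH = 72  # PDF page sizes are measured in points: 1 pt = 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """Scale factor that turns 72-points-per-inch page space into `dpi` pixels per inch."""
    return dpi / PDF_POINTS_PER_INCH


def rendered_size(width_pt: float, height_pt: float, dpi: float = 300) -> Tuple[int, int]:
    """Approximate pixel size of a rendered page (PyMuPDF's own rounding may differ by a pixel)."""
    return (
        round(width_pt * dpi / PDF_POINTS_PER_INCH),
        round(height_pt * dpi / PDF_POINTS_PER_INCH),
    )


def pdf_to_jpegs(path: str, dpi: float = 300, quality: int = 30) -> Iterator[bytes]:
    """Yield JPEG bytes for each page (hypothetical helper, not the plugin's actual code)."""
    import fitz  # PyMuPDF, imported lazily so the pure helpers above work without it

    zoom = zoom_for_dpi(dpi)
    doc = fitz.open(path)
    try:
        for page in doc:
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            yield pix.tobytes(output="jpg", jpg_quality=quality)
    finally:
        doc.close()


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points
    print(rendered_size(612, 792))  # (2550, 3300)
```

At 300 DPI each page is several megapixels, which is presumably why the snippet pairs the high resolution with a low `jpg_quality` of 30 to keep the encoded images small.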
Once I’d figured out that code I got o4-mini to write most of the rest of the plugin:
llm -f github:simonw/llm-video-frames '
import fitz
doc = fitz.open("input.pdf")
for page in doc:
    pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)
' -s 'output llm_pdf_to_images.py which adds a pdf-to-images:
fragment loader that converts a PDF to frames using fitz like in the example' \
-m o4-mini
Here’s the transcript – more details in this issue.
I had some weird results testing this with GPT-4.1 mini. I created a test PDF with two pages – one white, one black – and ran a test prompt like this:
llm -f 'pdf-to-images:blank-pages.pdf' \
'describe these images'

The first image features a stylized red maple leaf with triangular facets, giving it a geometric appearance. The maple leaf is a well-known symbol associated with Canada.
The second image is a simple black silhouette of a cat sitting and facing to the left. The cat’s tail curls around its body. The design is minimalistic and iconic.

I got even wilder hallucinations for other prompts, like "summarize this document" or "describe all figures". I have a collection of those in this Gist.
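For anyone who wants to reproduce the blank-page test, a PDF like the one described could be generated with PyMuPDF along these lines. This is a sketch under assumptions: the post does not say how the original `blank-pages.pdf` was made, and `make_blank_pages_pdf`, `describe_pages` and `classify_uniform` are hypothetical names. The classifier gives a local ground truth to compare against whatever the model claims to see.

```python
from typing import List


def classify_uniform(samples: bytes) -> str:
    """Classify raw pixel bytes as 'white', 'black' or 'mixed'."""
    values = set(samples)  # distinct byte values across all channels
    if values == {255}:
        return "white"
    if values == {0}:
        return "black"
    return "mixed"


def make_blank_pages_pdf(path: str = "blank-pages.pdf") -> None:
    """Write a two-page PDF: one empty (white) page and one page filled with black."""
    import fitz  # PyMuPDF, imported lazily so classify_uniform works without it

    doc = fitz.open()  # calling open() with no arguments creates a new, empty PDF
    doc.new_page()  # page 1: nothing drawn, so it renders white
    black_page = doc.new_page()  # page 2: cover the whole page with a black rectangle
    black_page.draw_rect(black_page.rect, color=(0, 0, 0), fill=(0, 0, 0))
    doc.save(path)
    doc.close()


def describe_pages(path: str) -> List[str]:
    """Render each page at the default resolution and classify its pixels."""
    import fitz

    doc = fitz.open(path)
    try:
        return [classify_uniform(page.get_pixmap().samples) for page in doc]
    finally:
        doc.close()
```

Running `make_blank_pages_pdf()` and then `describe_pages("blank-pages.pdf")` gives a quick check that the input really contains no maple leaves or cats before blaming the model.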
Thankfully this behavior is limited to GPT-4.1 mini. I upgraded to full GPT-4.1 and got much more sensible results:
llm -f 'pdf-to-images:tests/blank-pages.pdf' \
'describe these images' -m gpt-4.1

Certainly! Here are the descriptions of the two images you provided:

First image: This image is completely white. It appears blank, with no discernible objects, text, or features.

Second image: This image is entirely black. Like the first, it is blank and contains no visible objects, text, or distinct elements.

If you have questions or need a specific kind of analysis or modification, please let me know!

Tags: llm, plugins, ai, llms, ai-assisted-programming, pdf, generative-ai, projects

AI Summary and Description: Yes

Summary: The text describes a plugin for the LLM command-line tool that converts a PDF into one image per page, so the document can be passed to models that accept image inputs but do not yet support PDFs. It also reports a test in which GPT-4.1 mini hallucinated content for blank pages while full GPT-4.1 described them correctly.

Detailed Description: The provided content describes the creation and functionality of a plugin named `llm-pdf-to-images`. This plugin serves as a bridge between PDF documents and LLMs, enabling users to convert PDF pages into images and subsequently analyze those images using language models. Here are the key points:

- **Plugin Functionality**:
  - Converts a PDF into images (one per page) so it can be fed to models that accept image inputs but not PDFs.
  - Implemented with the PyMuPDF library (imported as `fitz`) for PDF handling and rendering.

- **Installation and Use**:
  - Users can install the plugin with `llm install llm-pdf-to-images`.
  - The command `llm -f pdf-to-images:path/to/document.pdf 'Summarize this document'` shows how to use the plugin to analyze the contents of a PDF.

- **Core Code Overview**:
  - The code snippet opens a PDF, renders each page at 300 DPI (a `300/72` scale matrix), and encodes it as JPEG bytes at quality 30.
  - Most of the remaining plugin code was written by o4-mini, prompted with the snippet and the existing llm-video-frames plugin as an example.

- **Testing and Results**:
  - With GPT-4.1 mini, a test PDF containing one white page and one black page was described as a red maple leaf and a black cat silhouette. Other prompts produced even wilder hallucinations.
  - Full GPT-4.1 correctly described both pages as blank, and the author reports that the behavior is limited to GPT-4.1 mini.

- **Practical Implications**:
  - Rendering pages to images is a practical workaround for models that accept images but not PDFs.
  - Results depend on the model chosen, so it is worth checking outputs against inputs whose contents are known.

In summary, the plugin gives LLM users a simple way to send PDFs to image-capable models, and the blank-page test is a reminder that smaller models can hallucinate content that is not on the page.