Simon Willison’s Weblog: Mistral OCR

Mar 7, 2025


Source URL: https://simonwillison.net/2025/Mar/7/mistral-ocr/#atom-everything
Source: Simon Willison’s Weblog
Title: Mistral OCR

Feedly Summary: Mistral OCR
New closed-source specialist OCR model by Mistral – you can feed it images or a PDF and it produces Markdown with optional embedded images.
It’s available via their API, or it’s “available to self-host on a selective basis” for people with stringent privacy requirements who are willing to talk to their sales team.
I decided to try out their API, so I copied and pasted example code from their notebook into my custom Claude project and told it:

Turn this into a CLI app, depends on mistralai – it should take a file path and an optional API key defauling to env vironment called MISTRAL_API_KEY

After some further iteration / vibe coding I got to something that worked, which I then tidied up and shared as mistral_ocr.py.
You can try it out like this:
export MISTRAL_API_KEY='…'
uv run http://tools.simonwillison.net/python/mistral_ocr.py \
mixtral.pdf --html --inline-images > mixtral.html
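The prompt asked for a CLI that takes a file path and an optional API key, falling back to the `MISTRAL_API_KEY` environment variable. Below is a minimal standard-library sketch of that argument handling. It is not the code in mistral_ocr.py. The `--api-key` flag name and the helper functions are hypothetical, and the actual mistralai client call is left out rather than guessed at.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: a required file path plus an optional API key.
    parser = argparse.ArgumentParser(description="OCR a PDF or image (sketch)")
    parser.add_argument("path", help="PDF or image file to OCR")
    parser.add_argument(
        "--api-key",
        default=None,
        help="Mistral API key (defaults to the MISTRAL_API_KEY env var)",
    )
    return parser


def resolve_api_key(cli_value, environ=os.environ):
    # An explicit --api-key value wins; otherwise fall back to the environment.
    key = cli_value or environ.get("MISTRAL_API_KEY")
    if not key:
        raise SystemExit("No API key: pass --api-key or set MISTRAL_API_KEY")
    return key


def main(argv=None):
    args = build_parser().parse_args(argv)
    api_key = resolve_api_key(args.api_key)
    # A real script would now create a mistralai client with api_key and
    # submit args.path for OCR; that call is deliberately omitted here.
    return args.path, api_key
```

A real script would call `main()` under an `if __name__ == "__main__":` guard. Passing `environ` as a parameter lets `resolve_api_key` be tested without touching the real environment.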

I fed in the Mixtral paper as a PDF. The API returns Markdown, but my --html option renders that Markdown as HTML and the --inline-images option takes any images and inlines them as base64 URIs (inspired by monolith). The result is mixtral.html, a 972KB HTML file with images and text bundled together.
This did a pretty great job!
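Here is a minimal sketch of the inlining idea behind --inline-images. It assumes two things this post does not confirm. First, the OCR Markdown refers to each extracted image by a filename-style id such as `img-0.jpeg`. Second, the image bytes are already available in a dict keyed by that id. The names `inline_images` and `to_data_uri` are hypothetical, and this is not the script's actual code. The sketch rewrites the Markdown before any HTML rendering. That is one reasonable order, but not necessarily the one mistral_ocr.py uses.

```python
import base64
import mimetypes
import re

# Matches Markdown image syntax: ![alt text](target)
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def to_data_uri(name: str, data: bytes) -> str:
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(name)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def inline_images(markdown: str, images: dict[str, bytes]) -> str:
    # Replace each image reference whose target is a known image id with a
    # data: URI, leaving unknown targets (such as remote URLs) untouched.
    def replace(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        if target not in images:
            return match.group(0)
        return f"![{alt}]({to_data_uri(target, images[target])})"

    return IMAGE_REF.sub(replace, markdown)
```

Base64 inflates binary data by roughly a third. That overhead is part of why a self-contained file like this runs to hundreds of kilobytes.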

My script renders Markdown tables but I haven’t figured out how to render inline Markdown MathML yet. I ran the command a second time and requested Markdown output (the default) like this:
uv run http://tools.simonwillison.net/python/mistral_ocr.py \
mixtral.pdf > mixtral.md

Here’s that Markdown rendered as a Gist. There are a few MathML glitches, so clearly the Mistral OCR MathML dialect and the GitHub Flavored Markdown dialect don’t quite line up.
My tool can also output raw JSON as an alternative to Markdown or HTML – full details in the documentation.
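The output selection described here is Markdown by default, --html for rendered HTML, and a raw JSON mode. It might be wired up like the sketch below. The --html and --inline-images flags come from the post. `--json` is a hypothetical flag name, because the post doesn't name it. The response shape `{"pages": [{"markdown": ...}]}` is an assumption about the API's output. HTML rendering is left out because it needs a third-party Markdown library.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Output-format flags (sketch)")
    parser.add_argument("path")
    fmt = parser.add_mutually_exclusive_group()
    # --html is named in the post; --json is a hypothetical name for raw JSON.
    fmt.add_argument("--html", action="store_true", help="render Markdown as HTML")
    fmt.add_argument("--json", action="store_true", help="print the raw response")
    parser.add_argument("--inline-images", action="store_true",
                        help="embed images as base64 data URIs")
    return parser


def output_format(args: argparse.Namespace) -> str:
    # Markdown is the default when neither --html nor --json is given.
    if args.json:
        return "json"
    if args.html:
        return "html"
    return "markdown"


def emit(response: dict, fmt: str) -> str:
    # Assumed response shape: {"pages": [{"markdown": "..."}, ...]}.
    if fmt == "json":
        return json.dumps(response, indent=2)
    # Join the per-page Markdown; an HTML mode would render this string next.
    return "\n\n".join(page["markdown"] for page in response["pages"])
```

The mutually exclusive group makes argparse reject `--html --json` together. That keeps the "Markdown unless told otherwise" rule unambiguous.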
The big question with LLM-based OCR is always how well it copes with accidental instructions in the text (can you safely OCR a document full of prompting examples?) and how well it handles text it can’t read.
Mistral’s Sophia Yang says it "should be robust" against following instructions in the text, and invited people to try and find counter-examples.
Alexander Doria noted that Mistral OCR can hallucinate text when faced with handwriting that it cannot understand.
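One way to probe both concerns is to OCR a document whose text you already know and diff the result against that ground truth. This approach is not taken from the post. Suitable test inputs include a page full of prompting examples, or a handwriting sample with a trusted transcription. The sketch below uses a word-level `difflib` comparison, and `compare_ocr` is a hypothetical helper. A low similarity or a list of unexpected words is only a signal worth inspecting by hand. It is not proof of instruction-following or hallucination.

```python
import difflib
import re


def words(text: str) -> list[str]:
    # Lowercase word tokens; punctuation and Markdown syntax are dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def compare_ocr(expected: str, ocr_output: str) -> dict:
    exp, got = words(expected), words(ocr_output)
    matcher = difflib.SequenceMatcher(a=exp, b=got, autojunk=False)
    unexpected = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" spans are output words with no aligned match
        # in the ground truth: candidate hallucinations or injected replies.
        if tag in ("insert", "replace"):
            unexpected.extend(got[j1:j2])
    return {"similarity": matcher.ratio(), "unexpected_words": unexpected}
```

If a model obeyed an embedded instruction instead of transcribing it, much of the expected text would go missing and the similarity would drop. If it invented words for illegible handwriting, those words would appear in `unexpected_words`.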
Via @sophiamyang
Tags: vision-llms, mistral, pdf, generative-ai, ocr, ai, llms

AI Summary and Description

Summary: The text discusses Mistral’s new OCR model, covering its capabilities, its privacy-oriented self-hosting option, and the author’s hands-on use of its API. The tool is relevant to generative AI and OCR because it raises practical questions about data privacy, instructions embedded in document text, and hallucination on hard-to-read input such as handwriting.

Detailed Description: The text provides an overview of Mistral’s closed-source OCR model, highlighting several important points that could be of interest to professionals in AI security and compliance. Here are the major points covered:

– **Functionality**: Mistral OCR is designed to process images or PDFs and convert them into Markdown, with the option to include embedded images. This functionality is crucial for automating document processing and improving accessibility.

– **Privacy Features**: The OCR system can be accessed via an API. Customers can also self-host it on a selective basis, by arrangement with Mistral’s sales team. This option is aimed at users with stringent privacy requirements, indicating Mistral’s attentiveness to data security and compliance considerations.

– **API Usage and Implementation**:
– Users can integrate the OCR functionality into their custom projects. The text describes the author’s process of using the Mistral API in a command-line interface (CLI) application.
– The scripts developed can output different formats, such as Markdown, HTML, or raw JSON, catering to varied user needs.

– **Quality and Limitations**:
– The text discusses quality control, particularly in how the OCR handles accidental instructions within text inputs and its performance with challenging formats, such as handwriting or MathML.
– Mistral’s Sophia Yang says the model “should be robust” against following instructions embedded in document text. However, it has been observed hallucinating text for handwriting it cannot understand, and its MathML output does not line up cleanly with GitHub Flavored Markdown.

– **Community Involvement**: By inviting users to challenge the model, Mistral encourages community interaction and feedback, which is essential for improving AI models in practice.

– **Tags and Relevance**: The text includes tags that connect it to broader themes in AI and OCR technology, indicating its relevance to the generative AI and LLM (Large Language Model) security landscapes.

Key Insights for Security and Compliance Professionals:
– As AI tools like Mistral OCR become more integrated into workflows, ensuring compliance with data privacy regulations becomes essential, especially when handling sensitive documents.
– The potential for AI models to misinterpret or hallucinate content raises concerns over reliance on automated systems for critical tasks.
– This technology’s accessibility via both API and self-hosting options caters to different organizational needs, presenting varying levels of data security and governance opportunities.

Overall, Mistral’s OCR model represents a significant development in AI-driven document processing, with implications for security, privacy, and compliance standards in technology use.
