Source URL: https://simonwillison.net/2025/Feb/5/o3-mini-documentation/#atom-everything
Source: Simon Willison’s Weblog
Title: o3-mini is really good at writing internal documentation
I wanted to refresh my knowledge of how the Datasette permissions system works today. I already have extensive hand-written documentation for that, but I thought it would be interesting to see if I could derive any insights from running an LLM against the codebase.
o3-mini has an input limit of 200,000 tokens. I used LLM and my files-to-prompt tool to generate the documentation like this:
```bash
cd /tmp
git clone https://github.com/simonw/datasette
cd datasette
files-to-prompt datasette -e py -c | \
  llm -m o3-mini -s \
  'write extensive documentation for how the permissions system works, as markdown'
```
The files-to-prompt command is fed the datasette subdirectory, which contains just the source code for the application – omitting tests (in tests/) and documentation (in docs/).
The -e py option causes it to only include files with a .py extension – skipping all of the HTML and JavaScript files in that hierarchy.
The -c option causes it to output Claude’s XML-ish format – a format that works great with other LLMs too.
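As a rough illustration, the Claude-style wrapper that `-c` produces looks something like the sketch below. The structure here is inferred from files-to-prompt's `--cxml` output and is only an approximation; the real tool handles details this sketch skips:

```python
# A minimal sketch of a Claude-style XML-ish document wrapper, modeled on
# what files-to-prompt's -c / --cxml flag emits (structure assumed; the
# actual tool's output may differ in detail).
def to_cxml(files: dict[str, str]) -> str:
    parts = ["<documents>"]
    for index, (path, contents) in enumerate(files.items(), start=1):
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{path}</source>")
        parts.append("<document_contents>")
        parts.append(contents)
        parts.append("</document_contents>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)

print(to_cxml({"datasette/permissions.py": "class Permission: ..."}))
```

Each file gets its own `<document>` element with its path in `<source>`, which gives the model clear boundaries between files in one long prompt.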
You can see the output of that command in this Gist.
Then I pipe that result into LLM, requesting the o3-mini OpenAI model and passing the following system prompt:
write extensive documentation for how the permissions system works, as markdown
Specifically requesting Markdown is important.
The prompt used 99,348 input tokens and produced 3,118 output tokens (320 of those were invisible reasoning tokens). That’s a cost of 12.3 cents.
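That figure can be sanity-checked with quick arithmetic, assuming o3-mini's published pricing of $1.10 per million input tokens and $4.40 per million output tokens (the rates are my assumption here; the post only quotes the final cost):

```python
# Back-of-envelope check of the quoted cost, assuming o3-mini pricing of
# $1.10 / million input tokens and $4.40 / million output tokens.
input_tokens = 99_348
output_tokens = 3_118  # includes the 320 invisible reasoning tokens

cost_dollars = (input_tokens * 1.10 + output_tokens * 4.40) / 1_000_000
print(f"{cost_dollars * 100:.1f} cents")  # → 12.3 cents
```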
Honestly, the results are fantastic. I had to double-check that I hadn’t accidentally fed in the documentation by mistake.
(It’s possible that the model is picking up additional information about Datasette in its training set, but I’ve seen similar high quality results from other, newer libraries so I don’t think that’s a significant factor.)
In this case I already had extensive written documentation of my own, but this was still a useful refresher to help confirm that the code matched my mental model of how everything works.
Documentation of project internals as a category is notorious for going out of date. Having tricks like this to derive usable how-it-works documentation from existing codebases in just a few seconds and at a cost of a few cents is wildly valuable.
Tags: llm, openai, o3, ai, llms, datasette, generative-ai, documentation, ai-assisted-programming
Summary: The text discusses the use of an LLM (o3-mini) to generate documentation for the Datasette permissions system, highlighting the efficiency and cost-effectiveness of leveraging AI for this purpose. It is particularly relevant to professionals interested in generative AI applications in software development and documentation practices.
Detailed Description: The text illustrates a practical application of generative AI in creating internal project documentation, specifically using an LLM model (o3-mini). Here are the key points:
- **Use of LLM for Documentation**:
  - The author wanted to refresh their knowledge of the Datasette permissions system and decided to use an LLM to generate documentation automatically.
  - Extensive hand-written documentation already existed, but the author sought additional insights through generative AI.
- **Technical Implementation**:
  - The process began with cloning the Datasette repository and limiting the scope to Python files using the `files-to-prompt` command.
  - The author specified parameters to select the right file types and formats, creating a workflow that involved:
    - Cloning the repository: `git clone https://github.com/simonw/datasette`
    - Selecting only relevant code files (`-e py`) while excluding tests and existing documentation.
    - Using an output format (`-c`) optimized for LLM processing.
- **Output and Cost Considerations**:
  - The command generated 3,118 output tokens from a prompt using 99,348 input tokens, for a modest cost of 12.3 cents.
  - The author judged the quality of the generated documentation "fantastic", reinforcing the notion that the LLM effectively understood the code's structure and context.
- **Value Proposition**:
  - The ability to derive documentation from an existing codebase quickly and cheaply is a significant advantage for development teams, who often struggle to keep documentation current.
  - Generative AI can serve as a valuable tool for code comprehension and help keep documentation aligned with the actual functionality of a software project.
This case exemplifies how generative AI can augment traditional software development processes, ensuring that documentation is both accurate and up-to-date, a critical aspect of successful project management in security, compliance, and infrastructure. Such applications may encourage professionals to explore more integrations of AI tools for streamlined workflows.