Source URL: https://simonwillison.net/2025/Apr/7/long-context-llm/#atom-everything
Source: Simon Willison’s Weblog
Title: Long context support in LLM 0.24 using fragments and template plugins
Feedly Summary: LLM 0.24 is now available with new features to help take advantage of the increasingly long input context supported by modern LLMs.
(LLM is my command-line tool and Python library for interacting with LLMs, supported by 20+ plugins adding support for both local and remote models from a bunch of different providers.)
Trying it out
Improving LLM’s support for long context models
Asking questions of LLM’s documentation
Publishing, sharing and reusing templates
Everything else in LLM 0.24
Trying it out
To install LLM with uv (there are several other options):
uv tool install llm
You’ll need to either provide an OpenAI API key or install a plugin to use local models or models from other providers:
llm keys set openai
# Paste OpenAI API key here
To upgrade LLM from a previous version:
llm install -U llm
The biggest new feature is fragments. You can now use -f filename or -f url to add one or more fragments to your prompt, which means you can do things like this:
llm -f https://simonwillison.net/2025/Apr/5/llama-4-notes/ 'bullet point summary'
Here’s the output from that prompt, exported using llm logs -c --expand --usage. Token cost was 5,372 input, 374 output which works out as 0.103 cents (around 1/10th of a cent) using the default GPT-4o mini model.
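You can pass -f more than once to combine several fragments in a single prompt. Here's a sketch of what that looks like (the local file name is an illustrative placeholder):
llm -f https://simonwillison.net/2025/Apr/5/llama-4-notes/ \
  -f llama-4-benchmarks.md \
  'compare the claims made in these two documents'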
Plugins can implement custom fragment loaders with a prefix. The llm-fragments-github plugin adds a github: prefix that can be used to load every text file in a GitHub repository as a list of fragments:
llm install llm-fragments-github
llm -f github:simonw/s3-credentials 'Suggest new features for this tool'
Here’s the output. That took 49,856 input tokens for a total cost of 0.7843 cents – nearly a whole cent!
Improving LLM’s support for long context models
Long context is one of the most exciting trends in LLMs over the past eighteen months. Saturday’s Llama 4 Scout release gave us the first model with a full 10 million token context. Google’s Gemini family has several 1-2 million token models, and the baseline for recent models from both OpenAI and Anthropic is 100 or 200 thousand.
Two years ago most models capped out at 8,000 tokens of input. Long context opens up many new interesting ways to apply this class of technology.
I’ve been using long context models via my files-to-prompt tool to summarize large codebases, explain how they work and even to debug gnarly bugs. As demonstrated above, it’s surprisingly inexpensive to drop tens of thousands of tokens into models like GPT-4o mini or most of the Google Gemini series, and the results are often very impressive.
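A minimal sketch of that workflow, piping a codebase into a long context model (the directory name and the question are placeholders):
files-to-prompt mypackage/ | llm -m gemini-2.0-flash 'explain how this codebase works'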
One of LLM’s most useful features is that it logs every prompt and response to a SQLite database. This is great for comparing the same prompt against different models and tracking experiments over time – my own database contained thousands of responses from hundreds of different models accumulated over the past couple of years.
This is where long context prompts were starting to be a problem. Since LLM stores the full prompt and response in the database, asking five questions of the same source code could result in five duplicate copies of that text in the database!
The new fragments feature targets this problem head on. Each fragment is stored once in a fragments table, then de-duplicated in the future using a SHA256 hash of its content.
This saves on storage, and also enables features like llm logs -f X for seeing all logged responses that use a particular fragment.
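For example, to pull up every logged response that used the Llama 4 notes fragment from earlier:
llm logs -f https://simonwillison.net/2025/Apr/5/llama-4-notes/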
Fragments can be specified in several different ways:
a path to a file
a URL to data online
an alias that’s been set against a previous fragment (see llm fragments set)
a hash ID of the content of a fragment
using prefix:argument to specify fragments from a plugin
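Here's a quick sketch of each of those forms in turn. The file names, alias and hash ID are placeholders, and the llm fragments set invocation reflects my reading of that command:
# a path to a file
llm -f transcript.txt 'list the key decisions made in this meeting'
# a URL to data online
llm -f https://simonwillison.net/2025/Apr/5/llama-4-notes/ 'bullet point summary'
# set an alias against a fragment, then reuse it by that alias
llm fragments set notes ./transcript.txt
llm -f notes 'what questions were left unanswered?'
# a hash ID of a fragment's content (placeholder hash shown)
llm -f 993fd38d898d2b59fd2d16c811da5bdac658faa34f0f4d411edde7c17ebb0627 'summarize'
# a plugin-provided prefix:argument fragment
llm -f github:simonw/s3-credentials 'suggest new features for this tool'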
Asking questions of LLM’s documentation
Wouldn’t it be neat if LLM could answer questions about its own documentation?
The new llm-docs plugin (built with the new register_fragment_loaders() plugin hook) enables exactly that:
llm install llm-docs
llm -f docs: "How do I embed a binary file?"
The output starts like this:
To embed a binary file using the LLM command-line interface, you can use the llm embed command with the --binary option. Here’s how you can do it:
Make sure you have the appropriate embedding model installed that supports binary input.
Use the following command syntax:
llm embed -m <model_id> --binary <path_to_your_binary_file>
Replace <model_id> with the identifier for the embedding model you want to use (e.g., clip for the CLIP model) and <path_to_your_binary_file> with the path to your actual binary file.
(74,570 input, 240 output = 1.1329 cents with GPT-4o mini)
Using -f docs: with just the prefix is the same as using -f docs:llm. The plugin fetches the documentation for your current version of LLM from my new simonw/docs-for-llms repo, which also provides packaged documentation files for my datasette, s3-credentials, shot-scraper and sqlite-utils projects.
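The same pattern should work for the other packaged projects, for example (the question here is just an illustration):
llm -f docs:sqlite-utils 'How do I insert rows from a CSV file?'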
Datasette’s documentation has got pretty long, so you might need to run that through a Gemini model instead (using the llm-gemini plugin):
llm -f docs:datasette -m gemini-2.0-flash \
'Build a render_cell plugin that detects and renders markdown'
Here’s the output. 132,042 input, 1,129 output with Gemini 2.0 Flash = 1.3656 cents.
You can browse the combined documentation files this uses in docs-for-llms. They’re built using GitHub Actions.
llms.txt is a project led by Jeremy Howard that encourages projects to publish similar files to help LLMs ingest a succinct copy of their documentation.
Publishing, sharing and reusing templates
The new register_template_loaders() plugin hook allows plugins to register prefix:value custom template loaders, for use with the llm -t option.
llm-templates-github and llm-templates-fabric are two new plugins that make use of that hook.
llm-templates-github lets you share and use templates via a public GitHub repository. Here’s how to run my Pelican riding a bicycle benchmark against a specific model:
llm install llm-templates-github
llm -t gh:simonw/pelican-svg -m o3-mini
This executes the pelican-svg.yaml template stored in my simonw/llm-templates repository, using a new repository naming convention.
llm -t gh:simonw/pelican-svg will load that pelican-svg.yaml file from the simonw/llm-templates repo. You can also use llm -t gh:simonw/name-of-repo/name-of-template to load a template from a repository that doesn’t follow that convention.
To share your own templates, create a repository on GitHub under your user account called llm-templates and start saving .yaml files to it.
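Here's a rough recipe for doing that, assuming llm templates path points at your local template directory and using placeholder names throughout:
# save a prompt as a local template
llm --system 'Convert my rough notes into polished meeting minutes' --save meeting-minutes
# copy it into a checkout of your llm-templates repo, commit and push
cp "$(llm templates path)/meeting-minutes.yaml" path/to/llm-templates/
# anyone can then run it with:
llm -t gh:yourname/meeting-minutes 'raw notes go here'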
llm-templates-fabric provides a similar mechanism for loading templates from Daniel Miessler’s extensive fabric collection:
llm install llm-templates-fabric
curl https://simonwillison.net/2025/Apr/6/only-miffy/ | \
llm -t f:extract_main_idea
A conversation with Daniel was the inspiration for this new plugin hook.
Everything else in LLM 0.24
LLM 0.24 is a big release, spanning 51 commits. The release notes cover everything that’s new in full – here are a few of my highlights:
The new llm-openai plugin provides support for o1-pro (which is not supported by the OpenAI mechanism used by LLM core). Future OpenAI features will migrate to this plugin instead of LLM core itself.
The problem with OpenAI models being handled by LLM core is that I have to release a whole new version of LLM every time OpenAI releases a new model or feature. Migrating this stuff out to a plugin means I can release new versions of that plugin independently of LLM itself – something I frequently do for llm-anthropic and llm-gemini and others.
The new llm-openai plugin uses their Responses API, a new shape of API which I covered last month.
llm -t $URL option can now take a URL to a YAML template. #856
The new custom template loaders are fun, but being able to paste in a URL to a YAML file somewhere provides a simpler way to share templates.
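Something like this should work against a raw file on GitHub (the URL here is illustrative, not a published template):
llm -t https://raw.githubusercontent.com/yourname/llm-templates/main/pelican-svg.yaml \
  -m o3-mini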
Templates can now store default model options. #845
Attachments can now be stored in templates. #826
The quickest way to create your own template is with the llm prompt ... --save name-of-template command. This now works with attachments, fragments and default model options, each of which is persisted in the template YAML file.
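As a sketch, a command like this should bundle a fragment, a system prompt and a default model option into one reusable template (all of the names here are made up):
llm -f github:simonw/s3-credentials \
  --system 'You answer questions about this codebase' \
  -o temperature 0.5 \
  --save s3-credentials-helper
# then later:
llm -t s3-credentials-helper 'How are temporary credentials created?'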
New llm models options family of commands for setting default options for particular models. #829
I built this when I learned that Qwen’s QwQ-32b model works best with temperature 0.7 and top p 0.95.
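Based on that description, usage presumably looks something like this (the set and list subcommands and the qwq model ID are assumptions on my part – the real model ID depends on which plugin serves QwQ):
llm models options set qwq temperature 0.7
llm models options set qwq top_p 0.95
llm models options list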
llm prompt -d path-to-sqlite.db option can now be used to write logs to a custom SQLite database. #858
This proved extremely useful for testing fragments – it meant I could run a prompt and save the full response to a separate SQLite database which I could then upload to S3 and share as a link to Datasette Lite.
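For example (the paths here are illustrative):
llm -f github:simonw/s3-credentials 'suggest new features for this tool' \
  -d /tmp/s3-fragments.db
# then explore the resulting database, e.g. with Datasette
datasette /tmp/s3-fragments.db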
llm similar -p/--plain option providing more human-readable output than the default JSON. #853
I’d like this to be the default output, but I’m holding off on changing that until LLM 1.0 since it’s a breaking change for people building automations against the JSON from llm similar.
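Usage looks something like this (the collection name and query are placeholders):
llm similar quotations -c 'criticism of modern society' -p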
Set the LLM_RAISE_ERRORS=1 environment variable to raise errors during prompts rather than suppressing them, which means you can run python -i -m llm 'prompt' and then drop into a debugger on errors with import pdb; pdb.pm(). #817
Really useful for debugging new model plugins.
llm prompt -q gpt -q 4o option – pass -q searchterm one or more times to execute a prompt against the first model that matches all of those strings – useful if you can’t remember the full model ID. #841
Pretty obscure but I found myself needing this. Vendors love releasing models with names like gemini-2.5-pro-exp-03-25, now I can run llm -q gem -q 2.5 -q exp 'say hi' to save me from looking up the model ID.
OpenAI compatible models configured using extra-openai-models.yaml now support supports_schema: true, vision: true and audio: true options. Thanks @adaitche and @giuli007. #819, #843
I don’t use this feature myself but it’s clearly popular; this isn’t the first time I’ve had PRs with improvements from the wider community.
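For reference, here's a sketch of what an entry using those new options might look like, appended to extra-openai-models.yaml in LLM's configuration directory (locating that directory via llm logs path, and every key other than the three new flags, are assumptions on my part):
cat >> "$(dirname "$(llm logs path)")/extra-openai-models.yaml" <<'EOF'
- model_id: my-proxy-gpt
  model_name: my-proxy-gpt
  api_base: https://llm-proxy.example.com/v1
  supports_schema: true
  vision: true
  audio: true
EOF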
Tags: plugins, projects, ai, annotated-release-notes, openai, generative-ai, llms, llm, gemini, long-context, files-to-prompt