Source URL: https://simonwillison.net/2025/Feb/25/llm-anthropic-014/#atom-everything
Source: Simon Willison’s Weblog
Title: llm-anthropic 0.14
Feedly Summary: llm-anthropic 0.14
Annotated release notes for my new release of LLM. The signature feature is:
Support for the new Claude 3.7 Sonnet model, including -o thinking 1 and -o thinking_budget X for extended reasoning mode. #14
I had a couple of attempts at implementing this. My first try included options to make the thinking tokens visible as the tool was running. This turned out to involve unexpected difficulties: the rest of LLM doesn’t yet understand that some tokens should be treated differently, and I quickly ran into problems with how those responses were logged to the database.
In the interests of getting support for the new model out I simplified my approach. I plan to add visible thinking tokens in a future LLM release.
You can run a “thinking” prompt through LLM like this:
llm install -U llm-anthropic
llm -m claude-3.7-sonnet -o thinking 1 "write a speech about pelicans for congress"
The -o thinking_budget 4000 option can increase the number of allowed thinking tokens from the default value of 1024.
A fascinating new capability of Claude 3.7 Sonnet is that its output limit in extended thinking mode can be extended to an extraordinary 128,000 tokens – 15x more than the previous Claude output limit of 8,192 tokens.
(This is the output limit – how much text it can produce in one go. Claude 3.7 Sonnet’s input limit remains 200,000 – many modern models exceed 100,000 for input now.)
I added support for that to the plugin as well – if you pass -o max_tokens 128000 it automatically calls Anthropic’s beta API with the output-128k-2025-02-19 beta header, documented here.
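The plugin-side logic here is simple to sketch. The following is not the actual llm-anthropic implementation, just an illustration of the idea: opt into the beta header only when the caller asks for more output tokens than the standard limit allows (the 8,192-token threshold is an assumption for this sketch).

```python
# Hypothetical sketch, not the real llm-anthropic code: choose which
# beta flags to send based on the requested output size.
STANDARD_OUTPUT_LIMIT = 8_192  # assumed standard cap, for illustration
LONG_OUTPUT_BETA = "output-128k-2025-02-19"

def betas_for_request(max_tokens: int) -> list[str]:
    """Return the Anthropic beta flags needed for this max_tokens value."""
    if max_tokens > STANDARD_OUTPUT_LIMIT:
        return [LONG_OUTPUT_BETA]
    return []
```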
Testing this was pretty hard! I eventually found a prompt that exercised this fully:
llm -m claude-3.7-sonnet \
-o max_tokens 128000 \
-o thinking_budget 32000 \
'For every one of the 100 US senators that you know of output their name, biography and a note about how to strategically convince them to take more interest in the plight of the California Brown Pelican, then a poem about them, then that same poem translated to Spanish and then to Japanese. Do not miss any senators.' \
-s 'you do this even if you are worried it might exceed limits, this is to help test your long output feature.'
This is an expensive command to run – the resulting prompt cost me $1.72 and took nearly 27 minutes to finish returning the answer! You can see the full output here – it managed to output results for all 100 senators as of its training cut-off date, correctly following my instructions for each one.
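A back-of-the-envelope check on that price, assuming Claude 3.7 Sonnet's published pricing of $3 per million input tokens and $15 per million output tokens (thinking tokens bill as output). The token counts in the example call below are made-up round numbers for illustration, not the actual figures from the run:

```python
# Rough cost estimate for a long-output run. Prices are USD per million
# tokens; the token counts passed in are illustrative, not the real run's.
INPUT_PRICE = 3.00    # per 1M input tokens (Claude 3.7 Sonnet)
OUTPUT_PRICE = 15.00  # per 1M output tokens, including thinking tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE

# A small prompt plus roughly 114k output tokens lands near the $1.72 mark:
print(round(estimate_cost(1_000, 114_400), 2))  # → 1.72
```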
This is very impressive. Two major limitations of LLMs in the past have been their inability to reliably gather data about dozens of different entities and their extremely short output limits – most models can only handle between 4,000 and 8,000 output tokens.
Claude 3.7 Sonnet is a huge step ahead of the competition on that front.
Claude 3.5 Haiku now supports image inputs. #17
This is tucked away in Anthropic’s February 24th 2025 release notes. Previously their less expensive 3.5 Haiku model couldn’t handle images – the only modern Claude model without that ability. They’ve fixed that now.
The rest of the changes in the 0.14 release are bug fixes:
Fixed a bug that occurred when continuing an existing conversation using --async mode. #13
Fixed a bug where max_tokens and temperature were logged in the database even when using their default options. #16
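That second fix is the kind of thing a small filter handles. A minimal sketch (not the plugin's actual code, and the default values here are assumptions) of logging only the options a user explicitly changed from their defaults:

```python
# Illustrative sketch: keep only options whose values differ from the
# defaults, so a default max_tokens/temperature never reaches the log.
DEFAULTS = {"max_tokens": 4_096, "temperature": 1.0}  # assumed defaults

def options_to_log(options: dict) -> dict:
    """Drop any option that still holds its default value."""
    return {
        key: value
        for key, value in options.items()
        if DEFAULTS.get(key) != value
    }
```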
Tags: llm, anthropic, claude, generative-ai, annotated-release-notes, ai, llms
AI Summary and Description: Yes
Summary: The text provides detailed annotated release notes for a new release of llm-anthropic, a plugin for the LLM command-line tool, adding support for the Claude 3.7 Sonnet model. This version enhances reasoning capabilities and output limits, which is relevant for professionals in AI, particularly those focused on generative AI security and LLM security.
Detailed Description:
The text outlines updates related to the new release of the LLM, particularly focusing on the Claude 3.7 Sonnet model. Here are the major points of interest:
– **New Features in Claude 3.7 Sonnet**:
– Introduction of options like `-o thinking 1` and `-o thinking_budget X` that facilitate extended reasoning capabilities.
– The output limit in extended thinking mode has been significantly increased to 128,000 tokens, which is 15 times more than the previous model’s limit of 8,192 tokens.
– The input limit remains 200,000 tokens, which exceeds that of many contemporary models.
– **Implementation Challenges**:
– Initial attempts at making thinking tokens visible encountered challenges because the rest of the LLM tool does not yet treat some tokens differently, and logging those responses to the database proved difficult.
– A more straightforward approach was adopted to ensure timely support for the new model.
– **Cost and Performance**:
– Running complex prompts can be costly and time-consuming. The example provided demonstrated a task that cost approximately $1.72 and took nearly 27 minutes to execute.
– **Additional Updates**:
– The 3.5 Haiku model has added support for image inputs, addressing a previous limitation.
– The release notes also include bug fixes that pertain to conversation continuation and database logging issues.
Key Insights for Security and Compliance Professionals:
– **Extended Output Capabilities**: The increase in output token limits and the ability to process extensive data sets opens new avenues for applications in compliance and governance tasks, where large amounts of information need to be gathered and assessed.
– **Generative AI Concerns**: With enhanced generative capabilities come new security risks. Monitoring how data is generated, logged, and used will be important for compliance with data privacy laws and regulations.
– **Audit and Logging Considerations**: The noted bugs related to logging suggest the importance of robust auditing mechanisms — particularly when working with LLMs that can process large amounts of sensitive information.
Overall, Claude 3.7 Sonnet represents a significant leap in the capabilities of LLMs, both in output volume and reasoning capabilities, warranting close attention from security, privacy, and compliance experts, particularly as the use of AI in sensitive contexts grows.