Simon Willison’s Weblog: llm-llama-server 0.2

Source URL: https://simonwillison.net/2025/May/28/llama-server-tools/
Source: Simon Willison’s Weblog
Title: llm-llama-server 0.2

Feedly Summary: llm-llama-server 0.2
Here’s a second option for using LLM’s new tool support against local models (the first was via llm-ollama).
It turns out the llama.cpp ecosystem has pretty robust OpenAI-compatible tool support already, so my llm-llama-server plugin only needed a quick upgrade to get those working there.
Unfortunately it looks like streaming support doesn’t work with tools in llama-server at the moment, so I added a new model ID called llama-server-tools which disables streaming and enables tools.
Here’s how to try it out. First, ensure you have llama-server – the easiest way to get that on macOS is via Homebrew:
brew install llama.cpp

Start the server running like this. This command will download and cache the 3.2GB unsloth/gemma-3-4b-it-GGUF:Q4_K_XL if you don’t yet have it:
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
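
If you want to confirm the server is up before pointing LLM at it, you can hit its OpenAI-compatible API directly. This is just a quick sanity-check sketch, assuming llama-server's defaults (listening on port 8080 and serving /v1/chat/completions):

# Assumes the default port 8080 - adjust if you started the server with --port
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one word."}]}'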

Then in another window:
llm install llm-llama-server
llm -m llama-server-tools -T llm_time 'what time is it?' --td
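
Under the hood this goes through llama-server's OpenAI-compatible chat completions API, with the tool definitions sent in the OpenAI function-calling format and streaming turned off. If you're curious what such a request looks like, here's a rough sketch of an equivalent raw call; the port (8080) is llama-server's default and the get_current_time tool schema is invented purely for illustration:

# Hypothetical tool definition, expressed in the OpenAI function-calling format
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What time is it?"}],
    "stream": false,
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_time",
        "description": "Return the current time as an ISO 8601 string",
        "parameters": {"type": "object", "properties": {}}
      }
    }]
  }'

If tool calling is working, the response includes a tool_calls entry instead of plain text; LLM then runs the matching tool locally and sends the result back in a follow-up request.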

And since you don’t even need an API key for this, even if you’ve never used LLM before you can try it out with this uvx one-liner:
uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?' --td

For more notes on using llama.cpp with LLM see Trying out llama.cpp’s new vision support from a couple of weeks ago.
Tags: generative-ai, llm, plugins, projects, llm-tool-use, llama-cpp, ai, uv

AI Summary and Description: Yes

Summary: The provided text covers the 0.2 release of the llm-llama-server plugin, which lets the LLM command-line tool use its new tool-calling support against local models served by llama.cpp's llama-server. This is particularly relevant for professionals who want to run tool-enabled local models in their workflows without needing an API key.

Detailed Description: The text outlines improvements made to the llm-llama-server plugin, which connects LLM to llama.cpp's llama-server and takes advantage of that server's OpenAI-compatible tool support. Here are the major points conveyed in the text:

– **Tool Support Enhancement**: The llm-llama-server plugin has been upgraded to leverage the robust OpenAI-compatible tool support already present in the llama.cpp ecosystem.
– **Streaming Limitations**: Streaming does not currently work together with tools in llama-server, so the plugin adds a separate llama-server-tools model ID that disables streaming and enables tools.
– **Installation Instructions**:
  – Users can install llama-server on macOS via Homebrew with the command: `brew install llama.cpp`.
  – Instructions are provided for starting llama-server and downloading the necessary model.
  – The command to run the server downloads and caches the 3.2GB model on first use: `llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL`.
– **Usage Ease**: The functionality can be tried without an API key, making it accessible for beginners.
  – An example one-liner lets someone who has never used LLM before try it out: `uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?' --td`.
– **Related Notes**: Reference to a prior post on using llama.cpp's new vision support with LLM, suggesting ongoing developments in the ecosystem.

This content is significant for security and compliance professionals working in AI and LLM environments because the workflow runs entirely against a local model: no API key is required and prompts and data need not leave the machine. Reduced cloud dependency of this kind has implications for data privacy and sovereignty.