Simon Willison’s Weblog: too many model context protocol servers and LLM allocations on the dance floor

Source URL: https://simonwillison.net/2025/Aug/22/too-many-mcps/#atom-everything
Source: Simon Willison’s Weblog
Title: too many model context protocol servers and LLM allocations on the dance floor

Feedly Summary: too many model context protocol servers and LLM allocations on the dance floor
Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP.
Geoffrey estimates that the usable context window of something like Amp or Cursor is around 176,000 tokens: Claude 4’s 200,000 minus around 24,000 for those tools’ system prompts.
Just the popular GitHub MCP defines 93 additional tools and swallows another 55,000 of those valuable tokens!
MCP enthusiasts will frequently add several more, leaving precious few tokens available for solving the actual task… and LLMs are known to perform worse the more irrelevant information has been stuffed into their prompts.
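A quick back-of-the-envelope calculation makes the squeeze concrete. The figures below are the ones quoted above; the script itself is just illustrative arithmetic:

```python
# Context budget using the figures quoted in the post.
CONTEXT_WINDOW = 200_000   # Claude 4's total context window
SYSTEM_PROMPT = 24_000     # approximate system prompt for a tool like Amp or Cursor
GITHUB_MCP = 55_000        # tokens consumed by the GitHub MCP's 93 tool definitions

usable = CONTEXT_WINDOW - SYSTEM_PROMPT
print(f"Usable before any MCPs: {usable:,}")                # 176,000
print(f"After adding GitHub MCP: {usable - GITHUB_MCP:,}")  # 121,000
```

Each additional MCP server eats further into that remaining budget before the model has read a single line of the actual task.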
Thankfully, there is a much more token-efficient way of interacting with many of these services: existing CLI tools.
If your coding agent can run terminal commands and you give it access to GitHub’s gh tool it gains all of that functionality for a token cost close to zero – because every frontier LLM knows how to use that tool already.
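As a minimal sketch of the idea, here is how an agent harness might shell out to gh instead of loading an MCP server’s tool definitions into context. `gh issue list` and its `--limit` flag are real gh commands; the surrounding harness code is invented for illustration:

```python
import subprocess

# Run a gh command on the agent's behalf and capture its output.
# Only this output enters the prompt, not tens of thousands of
# tokens of tool schemas.
result = subprocess.run(
    ["gh", "issue", "list", "--limit", "5"],
    capture_output=True,
    text=True,
    check=True,  # raise if gh is missing or the command fails
)
print(result.stdout)
```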
I’ve had good experiences building small custom CLI tools specifically for Claude Code and Codex CLI to use. You can even tell them to run --help to learn how to use the tool, which works particularly well if your help text includes usage examples.
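For instance, here is a minimal sketch of such a tool in Python, with usage examples baked into its --help output. The notes.py tool, its subcommands, and its flags are all hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical note-keeping CLI an agent can learn by running --help."""
import argparse

# Usage examples in the epilog are what the agent sees when it runs --help.
EXAMPLES = """\
examples:
  notes.py add "Ship the release notes"
  notes.py list --limit 10
"""

parser = argparse.ArgumentParser(
    prog="notes.py",
    description="Tiny note-keeping tool for coding agents.",
    epilog=EXAMPLES,
    formatter_class=argparse.RawDescriptionHelpFormatter,  # keep example layout intact
)
sub = parser.add_subparsers(dest="command", required=True)

add_cmd = sub.add_parser("add", help="store a note")
add_cmd.add_argument("text")

list_cmd = sub.add_parser("list", help="show recent notes")
list_cmd.add_argument("--limit", type=int, default=5)

args = parser.parse_args()
print(args)  # a real tool would act on the parsed command here
```

The whole interface costs the agent only the tokens of a single --help invocation, read on demand.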
Tags: github, ai, prompt-engineering, generative-ai, llms, model-context-protocol, coding-agents, claude-code, geoffrey-huntley

AI Summary and Description: Yes

Summary: The text discusses the inefficiencies of running multiple Model Context Protocol (MCP) servers with large language models (LLMs) and the significant context-token costs they incur. It highlights a more efficient alternative: existing command-line interface (CLI) tools, which require far fewer tokens. This insight into LLM token management is particularly relevant for AI professionals focused on optimizing performance and cost.

Detailed Description:
The content offers key insights into optimizing the use of large language models (LLMs), specifically the context-token costs incurred when multiple Model Context Protocol (MCP) servers are in play.

– **Token Cost Concerns**: Geoffrey Huntley highlights the rarely discussed token cost of MCP servers. The usable context window for tools like Amp or Cursor is roughly 176,000 tokens: Claude 4’s 200,000-token window minus around 24,000 tokens for those tools’ system prompts.
– **Impact of Additional Tools**: Each MCP server added consumes more of the remaining budget; the popular GitHub MCP alone defines 93 tools and eats roughly 55,000 tokens. This bloat leaves fewer tokens for the core task, and stuffing prompts with irrelevant information is known to degrade LLM performance.
– **Alternative Approaches**: Willison proposes a more token-efficient strategy: existing command-line interface (CLI) tools. If a coding agent can run terminal commands, giving it access to GitHub’s gh tool provides the same functionality at a token cost close to zero, since every frontier LLM already knows how to use it.
– **Custom CLI Tools**: Willison reports good experiences building small custom CLI tools specifically for coding agents such as Claude Code and Codex CLI. Telling the agent to run --help lets it learn a tool’s interface on demand, which works especially well when the help text includes usage examples.

In conclusion, this discussion is valuable for AI professionals: it prompts a re-evaluation of the toolsets attached to coding agents, favoring approaches that conserve context tokens while maximizing output quality from LLMs.