Simon Willison’s Weblog: too many model context protocol servers and LLM allocations on the dance floor

Source URL: https://simonwillison.net/2025/Aug/22/too-many-mcps/#atom-everything
Source: Simon Willison’s Weblog
Title: too many model context protocol servers and LLM allocations on the dance floor

Feedly Summary: too many model context protocol servers and LLM allocations on the dance floor
Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP.
Geoffrey estimates that the usable context window of something like Amp or Cursor is around 176,000 tokens: Claude 4’s 200,000 minus around 24,000 for those tools’ system prompts.
Just the popular GitHub MCP defines 93 additional tools and swallows another 55,000 of those valuable tokens!
MCP enthusiasts will frequently add several more, leaving precious few tokens available for solving the actual task… and LLMs are known to perform worse the more irrelevant information has been stuffed into their prompts.
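A quick back-of-the-envelope calculation makes the squeeze concrete. The figures below are the ones quoted above; the script itself is just illustrative arithmetic:

```python
# Context budget using the figures quoted in the post.
CONTEXT_WINDOW = 200_000   # Claude 4's total context window
SYSTEM_PROMPT = 24_000     # approximate system prompt for a tool like Amp or Cursor
GITHUB_MCP = 55_000        # tokens consumed by the GitHub MCP's 93 tool definitions

usable = CONTEXT_WINDOW - SYSTEM_PROMPT
print(f"Usable before any MCPs: {usable:,}")                # 176,000
print(f"After adding GitHub MCP: {usable - GITHUB_MCP:,}")  # 121,000
```

Each additional MCP server eats further into that remaining budget before the model has read a single line of the actual task.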
Thankfully, there is a much more token-efficient way of interacting with many of these services: existing CLI tools.
If your coding agent can run terminal commands and you give it access to GitHub’s gh tool it gains all of that functionality for a token cost close to zero – because every frontier LLM knows how to use that tool already.
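As a minimal sketch of the idea, here is how an agent harness might shell out to gh instead of loading an MCP server’s tool definitions into context. `gh issue list` and its `--limit` flag are real gh commands; the surrounding harness code is invented for illustration:

```python
import subprocess

# Run a gh command on the agent's behalf and capture its output.
# Only this output enters the prompt, not tens of thousands of
# tokens of tool schemas.
result = subprocess.run(
    ["gh", "issue", "list", "--limit", "5"],
    capture_output=True,
    text=True,
    check=True,  # raise if gh is missing or the command fails
)
print(result.stdout)
```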
I’ve had good experiences building small custom CLI tools specifically for Claude Code and Codex CLI to use. You can even tell them to run --help to learn how to use the tool, which works particularly well if your help text includes usage examples.
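For instance, here is a minimal sketch of such a tool in Python, with usage examples baked into its --help output. The notes.py tool, its subcommands, and its flags are all hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical note-keeping CLI an agent can learn by running --help."""
import argparse

# Usage examples in the epilog are what the agent sees when it runs --help.
EXAMPLES = """\
examples:
  notes.py add "Ship the release notes"
  notes.py list --limit 10
"""

parser = argparse.ArgumentParser(
    prog="notes.py",
    description="Tiny note-keeping tool for coding agents.",
    epilog=EXAMPLES,
    formatter_class=argparse.RawDescriptionHelpFormatter,  # keep example layout intact
)
sub = parser.add_subparsers(dest="command", required=True)

add_cmd = sub.add_parser("add", help="store a note")
add_cmd.add_argument("text")

list_cmd = sub.add_parser("list", help="show recent notes")
list_cmd.add_argument("--limit", type=int, default=5)

args = parser.parse_args()
print(args)  # a real tool would act on the parsed command here
```

The whole interface costs the agent only the tokens of a single --help invocation, read on demand.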
Tags: github, ai, prompt-engineering, generative-ai, llms, model-context-protocol, coding-agents, claude-code, geoffrey-huntley

AI Summary and Description: Yes

Summary: The text discusses the inefficiencies of running multiple Model Context Protocol (MCP) servers with large language models (LLMs) and the significant context-token costs they incur. It highlights a more efficient alternative: existing command-line interface (CLI) tools, which require far fewer tokens. This insight into LLM token management is particularly relevant for AI professionals focused on optimizing performance and cost.

Detailed Description:
The content offers key insights into optimizing the use of large language models (LLMs), specifically the context-token costs incurred when multiple Model Context Protocol (MCP) servers are in play.

– **Token Cost Concerns**: Geoffrey Huntley highlights the rarely discussed token cost of MCP servers. The usable context window for tools like Amp or Cursor is roughly 176,000 tokens: Claude 4’s 200,000-token window minus around 24,000 tokens for those tools’ system prompts.
– **Impact of Additional Tools**: Each MCP server added consumes more of the remaining budget; the popular GitHub MCP alone defines 93 tools and eats roughly 55,000 tokens. This bloat leaves fewer tokens for the core task, and stuffing prompts with irrelevant information is known to degrade LLM performance.
– **Alternative Approaches**: Willison proposes a more token-efficient strategy: existing command-line interface (CLI) tools. If a coding agent can run terminal commands, giving it access to GitHub’s gh tool provides the same functionality at a token cost close to zero, since every frontier LLM already knows how to use it.
– **Custom CLI Tools**: Willison reports good experiences building small custom CLI tools specifically for coding agents such as Claude Code and Codex CLI. Telling the agent to run --help lets it learn a tool’s interface on demand, which works especially well when the help text includes usage examples.

In conclusion, this discussion is valuable for AI professionals: it prompts a re-evaluation of the toolsets attached to coding agents, favoring approaches that conserve context tokens while maximizing output quality from LLMs.