Tomasz Tunguz: Small Action Models Are the Future of AI Agents

Source URL: https://www.tomtunguz.com/local-instructions/
Source: Tomasz Tunguz
Title: Small Action Models Are the Future of AI Agents

Feedly Summary: 2025 is the year of agents, & the key capability of agents is calling tools.
When using Claude Code, I can tell the AI to sift through a newsletter, find all the links to startups, & verify they exist in our CRM, all with a single command. That one command might involve two or three different tool calls.
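As a rough sketch of what that single command fans out into, the snippet below chains two hypothetical tools, `extract_startup_links` and `crm_contains`; neither is a real tool from the post, and the CRM lookup is a stand-in for an actual API query.

```python
import re
from typing import List

def extract_startup_links(newsletter_text: str) -> List[str]:
    """Hypothetical tool 1: pull candidate startup URLs out of a newsletter."""
    return re.findall(r"https?://[^\s)>\]]+", newsletter_text)

def crm_contains(url: str) -> bool:
    """Hypothetical tool 2: check whether a startup already exists in the CRM."""
    known_domains = {"example-startup.com"}  # stand-in for a real CRM query
    return any(domain in url for domain in known_domains)

def handle_command(newsletter_text: str) -> List[str]:
    """One user command, two chained tool calls: extract links, then filter by CRM."""
    links = extract_startup_links(newsletter_text)    # tool call 1
    return [u for u in links if not crm_contains(u)]  # tool call 2, per link
```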
But here’s the problem: using a large foundation model for this is expensive, often rate-limited, & overpowered for a selection task.
What is the best way to build an agentic system with tool calling?
The answer lies in small action models. NVIDIA released a compelling paper arguing that “Small language models (SLMs) are sufficiently powerful, inherently more suitable, & necessarily more economical for many invocations in agentic systems.”
I’ve been testing different local models to validate a cost reduction exercise. I started with Qwen3:30b, a 30-billion-parameter model. It works, but it can be quite slow because it’s such a big model, even though only 3 billion of those 30 billion parameters are active at any one time.
The NVIDIA paper recommends the Salesforce xLAM model – a different architecture called a large action model specifically designed for tool selection.
So I ran a test of my own: each model calling a tool to list my Asana tasks.

| Model | Success Rate | Avg Response Time | Avg Tool Time | Avg Total Time |
|-------|--------------|-------------------|---------------|----------------|
| xLAM  | 100% (25/25) | 1.48s             | 1.14s         | 2.61s ± 0.47s  |
| Qwen  | 92% (23/25)  | 8.75s             | 1.07s         | 9.82s ± 1.53s  |

The results were striking: xLAM completed tasks in 2.61 seconds with 100% success, while Qwen took 9.82 seconds with 92% success – nearly four times as long.
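A harness for reproducing this kind of head-to-head could look roughly like the sketch below. It assumes both models sit behind a local OpenAI-compatible endpoint (such as the one Ollama exposes) and are offered a single `list_asana_tasks` tool; the endpoint URL, model tags, and tool name are assumptions for illustration, not details from the original test.

```python
import time
import requests  # assumes a local OpenAI-compatible server, e.g. Ollama

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local endpoint
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_asana_tasks",  # hypothetical tool name
        "description": "List my open Asana tasks",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def run_trial(model: str):
    """One trial: ask the model to list tasks, check it picked the right tool."""
    start = time.time()
    resp = requests.post(ENDPOINT, json={
        "model": model,
        "messages": [{"role": "user", "content": "List my Asana tasks."}],
        "tools": TOOLS,
    }, timeout=120).json()
    elapsed = time.time() - start
    calls = resp["choices"][0]["message"].get("tool_calls") or []
    ok = any(c["function"]["name"] == "list_asana_tasks" for c in calls)
    return ok, elapsed

def benchmark(model: str, trials: int = 25) -> None:
    results = [run_trial(model) for _ in range(trials)]
    successes = sum(ok for ok, _ in results)
    avg = sum(t for _, t in results) / trials
    print(f"{model}: {successes}/{trials} correct, {avg:.2f}s avg response time")

for model in ("xlam", "qwen3:30b"):  # assumed model tags on the local server
    benchmark(model)
```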

This experiment shows the speed gain, but there’s a trade-off: how much intelligence should live in the model versus in the tools themselves.
With larger models like Qwen, tools can be simpler because the model has better error tolerance & can work around poorly designed interfaces. The model compensates for tool limitations through brute-force reasoning.
With smaller models, there is less capacity to recover from mistakes, so the tools must be more robust & the selection logic more precise. This might seem like a limitation, but it’s actually a feature.
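One way to make the tooling more robust is to validate every tool call a small model emits against a strict schema before anything executes, so a malformed call fails immediately rather than propagating. A minimal sketch, assuming a hypothetical `create_task` tool and the `jsonschema` package:

```python
import json
from jsonschema import ValidationError, validate  # pip install jsonschema

# Strict schema for a hypothetical create_task tool: required fields,
# constrained values, no extras. The model cannot improvise around it.
CREATE_TASK_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "project": {"type": "string", "enum": ["growth", "platform", "ops"]},
        "due_on": {"type": "string", "pattern": r"^\d{4}-\d{2}-\d{2}$"},
    },
    "required": ["name", "project"],
    "additionalProperties": False,
}

def dispatch_tool_call(raw_arguments: str) -> dict:
    """Reject malformed arguments before they ever reach the real tool."""
    args = json.loads(raw_arguments)  # JSON string produced by the model
    validate(instance=args, schema=CREATE_TASK_SCHEMA)
    return args  # safe to hand to the actual API

# A bad call (unknown project) is rejected up front instead of cascading:
try:
    dispatch_tool_call('{"name": "Follow up with startup", "project": "marketing"}')
except (ValueError, ValidationError) as exc:
    print(f"Tool call rejected: {exc}")  # fail fast, ask the model to retry
```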
This constraint eliminates the compounding error rate of chained LLM tool calls. When large models make sequential tool calls, the odds of an end-to-end success decay with every additional call.
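As a back-of-the-envelope illustration (assuming each call succeeds independently), a chain of n calls succeeds end to end with probability p^n, so the 92% per-call rate observed above falls to roughly 78% over three chained calls and 66% over five:

```python
# End-to-end success of a tool chain, assuming each call succeeds
# independently with probability p.
for p in (0.92, 1.00):
    for n in (1, 3, 5):
        print(f"per-call success {p:.0%}, {n} chained calls -> {p ** n:.0%} end to end")
```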
Small action models force better system design, keeping the best of LLMs and combining it with specialized models.
This architecture is more efficient, faster, & more predictable.

AI Summary and Description: Yes

Summary: The text discusses the future of agentic systems in AI, specifically the advantages of using small language models (SLMs) for tool-calling tasks over larger models. Drawing on experimental tests comparing the performance of different models, it highlights efficiency gains and the more disciplined system design required for effective tool integration.

Detailed Description: The text explores the evolving landscape of AI agents, particularly focusing on the role of language models in optimizing performance when interacting with various tools. Key points include:

– **Agentic Systems and Tool Calling**: The text suggests that 2025 will be a pivotal year for AI agents that can effectively utilize various tools via simple commands.
– **Performance Comparison Between Models**: The discussion centers on two specific models:
  – **xLAM**: A Salesforce model specifically designed for tool selection.
  – **Qwen3:30b**: A large foundation model with a much larger parameter count but performance trade-offs.
– **Testing Results**:
  – The xLAM model demonstrated a 100% success rate with an average completion time of 2.61 seconds for tasks.
  – In contrast, the Qwen model had a 92% success rate with an average time of 9.82 seconds, indicating a substantial performance discrepancy.
– **Trade-offs Between Model Size and Tool Efficiency**:
  – Larger models can work around poorly designed interfaces due to their extensive error tolerance, which can be a double-edged sword as it may lead to a reliance on the model’s brute-force reasoning.
  – Conversely, smaller models necessitate more robust tool designs and precise selection logic; however, they can create a more efficient and predictable system overall.
– **Impact on System Design**: The result of using smaller action models is a push toward enhanced system architecture that avoids compounding errors associated with large language models (LLMs), promoting better design practices.

In summary, the mentioned advancements indicate that as AI continues to evolve, focusing on smaller, specialized models may lead to significant improvements in efficiency and reliability within AI-driven applications, fostering better integration of tools and overall system robustness. This insight is crucial for professionals in AI and system development, highlighting the importance of balancing model capabilities with the robustness of associated tools.