Tomasz Tunguz: Small Action Models Are the Future of AI Agents

Aug 4, 2025

—

Source URL: https://www.tomtunguz.com/ai-skills-inversion/
Source: Tomasz Tunguz
Title: Small Action Models Are the Future of AI Agents

Feedly Summary: 2025 is the year of agents, and the key capability of agents is calling tools.
When using Claude Code, I can tell the AI to sift through a newsletter, find all the links to startups, verify they exist in our CRM, with a single command. This might involve two or three different tools being called.
But here’s the problem: using a large foundation model for this is expensive, often rate-limited, and overpowered for a selection task. It’s like using a Ferrari to deliver pizza. The computational overhead makes no economic sense when the primary task is choosing the right tool, not performing complex reasoning.
This raises a fundamental architecture question: what’s a better way to build agent systems that can efficiently orchestrate multiple tools without breaking the bank?
The answer lies in small action models. NVIDIA released a compelling paper arguing that “Small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems.” This isn’t just theoretical—it’s playing out in practice across enterprise deployments.
I’ve been testing different local models to validate this approach. I started with a Qwen3 30-billion-parameter model, which works but runs painfully slowly. Then I shifted to Salesforce’s xLAM model, a large action model specifically designed for tool orchestration. The performance on my Mac M2 Pro is ideal: fast inference with excellent tool selection accuracy.
This experiment revealed a fundamental trade-off in agent architecture: how much intelligence should live in the model versus in the tools themselves. With larger models like Qwen, tools can be simpler because the model has better error tolerance and can work around poorly designed interfaces. The model compensates for tool limitations through brute-force reasoning.
With smaller models, you need better error correction in tool selection. The model has less capacity to recover from mistakes, so the tools must be more robust and the selection logic more precise. This might seem like a limitation, but it’s actually a feature.
This constraint reins in the compounding error rate of LLM-chained tools. When a model makes sequential tool calls, the probability that the whole chain succeeds shrinks geometrically with the length of the chain, because each mistaken tool selection reduces the probability of overall task success. The mathematics of this compounding error shows why even 95% accuracy per step leads to frequent failure over long chains.
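The compounding effect is easy to check with a few lines of arithmetic. The sketch below assumes every tool call succeeds independently with the same probability, which is a simplification, and the function names are illustrative rather than taken from any library.

```python
import math


def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return step_accuracy ** steps


def steps_until_below(step_accuracy: float, threshold: float) -> int:
    """Smallest chain length whose overall success probability is below threshold."""
    # Solve step_accuracy ** n < threshold for the smallest integer n.
    return math.floor(math.log(threshold) / math.log(step_accuracy)) + 1


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        overall = chain_success_probability(0.95, n)
        print(f"{n:>2} steps at 95% per step: {overall:.1%} overall")
    print("Chain length where success drops below 50%:", steps_until_below(0.95, 0.5))
```

Under these assumptions, a ten-step chain at 95% per step succeeds about 60% of the time, a twenty-step chain about 36% of the time, and the chain falls below even odds at 14 steps.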
Small action models force better system design. They require deterministic tool interfaces, clear error handling, and robust fallback mechanisms. The result is more reliable agent behavior, not less.
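One way to picture a deterministic tool interface with explicit error handling and a fallback is a small dispatcher that validates whatever the model proposes before running it. This is a minimal sketch, not the author's implementation: the `Tool` record, the `lookup_company` tool, and the `ask_for_clarification` fallback are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Tool:
    name: str
    required_args: FrozenSet[str]
    run: Callable[..., str]


def lookup_company(name: str) -> str:
    # Hypothetical stand-in for a CRM lookup.
    return f"looked up {name}"


def ask_for_clarification(reason: str) -> str:
    # Fallback used whenever the proposed call cannot be executed safely.
    return f"fallback: {reason}"


REGISTRY: Dict[str, Tool] = {
    "lookup_company": Tool("lookup_company", frozenset({"name"}), lookup_company),
}


def dispatch(tool_name: str, args: dict) -> str:
    """Validate a model-proposed tool call, run it, and fall back on any failure."""
    tool = REGISTRY.get(tool_name)
    if tool is None:
        return ask_for_clarification(f"unknown tool '{tool_name}'")
    if set(args) != tool.required_args:
        return ask_for_clarification(f"bad arguments for '{tool_name}'")
    try:
        return tool.run(**args)
    except Exception as exc:  # the tool itself failed; report instead of crashing
        return ask_for_clarification(f"'{tool_name}' failed: {exc}")


if __name__ == "__main__":
    print(dispatch("lookup_company", {"name": "Acme"}))
    print(dispatch("lookup_company", {"company": "Acme"}))
```

The design choice here is that a wrong selection is caught at the interface rather than left for the model to reason its way out of.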

Consider the architectural implications across different deployment models:
[Space for 2×2 matrix comparing cloud vs on-prem and large vs small models]
Enterprise software companies are already seeing this pattern. Small action models running locally can handle 80% of tool orchestration tasks while large models stay reserved for complex reasoning that truly requires their capabilities. This hybrid approach delivers both cost efficiency and performance optimization.
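The hybrid approach can be sketched as a router that tries the small model first and escalates only when it is unsure. Everything below is an assumption made for illustration: the two selector functions are stubs standing in for real model calls, and the tool names and the 0.75 confidence threshold are arbitrary.

```python
from typing import Callable, Optional, Tuple

# A selector returns (tool_name, confidence). Both selectors below are stubs.
Selector = Callable[[str], Tuple[Optional[str], float]]


def small_model_select(request: str) -> Tuple[Optional[str], float]:
    # Stub for a local small action model; a keyword rule stands in for inference.
    if "crm" in request.lower():
        return "crm_lookup", 0.93
    return None, 0.20


def large_model_select(request: str) -> Tuple[Optional[str], float]:
    # Stub for a hosted large model, reserved for requests the small model cannot place.
    return "multi_step_plan", 0.80


def route(
    request: str,
    small: Selector = small_model_select,
    large: Selector = large_model_select,
    min_confidence: float = 0.75,
) -> Tuple[str, Optional[str]]:
    """Return which model tier handled the request and the tool it selected."""
    tool, confidence = small(request)
    if tool is not None and confidence >= min_confidence:
        return "small", tool
    tool, _ = large(request)
    return "large", tool


if __name__ == "__main__":
    print(route("Check whether Acme is in the CRM"))
    print(route("Draft a market map and explain the reasoning"))
```

In a real system the confidence value would have to come from the model or from a validation step, and the threshold would be tuned against observed failure rates.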
The economic advantages become compelling at scale. Local inference costs essentially zero per call, while API costs accumulate quickly across thousands of agent interactions. For companies building agent-powered products, this cost difference determines product viability.
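A rough way to compare the two cost structures is to set per-token API spend against amortized hardware and power for local inference. The sketch below is only a template: every number in it is a placeholder rather than a quoted or measured price, and the function names are illustrative.

```python
def monthly_api_cost(
    calls_per_month: int, tokens_per_call: int, usd_per_million_tokens: float
) -> float:
    """API spend grows linearly with call volume."""
    return calls_per_month * tokens_per_call * usd_per_million_tokens / 1_000_000


def monthly_local_cost(
    hardware_usd: float, amortization_months: int, power_usd_per_month: float
) -> float:
    """Local spend is mostly fixed: amortized hardware plus power."""
    return hardware_usd / amortization_months + power_usd_per_month


if __name__ == "__main__":
    # Placeholder inputs; substitute real volumes and prices before drawing conclusions.
    api = monthly_api_cost(
        calls_per_month=500_000, tokens_per_call=2_000, usd_per_million_tokens=3.0
    )
    local = monthly_local_cost(
        hardware_usd=3_000, amortization_months=36, power_usd_per_month=20.0
    )
    print(f"API: ${api:,.2f} per month, local: ${local:,.2f} per month")
```

The point of the comparison is structural: the marginal cost of a local call is close to zero, but the fixed cost is not, so the break-even depends on call volume.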
As 2025 becomes the year of agents, the winning architecture combines small action models for tool selection with deterministic, well-built tools for execution. This isn’t just a technical optimization—it’s the foundation for scalable enterprise AI systems that work reliably in production environments.

AI Summary and Description: Yes

Summary: The text discusses the evolving role of small language models (SLMs) in agent systems for efficient tool orchestration, emphasizing a cost-effective architecture that balances model intelligence and tool robustness. It highlights the practical advantages of using SLMs in enterprise software deployments as organizations rely more heavily on AI agents in 2025.

Detailed Description:
The focus of this text is on the challenges and architectural considerations in developing agent systems, particularly in how to choose and orchestrate different tools effectively. Here are the major points outlined in the text:

– **Current State of AI Agents**: The year 2025 is predicted to be pivotal for the use of AI agents which will need to efficiently call and utilize multiple tools within organizations.

– **Performance Concerns**: Using a large foundation model, for example through Claude Code, for simple tool-selection tasks is identified as economically inefficient, evoking the analogy of using a high-performance vehicle for mundane deliveries.

– **Architectural Optimization**: The text addresses the need to rethink how agent systems are architected to prevent unnecessary computational overhead. This leads to the introduction of small language models (SLMs).

– **Benefits of Small Language Models**:
  – **Cost-Effectiveness**: SLMs are portrayed as inherently more economical and suitable for tasks that involve simpler tool calls rather than complex reasoning.
  – **Practical Applications**: Companies are reportedly deploying these SLMs successfully in enterprise environments, validating their effectiveness.

– **Experimentation with Local Models**: The author shares personal experiences with various models, highlighting that Salesforce’s xLAM model ran faster, with excellent tool selection accuracy, than the larger general-purpose model tried first.

– **Agent Architecture Trade-offs**:
  – Larger models tolerate simpler or poorly designed tools because they can reason around interface problems, but errors still compound when tool calls are chained.
  – A shift to smaller action models means the model must be more precise in tool selection, which in turn encourages the development of more robust tools with clear error handling.

– **Implications for Tool Design**: Deterministic interfaces and error management become crucial when smaller models orchestrate tools, because they limit compounding errors.

– **Strategic Outlook**: As companies move toward hybrid architectures using both small action models for straightforward tasks and large models for more complex reasoning, they stand to gain significant economic and performance benefits.

– **Overall Vision for 2025**: The text concludes with insights into how the right mix of small action models and properly designed tools is foundational for the success and reliability of scalable enterprise AI systems.

In summary, the text presents a compelling argument for leveraging small language models in agent systems, showcasing their benefits in cost, performance, and reliability for enterprises gearing up for a future dominated by AI-driven automation.
