Microsoft Security Blog: New whitepaper outlines the taxonomy of failure modes in AI agents

Apr 24, 2025

—

Source URL: https://www.microsoft.com/en-us/security/blog/2025/04/24/new-whitepaper-outlines-the-taxonomy-of-failure-modes-in-ai-agents/
Source: Microsoft Security Blog
Title: New whitepaper outlines the taxonomy of failure modes in AI agents

Feedly Summary: Read the new whitepaper from the Microsoft AI Red Team to better understand the taxonomy of failure mode in agentic AI.
The post New whitepaper outlines the taxonomy of failure modes in AI agents appeared first on Microsoft Security Blog.

AI Summary and Description: Yes

**Summary:** The text discusses Microsoft’s release of a new whitepaper that outlines a taxonomy of failure modes in AI agents, which is designed to help security professionals and machine learning engineers understand and mitigate potential risks associated with AI systems. This effort is a continuation of Microsoft’s commitment to improving safety and security in AI through the identification of unique and existing failure modes in agentic AI systems.

**Detailed Description:**

Microsoft’s new whitepaper detailing the taxonomy of failure modes in agentic AI systems serves as a crucial resource for professionals in AI security, helping them to think systematically about how these systems can fail.

Key points in the text include:

– **Background and Context:**
– The taxonomy builds on previous works of the Microsoft AI Red Team, which documented traditional AI system failure modes.
– Key collaborative efforts include partnerships with MITRE and various organizations to develop an Adversarial ML Threat Matrix, now evolved into MITRE ATLAS™.

– **Three-Prong Approach to Development:**
– Internal red teaming of Microsoft’s agent-based AI systems.
– Collaboration with internal stakeholders across Microsoft to refine the taxonomy.
– Interviews with external practitioners to ensure the taxonomy addresses real-world challenges.

– **Focus on Real-World Applications:**
– The whitepaper provides a case study illustrating how a cyber attacker could exploit an agent’s memory—demonstrating the practical implications of security failures.

– **Failure Modes Classification:**
– Two major categories are identified:
– **Security Failures:** Impact confidentiality, availability, or integrity (e.g., modifying the AI’s intent).
– **Safety Failures:** Affect responsible AI implementation and can harm users or society (e.g., biased service delivery).

– **Mapping Failures:**
– Novel failure modes are unique to agentic AI systems.
– Existing failure modes are familiar but are more significant in the context of agentic AI.

– **Mitigation Strategies and Controls:**
– Strategies against memory poisoning, including:
– External authentication for memory updates.
– Restricting access to memory components.
– Defining structured formats for memory storage.

– **Guidance for Various Stakeholders:**
– **Engineers:** Use the taxonomy for designing agents and enhancing security development practices.
– **Security Professionals:** Use it to assess and probe AI systems before launch, informing defensive strategies.
– **Governance Professionals:** Gain insight into both novel and traditional failure modes prevalent in agentic AI.

– **Future Updates:**
– The taxonomy is positioned as a living document, with the expectation of revisions as technology and threats evolve.

This whitepaper is particularly relevant for engineers, security professionals, and risk governance teams as it provides a systematic approach to identifying and mitigating risks in AI systems, reinforcing the importance of safety and security in the development of such technologies.

2 2025 24 4 5 a access Act addresses adversarial agent agentic AI agents AI AI implementation AI security AI systems ai-agents and app Application applications Aria ARM art as attack authentication availability Axon based bias C challenges CI CIA class classification co Col collaboration collaborative collaborative efforts commit confidentiality Context control controls cross cyber cyber attack D de Defensive Strategies DeFi demo design development development practices document e Engineer engineers exp exploit External fail failure modes failures fine first for future g Gen Go governance governance professionals guidance H harm HR http HTTPS implementation implications in integrity intent inter intern IRS ite J k Key l Labor learning led Li Living M mac machine Machine Learning Matrix memory memory components Micro Microsoft Microsoft Security Microsoft Security Blog mitigating risks mitigation mitigation strategies ML Mode ModI my N no o of on one OPM organization organizations ory out over partnership partnerships point post potential potential risks practical implications pre professionals Q R rag rate RCE real real-world applications red red team red teaming release resource responsible Responsible AI revision Risk risks Ro RSA s s Position safe safety sec security security failures security professionals service service delivery Sig SoC society source SSE SSO stakeholders storage structured structured formats study system systems T taxonomy team teaming Teams tech technologies technology text the threat threats to Tor TP two UI under up update updates US use user Users uth V val Vision white Wi world world applications x