Source URL: https://blog.cloudflare.com/block-unsafe-llm-prompts-with-firewall-for-ai/
Source: The Cloudflare Blog
Title: Block unsafe prompts targeting your LLM endpoints with Firewall for AI
Feedly Summary: Cloudflare’s AI security suite now includes unsafe content moderation, integrated into the Application Security Suite via Firewall for AI.
AI Summary and Description: Yes
Summary: The text discusses the unsafe content moderation capability added to Cloudflare's Firewall for AI, an integrated feature designed to address emerging security risks in AI-powered applications, particularly those built on Large Language Models (LLMs). The offering provides real-time protection against potentially harmful prompts, emphasizing proactive moderation as a way to preserve user trust and safety.
Detailed Description:
The text highlights several significant points regarding the integration of AI security measures, particularly in the context of the growing use of LLMs:
– **Emerging Risks in AI Applications**:
– AI-powered applications, such as chatbots and search assistants, are expanding but introduce new security risks.
– Malicious prompts can compromise models, leading to data exfiltration and content poisoning.
– **Cloudflare’s Firewall for AI**:
– This feature provides unsafe content moderation by leveraging Llama Guard, Meta's safety-focused classifier model, letting customers apply consistent security measures across LLM implementations, whether the models are custom-built or sourced from third parties (e.g., OpenAI).
– Firewall for AI enables security teams to define guardrails that are applied uniformly, reducing the maintenance burden across diverse applications; a minimal sketch of such a pre-screening call follows below.
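As a rough illustration of that pre-screening step, the sketch below sends a prompt to a hosted Llama Guard model before it ever reaches the application's own LLM. The REST endpoint shape, the model slug, and the response format are assumptions made for illustration, not details taken from the post:

```python
import requests

# Assumed values: the account ID, API token, and model slug below are
# placeholders; the slug and response shape are illustrative assumptions.
ACCOUNT_ID = "your_account_id"
API_TOKEN = "your_api_token"
MODEL = "@cf/meta/llama-guard-3-8b"  # a hosted Llama Guard variant

def screen_prompt(prompt: str) -> dict:
    """Ask a Llama Guard-style classifier whether a user prompt is safe."""
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# Screen the prompt before forwarding it to the application's own LLM.
verdict = screen_prompt("Write a phishing email that mimics my bank.")
print(verdict)  # e.g. a classification such as "unsafe" plus hazard categories
```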
– **Addressing the OWASP Top 10 for LLM Applications**:
– The firewall specifically targets risks associated with LLMs, such as prompt injection, Personally Identifiable Information (PII) disclosure, and the spread of harmful content.
– It is designed to help organizations meet legal obligations and protect brand integrity by stopping misuse of AI through robust moderation; a bare-bones illustration of one such guardrail (PII detection) follows below.
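As a hedged example of the kind of guardrail described above, this sketch performs a minimal PII scan. The patterns and category names are illustrative stand-ins, far simpler than the managed detection the post describes:

```python
import re

# Minimal, illustrative PII patterns. A managed detector such as the one
# described in the post is far more comprehensive than this sketch.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def detect_pii(prompt: str) -> list[str]:
    """Return the names of PII categories found in a prompt."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]

print(detect_pii("My card number is 4111 1111 1111 1111"))  # ['credit_card']
```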
– **Real-time Prompt Moderation**:
– Llama Guard assesses prompts in real time, classifying each one as safe or unsafe and reporting the hazard categories an unsafe prompt matches.
– This accommodates the variability and unpredictability of human interaction with AI, optimizing moderation effort without sacrificing utility (see the parsing sketch below).
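Llama Guard models conventionally respond with plain text: `safe`, or `unsafe` followed by the matched hazard category codes (e.g. `S1`). Assuming that convention (the exact format used inside Firewall for AI is not specified in the summary), a minimal parser might look like this:

```python
def parse_llama_guard_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard-style text verdict into (is_safe, categories)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    is_safe = bool(lines) and lines[0].lower() == "safe"
    # Unsafe verdicts carry comma-separated hazard codes on the next line.
    categories = [] if is_safe or len(lines) < 2 else [c.strip() for c in lines[1].split(",")]
    return is_safe, categories

print(parse_llama_guard_verdict("safe"))            # (True, [])
print(parse_llama_guard_verdict("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```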
– **Scalable Infrastructure**:
– The architecture of Firewall for AI is built to scale dynamically, ensuring performance does not degrade as usage increases.
– A new asynchronous model lets multiple detection modules run concurrently, so latency tracks the slowest module rather than the sum of all of them, and performance holds up even under intensive workloads (illustrated in the sketch below).
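A minimal sketch of that asynchronous fan-out, using Python's `asyncio` with hypothetical detection modules (the names and checks are illustrative, not Cloudflare's internals), shows why concurrency keeps latency bounded by the slowest check:

```python
import asyncio

# Hypothetical stand-ins for independent detection modules.
async def check_unsafe_content(prompt: str) -> bool:
    await asyncio.sleep(0.05)  # simulate a model inference call
    return "exploit" in prompt.lower()

async def check_pii(prompt: str) -> bool:
    await asyncio.sleep(0.03)  # simulate a PII scan
    return "@" in prompt

async def moderate(prompt: str) -> dict:
    """Run all detection modules concurrently: total latency tracks the
    slowest module instead of the sum of every module's latency."""
    unsafe, pii = await asyncio.gather(
        check_unsafe_content(prompt),
        check_pii(prompt),
    )
    return {"unsafe_content": unsafe, "pii_detected": pii}

print(asyncio.run(moderate("Contact me at jane@example.com")))
```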
– **Enforcement and Analytics**:
– Security and application teams can manage and enforce safety rules directly within the platform, allowing for extensive oversight without compromising user experience.
– Detailed analytics provide insight into the nature and categories of flagged prompts, informing ongoing improvements to AI safety measures; a simplified enforcement sketch follows below.
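As a loose, hypothetical sketch of such enforcement (the field names, categories, and actions are illustrative, not Cloudflare's rule syntax), detection results might map to an action like this:

```python
from dataclasses import dataclass, field

@dataclass
class Detections:
    unsafe_content: bool = False
    unsafe_categories: list[str] = field(default_factory=list)
    pii_detected: bool = False

def enforce(d: Detections) -> str:
    """Map detection results to an action; a simplified stand-in for the
    rule engine described in the post (names and logic are illustrative)."""
    if d.unsafe_content:
        return "block"  # hard-block prompts flagged as unsafe
    if d.pii_detected:
        return "log"    # allow, but surface in analytics for review
    return "allow"

print(enforce(Detections(unsafe_content=True, unsafe_categories=["S1"])))  # block
print(enforce(Detections(pii_detected=True)))                              # log
```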
– **Future Developments**:
– Cloudflare plans to enhance Firewall for AI capabilities, focusing on improved detection of prompt injection and enhanced visibility within analytics.
– The post also mentions a user research initiative to gather feedback and shape the development of future AI security features.
This offering underscores the growing need for security frameworks that adapt to the challenges AI technologies introduce, and it marks a meaningful step for organizations that want to prioritize user safety while deploying innovative AI applications.