classifiers – Experimental News Clipping Site

Google Online Security Blog: Mitigating prompt injection attacks with a layered defense strategy

Jun 13, 2025

—

by

Source URL: http://security.googleblog.com/2025/06/mitigating-prompt-injection-attacks.html Source: Google Online Security Blog Title: Mitigating prompt injection attacks with a layered defense strategy Feedly Summary: AI Summary and Description: Yes **Summary:** The text discusses emerging security threats associated with generative AI, particularly focusing on indirect prompt injections that manipulate AI systems through hidden malicious instructions. Google outlines its layered security…

Simon Willison’s Weblog: Breaking down ‘EchoLeak’, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot

Jun 11, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/11/echoleak/ Source: Simon Willison’s Weblog Title: Breaking down ‘EchoLeak’, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot Feedly Summary: Breaking down ‘EchoLeak’, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot Aim Labs reported CVE-2025-32711 against Microsoft 365 Copilot back in January, and the fix is…

Simon Willison’s Weblog: What’s new in the world of LLMs, for NICAR 2025

Mar 8, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/8/nicar-llms/ Source: Simon Willison’s Weblog Title: What’s new in the world of LLMs, for NICAR 2025 Feedly Summary: I presented two sessions at the NICAR 2025 data journalism conference this year. The first was this one based on my review of LLMs in 2024, extended by several months to cover everything that’s happened…

Slashdot: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results

Feb 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://slashdot.org/story/25/02/03/1810255/anthropic-makes-jailbreak-advance-to-stop-ai-models-producing-harmful-results?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results Feedly Summary: AI Summary and Description: Yes Summary: Anthropic has introduced a new technique called “constitutional classifiers” designed to enhance the security of large language models (LLMs) like its Claude chatbot. This system aims to mitigate risks associated…

Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

Feb 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

Tag: classifiers

Google Online Security Blog: Mitigating prompt injection attacks with a layered defense strategy

Simon Willison’s Weblog: Breaking down ‘EchoLeak’, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot

Simon Willison’s Weblog: What’s new in the world of LLMs, for NICAR 2025

Slashdot: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results

Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks