Tag: classifiers
-
Google Online Security Blog: Mitigating prompt injection attacks with a layered defense strategy
Source URL: http://security.googleblog.com/2025/06/mitigating-prompt-injection-attacks.html Source: Google Online Security Blog Title: Mitigating prompt injection attacks with a layered defense strategy Feedly Summary: AI Summary and Description: Yes **Summary:** The text discusses emerging security threats associated with generative AI, particularly focusing on indirect prompt injections that manipulate AI systems through hidden malicious instructions. Google outlines its layered security…
-
Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks
Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…