jailbreaking techniques – Experimental News Clipping Site

Unit 42: Investigating LLM Jailbreaking of Popular Generative AI Web Products

Feb 21, 2025

—

by

Source URL: https://unit42.paloaltonetworks.com/jailbreaking-generative-ai-web-products/ Source: Unit 42 Title: Investigating LLM Jailbreaking of Popular Generative AI Web Products Feedly Summary: We discuss vulnerabilities in popular GenAI web products to LLM jailbreaks. Single-turn strategies remain effective, but multi-turn approaches show greater success. The post Investigating LLM Jailbreaking of Popular Generative AI Web Products appeared first on Unit 42.…

Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

Feb 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

Feb 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.anthropic.com/research/constitutional-classifiers Source: Hacker News Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…

Unit 42: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek

Jan 30, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://unit42.paloaltonetworks.com/?p=138180 Source: Unit 42 Title: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek Feedly Summary: Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content. The post Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek appeared first on Unit 42. AI Summary and Description: Yes Summary: The text outlines the research conducted…

Tag: jailbreaking techniques

Unit 42: Investigating LLM Jailbreaking of Popular Generative AI Web Products

Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

Unit 42: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek