Tag: jailbreaks

  • Cloud Blog: Announcing AI Protection: Security for the AI era

    Source URL: https://cloud.google.com/blog/products/identity-security/introducing-ai-protection-security-for-the-ai-era/
    Summary: As AI use increases, security remains a top concern, and we often hear that organizations are worried about risks that can come with rapid adoption. Google Cloud is committed to helping our customers confidently build and deploy AI…

  • Unit 42: Investigating LLM Jailbreaking of Popular Generative AI Web Products

    Source URL: https://unit42.paloaltonetworks.com/jailbreaking-generative-ai-web-products/
    Summary: We discuss vulnerabilities in popular GenAI web products to LLM jailbreaks. Single-turn strategies remain effective, but multi-turn approaches show greater success. The post Investigating LLM Jailbreaking of Popular Generative AI Web Products appeared first on Unit 42.

  • Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/
    Summary: Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

  • Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://www.anthropic.com/research/constitutional-classifiers
    Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…

  • Wired: DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

    Source URL: https://www.wired.com/story/deepseeks-ai-jailbreak-prompt-injection-attacks/
    Summary: Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one. The text highlights the ongoing battle between hackers and security researchers…

  • Unit 42: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek

    Source URL: https://unit42.paloaltonetworks.com/?p=138180
    Summary: Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content. The text outlines the research conducted…

  • Hacker News: 1,156 Questions Censored by DeepSeek

    Source URL: https://www.promptfoo.dev/blog/deepseek-censorship/
    Summary: The text discusses the DeepSeek-R1 model, highlighting its prominence and the associated concerns regarding censorship driven by CCP policies. It emphasizes the model’s high refusal rate on sensitive topics in China, the methods to…

  • Simon Willison’s Weblog: ChatGPT Operator system prompt

    Source URL: https://simonwillison.net/2025/Jan/26/chatgpt-operator-system-prompt/#atom-everything
    Summary: Johann Rehberger snagged a copy of the ChatGPT Operator system prompt. As usual, the system prompt doubles as better-written documentation than any of the official sources. It asks users for confirmation a lot: ## Confirmations Ask…

  • Simon Willison’s Weblog: Introducing Operator

    Source URL: https://simonwillison.net/2025/Jan/23/introducing-operator/
    Summary: OpenAI released their “research preview” of Operator, a cloud-based browser automation platform rolling out today to $200/month ChatGPT Pro subscribers. They’re calling this their first “agent”. In the Operator announcement video Sam Altman defined that notoriously vague term like this:…