Tag: jailbreak

  • The Register: Rackspace moving some of its own workloads off VMware to address bigger Broadcom bills

    Source URL: https://www.theregister.com/2025/02/05/rackspace_vmware_planet9_migration/ Source: The Register Title: Rackspace moving some of its own workloads off VMware to address bigger Broadcom bills Feedly Summary: New home, Planet9, says it’s also helping a Fortune 500 company to migrate 40,000 VMs Exclusive Rackspace is moving some of its back-office workloads off VMware and onto a platform called Private…

  • Slashdot: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results

    Source URL: https://slashdot.org/story/25/02/03/1810255/anthropic-makes-jailbreak-advance-to-stop-ai-models-producing-harmful-results?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results Feedly Summary: AI Summary and Description: Yes Summary: Anthropic has introduced a new technique called “constitutional classifiers” designed to enhance the security of large language models (LLMs) like its Claude chatbot. This system aims to mitigate risks associated…

  • Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

  • Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://www.anthropic.com/research/constitutional-classifiers Source: Hacker News Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…

  • Hacker News: Notes on OpenAI O3-Mini

    Source URL: https://simonwillison.net/2025/Jan/31/o3-mini/ Source: Hacker News Title: Notes on OpenAI O3-Mini Feedly Summary: Comments AI Summary and Description: Yes Summary: The announcement of OpenAI’s o3-mini model marks a significant development in the landscape of large language models (LLMs). With enhanced performance on specific benchmarks and user functionalities that include internet search capabilities, o3-mini aims to…

  • Simon Willison’s Weblog: OpenAI o3-mini, now available in LLM

    Source URL: https://simonwillison.net/2025/Jan/31/o3-mini/#atom-everything Source: Simon Willison’s Weblog Title: OpenAI o3-mini, now available in LLM Feedly Summary: o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate – we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini or (if we have access) o1 Pro.…

  • Wired: DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

    Source URL: https://www.wired.com/story/deepseeks-ai-jailbreak-prompt-injection-attacks/ Source: Wired Title: DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot Feedly Summary: Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one. AI Summary and Description: Yes Summary: The text highlights the ongoing battle between hackers and security researchers…

  • Hacker News: O3-mini System Card [pdf]

    Source URL: https://cdn.openai.com/o3-mini-system-card.pdf Source: Hacker News Title: O3-mini System Card [pdf] Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The OpenAI o3-mini System Card details the advanced capabilities, safety evaluations, and risk classifications of the OpenAI o3-mini model. This document is particularly pertinent for professionals in AI security, as it outlines significant safety measures…

  • The Register: Google to Iran: Yes, we see you using Gemini for phishing and scripting. We’re onto you

    Source URL: https://www.theregister.com/2025/01/31/state_spies_google_gemini/ Source: The Register Title: Google to Iran: Yes, we see you using Gemini for phishing and scripting. We’re onto you Feedly Summary: And you, China, Russia, North Korea … Guardrails block malware generation Google says it’s spotted Chinese, Russian, Iranian, and North Korean government agents using its Gemini AI for nefarious purposes,…

  • Unit 42: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek

    Source URL: https://unit42.paloaltonetworks.com/?p=138180 Source: Unit 42 Title: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek Feedly Summary: Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content. The post Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek appeared first on Unit 42. AI Summary and Description: Yes Summary: The text outlines the research conducted…