Tag: jailbreak
-
Simon Willison’s Weblog: ChatGPT Operator system prompt
Source URL: https://simonwillison.net/2025/Jan/26/chatgpt-operator-system-prompt/#atom-everything Source: Simon Willison’s Weblog Title: ChatGPT Operator system prompt Feedly Summary: ChatGPT Operator system prompt Johann Rehberger snagged a copy of the ChatGPT Operator system prompt. As usual, the system prompt doubles as better written documentation than any of the official sources. It asks users for confirmation a lot: ## Confirmations Ask…
-
Simon Willison’s Weblog: Introducing Operator
Source URL: https://simonwillison.net/2025/Jan/23/introducing-operator/ Source: Simon Willison’s Weblog Title: Introducing Operator Feedly Summary: Introducing Operator OpenAI released their “research preview” today of Operator, a cloud-based browser automation platform rolling out today to $200/month ChatGPT Pro subscribers. They’re calling this their first “agent”. In the Operator announcement video Sam Altman defined that notoriously vague term like this:…
-
OpenAI : Operator System Card
Source URL: https://openai.com/index/operator-system-card Source: OpenAI Title: Operator System Card Feedly Summary: Drawing from OpenAI’s established safety frameworks, this document highlights our multi-layered approach, including model and product mitigations we’ve implemented to protect against prompt engineering and jailbreaks and to protect privacy and security, as well as details of our external red teaming efforts, safety evaluations, and ongoing work…
-
The Register: Just as your LLM once again goes off the rails, Cisco, Nvidia are at the door smiling
Source URL: https://www.theregister.com/2025/01/17/nvidia_cisco_ai_guardrails_security/ Source: The Register Title: Just as your LLM once again goes off the rails, Cisco, Nvidia are at the door smiling Feedly Summary: Some of you have apparently already botched chatbots or allowed ‘shadow AI’ to creep in. Cisco and Nvidia have both recognized that as useful as today’s AI may be,…
-
Slashdot: New LLM Jailbreak Uses Models’ Evaluation Skills Against Them
Source URL: https://it.slashdot.org/story/25/01/12/2010218/new-llm-jailbreak-uses-models-evaluation-skills-against-them?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: New LLM Jailbreak Uses Models’ Evaluation Skills Against Them Feedly Summary: AI Summary and Description: Yes **Summary:** The text discusses a novel jailbreak technique for large language models (LLMs) known as the ‘Bad Likert Judge,’ which exploits the models’ evaluative capabilities to generate harmful content. Developed by Palo Alto…
-
Hacker News: Human study on AI spear phishing campaigns
Source URL: https://www.lesswrong.com/posts/GCHyDKfPXa5qsG2cP/human-study-on-ai-spear-phishing-campaigns Source: Hacker News Title: Human study on AI spear phishing campaigns Feedly Summary: AI Summary and Description: Yes Summary: The text discusses a study evaluating the effectiveness of AI models in executing personalized phishing attacks, revealing a disturbing increase in the capabilities of AI-generated spear phishing. The findings indicate high click-through…
-
Slashdot: Dire Predictions for 2025 Include ‘Largest Cyberattack in History’
Source URL: https://it.slashdot.org/story/25/01/04/1839246/dire-predictions-for-2025-include-largest-cyberattack-in-history?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Dire Predictions for 2025 Include ‘Largest Cyberattack in History’ Feedly Summary: AI Summary and Description: Yes **Summary:** The text discusses potential “Black Swan” events for 2025, particularly highlighting the anticipated risks associated with cyberattacks bolstered by generative AI and large language models. This insight is crucial for security professionals,…
-
Unit 42: Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability
Source URL: https://unit42.paloaltonetworks.com/?p=138017 Source: Unit 42 Title: Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability Feedly Summary: The jailbreak technique “Bad Likert Judge” manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails. The post Bad Likert Judge: A Novel Multi-Turn Technique to…
-
Schneier on Security: Jailbreaking LLM-Controlled Robots
Source URL: https://www.schneier.com/blog/archives/2024/12/jailbreaking-llm-controlled-robots.html Source: Schneier on Security Title: Jailbreaking LLM-Controlled Robots Feedly Summary: Surprising no one, it’s easy to trick an LLM-controlled robot into ignoring its safety instructions. AI Summary and Description: Yes Summary: The text highlights a significant vulnerability in LLM-controlled robots, revealing that they can be manipulated to bypass their safety protocols. This…