Tag: jailbreaking attacks
-
Simon Willison’s Weblog: ChatGPT Operator system prompt
Source URL: https://simonwillison.net/2025/Jan/26/chatgpt-operator-system-prompt/#atom-everything
Source: Simon Willison’s Weblog
Title: ChatGPT Operator system prompt
Feedly Summary: Johann Rehberger snagged a copy of the ChatGPT Operator system prompt. As usual, the system prompt doubles as better-written documentation than any of the official sources. It asks users for confirmation a lot: ## Confirmations Ask…
-
Hacker News: Robot Jailbreak: Researchers Trick Bots into Dangerous Tasks
Source URL: https://spectrum.ieee.org/jailbreak-llm
Source: Hacker News
Title: Robot Jailbreak: Researchers Trick Bots into Dangerous Tasks
Feedly Summary: Comments
AI Summary and Description: Yes
**Summary:** The text discusses significant security vulnerabilities associated with large language models (LLMs) used in robotic systems, revealing how easily these systems can be “jailbroken” to perform harmful actions. This raises pressing…
-
Hacker News: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Source URL: https://arxiv.org/abs/2310.03684
Source: Hacker News
Title: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: This text presents “SmoothLLM,” an innovative algorithm designed to enhance the security of Large Language Models (LLMs) against jailbreaking attacks, which manipulate models into producing undesirable content. The proposal highlights a…
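
The core mechanism from the linked paper is easy to sketch: make N copies of the incoming prompt, randomly perturb a fraction q of the characters in each copy, query the model on every copy, and take a majority vote over a jailbreak judge’s verdicts, returning a response consistent with that vote. A minimal Python sketch under those assumptions follows; `query_llm` and `is_jailbroken` are hypothetical placeholders, not the paper’s released code:

```python
import random
import string

def query_llm(prompt: str) -> str:
    """Placeholder for a real model call (hypothetical, not from the paper)."""
    raise NotImplementedError

def is_jailbroken(response: str) -> bool:
    """Placeholder jailbreak judge, e.g. a keyword or classifier check (hypothetical)."""
    raise NotImplementedError

def random_swap_perturbation(prompt: str, q: float) -> str:
    """Replace a random fraction q of characters with random printable characters.

    This is the swap variant; the paper also evaluates insert and patch
    perturbations.
    """
    chars = list(prompt)
    n_perturb = int(len(chars) * q)
    for i in random.sample(range(len(chars)), n_perturb):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothllm(prompt: str, n_copies: int = 10, q: float = 0.1) -> str:
    """Perturb-and-aggregate loop at the heart of SmoothLLM.

    Adversarial suffixes found by gradient-based attacks are brittle to
    character-level noise, so most perturbed copies of an attack prompt
    fail to jailbreak the model; a majority vote then suppresses them.
    """
    responses = [query_llm(random_swap_perturbation(prompt, q))
                 for _ in range(n_copies)]
    verdicts = [is_jailbroken(r) for r in responses]
    majority = sum(verdicts) > len(verdicts) / 2
    # Return one response whose judge verdict matches the majority vote.
    for response, verdict in zip(responses, verdicts):
        if verdict == majority:
            return response
    return responses[0]  # unreachable: some verdict always matches the majority
```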