Tag: safety
-
The Register: LegalPwn: Tricking LLMs by burying badness in lawyerly fine print
Source URL: https://www.theregister.com/2025/09/01/legalpwn_ai_jailbreak/
Source: The Register
Title: LegalPwn: Tricking LLMs by burying badness in lawyerly fine print
Feedly Summary: Trust and believe – AI models trained to see ‘legal’ doc as super legit. Researchers at security firm Pangea have discovered yet another way to trivially trick large language models (LLMs) into ignoring their guardrails. Stick…
-
Slashdot: OpenAI Is Scanning Users’ ChatGPT Conversations and Reporting Content To Police
Source URL: https://yro.slashdot.org/story/25/08/31/2311231/openai-is-scanning-users-chatgpt-conversations-and-reporting-content-to-police
Source: Slashdot
Title: OpenAI Is Scanning Users’ ChatGPT Conversations and Reporting Content To Police
Feedly Summary: AI Summary and Description: Yes
Summary: The text highlights OpenAI’s controversial practice of monitoring user conversations in ChatGPT for threats, revealing significant security and privacy implications. This admission raises questions about the balance between safety and…
-
Slashdot: One Long Sentence is All It Takes To Make LLMs Misbehave
Source URL: https://slashdot.org/story/25/08/27/1756253/one-long-sentence-is-all-it-takes-to-make-llms-misbehave?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Title: One Long Sentence is All It Takes To Make LLMs Misbehave
Feedly Summary: AI Summary and Description: Yes
Summary: The text discusses a significant security research finding from Palo Alto Networks’ Unit 42 regarding vulnerabilities in large language models (LLMs). The researchers explored methods that allow users to bypass…
-
OpenAI: OpenAI and Anthropic share findings from a joint safety evaluation
Source URL: https://openai.com/index/openai-anthropic-safety-evaluation
Source: OpenAI
Title: OpenAI and Anthropic share findings from a joint safety evaluation
Feedly Summary: OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
AI Summary and Description: Yes
Summary:…