Tag: safeguards
-
Hacker News: Constitutional Classifiers: Defending against universal jailbreaks
Source URL: https://www.anthropic.com/research/constitutional-classifiers
AI Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…
-
Slashdot: OpenAI Tests Its AI’s Persuasiveness By Comparing It to Reddit Posts
Source URL: https://slashdot.org/story/25/02/02/0319217/openai-tests-its-ais-persuasiveness-by-comparing-it-to-reddit-posts?utm_source=rss1.0mainlinkanon&utm_medium=feed
AI Summary: OpenAI utilized the subreddit r/ChangeMyView to test and evaluate the persuasive capabilities of its AI reasoning models, particularly through a structured process that involves comparing AI-generated responses with human replies.…
-
Wired: DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot
Source URL: https://www.wired.com/story/deepseeks-ai-jailbreak-prompt-injection-attacks/
Feedly Summary: Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one.
AI Summary: The text highlights the ongoing battle between hackers and security researchers…
-
The Register: The curious story of Uncle Sam’s HR dept, a hastily set up email server, and fears of another cyber disaster
Source URL: https://www.theregister.com/2025/01/29/opm_email_lawsuit/
Feedly Summary: Lawsuit challenges effort to create federal-wide centralized inbox expected to be used for mass firings. Two anonymous US government employees have sued Uncle Sam’s HR department…
-
Data and computer security | The Guardian: Threat of cyber-attacks on Whitehall ‘is severe and advancing quickly’, NAO says
Source URL: https://www.theguardian.com/technology/2025/jan/29/cyber-attack-threat-uk-government-departments-whitehall-nao
Feedly Summary: Audit watchdog finds 58 critical IT systems assessed in 2024 had ‘significant gaps in cyber-resilience’. The threat of potentially devastating cyber-attacks against UK government departments is “severe and advancing quickly”,…
-
The Register: Mental toll: Scale AI, Outlier sued by humans paid to steer AI away from our darkest depths
Source URL: https://www.theregister.com/2025/01/24/scale_ai_outlier_sued_over/
Feedly Summary: Who guards the guardrail makers? Not the bosses who hire them, it’s alleged. Scale AI, which labels training data for machine-learning models, was sued this month, alongside labor platform…