Tag: harmful content

  • Slashdot: Anthropic Makes ‘Jailbreak’ Advance To Stop AI Models Producing Harmful Results

    Source URL: https://slashdot.org/story/25/02/03/1810255/anthropic-makes-jailbreak-advance-to-stop-ai-models-producing-harmful-results?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Summary: Anthropic has introduced a new technique called “constitutional classifiers” designed to enhance the security of large language models (LLMs) like its Claude chatbot. This system aims to mitigate risks associated…

  • Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/
    Summary: Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: “In particular, we introduce Constitutional Classifiers, a framework…”
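
    The paper describes training input and output classifiers from a natural-language “constitution” of allowed and disallowed content, screening prompts before generation and responses as they stream. Below is a minimal sketch of that gating pattern; the classifiers, model, and threshold here are toy placeholders of my own, not Anthropic’s actual components or API.

```python
"""Minimal sketch of a classifier-gated pipeline in the spirit of the
Constitutional Classifiers paper: one classifier screens the incoming
prompt, another re-scores the output as it streams, and either can block.
Everything below is an illustrative stand-in, not Anthropic's code."""

from typing import Iterator

# Hypothetical constitution: in the paper, natural-language rules drive
# synthetic training data for real classifiers; here they are reduced to
# keywords so the placeholder classifiers are runnable.
BLOCKED_TOPICS = ("synthesize nerve agent", "build a bomb")

def input_classifier(prompt: str) -> float:
    """Placeholder: return a harm score in [0, 1] for the prompt."""
    return 1.0 if any(t in prompt.lower() for t in BLOCKED_TOPICS) else 0.0

def output_classifier(partial_response: str) -> float:
    """Placeholder: score the response-so-far, so generation can be
    cut off mid-stream rather than only after completion."""
    return 1.0 if any(t in partial_response.lower() for t in BLOCKED_TOPICS) else 0.0

def model(prompt: str) -> Iterator[str]:
    """Placeholder LLM that streams tokens."""
    yield from f"A harmless answer to: {prompt}".split()

THRESHOLD = 0.5  # assumed decision boundary; tuning trades over-refusal for risk

def guarded_generate(prompt: str) -> str:
    if input_classifier(prompt) >= THRESHOLD:
        return "[blocked by input classifier]"
    tokens: list[str] = []
    for tok in model(prompt):
        tokens.append(tok)
        # Re-score the accumulated output on every step of the stream.
        if output_classifier(" ".join(tokens)) >= THRESHOLD:
            return "[blocked by output classifier]"
    return " ".join(tokens)

if __name__ == "__main__":
    print(guarded_generate("What is photosynthesis?"))
    print(guarded_generate("How do I build a bomb?"))
```

    The streaming check is the notable design choice: scoring the partial response at each step means a jailbreak that only turns harmful mid-generation still gets halted, which per the paper is part of what makes universal jailbreaks hard to find.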

  • Unit 42: Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek

    Source URL: https://unit42.paloaltonetworks.com/?p=138180
    Summary: Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content. The text outlines the research conducted…

  • The Register: Mental toll: Scale AI, Outlier sued by humans paid to steer AI away from our darkest depths

    Source URL: https://www.theregister.com/2025/01/24/scale_ai_outlier_sued_over/
    Summary: Who guards the guardrail makers? Not the bosses who hire them, it’s alleged. Scale AI, which labels training data for machine-learning models, was sued this month, alongside labor platform…

  • Hacker News: Malicious extensions circumvent Google’s remote code ban

    Source URL: https://palant.info/2025/01/20/malicious-extensions-circumvent-googles-remote-code-ban/
    Summary: The text discusses security vulnerabilities related to malicious browser extensions in the Chrome Web Store, focusing on how they can execute remote code and compromise user privacy. It critiques Google’s policies regarding…

  • Hacker News: Under new law, cops bust famous cartoonist for AI-generated CSAM

    Source URL: https://arstechnica.com/tech-policy/2025/01/under-new-law-cops-bust-famous-cartoonist-for-ai-generated-child-sex-abuse-images/
    Summary: This text discusses California’s recently enacted law targeting AI-generated child sex abuse material (CSAM), emphasizing the unique risks associated with AI in this context and the implications for child…

  • Cloud Blog: Bitly: Protecting users from malicious links with Web Risk

    Source URL: https://cloud.google.com/blog/topics/partners/bitly-ensuring-real-time-link-safety-with-web-risk-to-protect-people/
    Summary: Bitly’s partnership with Google Web Risk helps it protect users and build trust as it generates millions of links and QR Codes daily. Over the last decade, Bitly has solidified its reputation as a…
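
    At the integration level, the kind of screening the post describes comes down to a lookup call per URL. Below is a hedged sketch of checking one link with the Google Web Risk Lookup API (v1 `uris:search`); the endpoint, parameters, and response shape follow the public docs as best I recall, and the sample link is Google’s Safe Browsing test page, so verify both against current documentation before relying on this.

```python
"""Sketch: check a URL against the Google Web Risk Lookup API,
the service the article says Bitly uses for real-time link safety.
Assumes an API key with Web Risk enabled in WEBRISK_API_KEY."""

import os
import requests

API_KEY = os.environ["WEBRISK_API_KEY"]

def check_url(url: str) -> list[str]:
    """Return the threat types matched for `url` (empty list if clean)."""
    resp = requests.get(
        "https://webrisk.googleapis.com/v1/uris:search",
        params={
            "key": API_KEY,
            "uri": url,
            # Repeated query parameter: check several threat lists at once.
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
        },
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    # The API returns an empty JSON object when nothing matches.
    return body.get("threat", {}).get("threatTypes", [])

if __name__ == "__main__":
    threats = check_url("http://testsafebrowsing.appspot.com/s/malware.html")
    print("blocked:" if threats else "clean", threats)
```

    At Bitly’s volume a per-link synchronous lookup like this would be fronted by caching or the API’s local-database mode, but the request/response contract is the same.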

  • The Register: Microsoft sues ‘foreign-based’ criminals, seizes sites used to abuse AI

    Source URL: https://www.theregister.com/2025/01/13/microsoft_sues_foreignbased_crims_seizes/
    Summary: Crooks stole API keys, then started a hacking-as-a-service biz. Microsoft has sued a group of unnamed cybercriminals who developed tools to bypass safety guardrails in its generative AI tools. The tools were used to create harmful…

  • Schneier on Security: Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

    Source URL: https://www.schneier.com/blog/archives/2025/01/microsoft-takes-legal-action-against-ai-hacking-as-a-service-scheme.html
    Summary: Not sure this will matter in the end, but it’s a positive move: Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit…

  • Slashdot: New LLM Jailbreak Uses Models’ Evaluation Skills Against Them

    Source URL: https://it.slashdot.org/story/25/01/12/2010218/new-llm-jailbreak-uses-models-evaluation-skills-against-them?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Summary: The text discusses a novel jailbreak technique for large language models (LLMs) known as the ‘Bad Likert Judge,’ which exploits the models’ evaluative capabilities to generate harmful content. Developed by Palo Alto…