Tag: model safety
-
CSA: DeepSeek: Behind the Hype and Headlines
Source URL: https://cloudsecurityalliance.org/blog/2025/03/25/deepseek-behind-the-hype-and-headlines
Summary: The emergence of DeepSeek, a Chinese AI company claiming to rival industry giants like OpenAI and Google, has sparked dramatic market reactions and raised critical discussions around AI safety, intellectual property, and geopolitical implications. Despite…
-
The Register: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o
Source URL: https://www.theregister.com/2025/02/27/llm_emergent_misalignment_study/
Summary: A model was fine-tuned to write vulnerable software – then suggested enslaving humanity. Computer scientists have found that fine-tuning notionally safe large language models to do one thing badly can negatively…
-
Unit 42: Investigating LLM Jailbreaking of Popular Generative AI Web Products
Source URL: https://unit42.paloaltonetworks.com/jailbreaking-generative-ai-web-products/
Summary: We discuss vulnerabilities in popular GenAI web products to LLM jailbreaks. Single-turn strategies remain effective, but multi-turn approaches show greater success.…
-
The Register: Voice-enabled AI agents can automate everything, even your phone scams
Source URL: https://www.theregister.com/2024/10/24/openai_realtime_api_phone_scam/
Summary: All for the low, low price of a mere dollar. Scammers, rejoice: OpenAI’s real-time voice API can be used to build AI agents capable of conducting successful phone call scams for less than a dollar.…
-
METR Blog – METR: BIS Comment Regarding "Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters"
Source URL: https://downloads.regulations.gov/BIS-2024-0047-0048/attachment_1.pdf
Summary: The text discusses the Bureau of Industry and Security’s proposed reporting requirements for advanced AI models and computing clusters, emphasizing…
-
Hacker News: Sabotage Evaluations for Frontier Models
Source URL: https://www.anthropic.com/research/sabotage-evaluations
Summary: The text outlines a comprehensive series of evaluation techniques developed by the Anthropic Alignment Science team to assess potential sabotage capabilities in AI models. These evaluations are crucial for ensuring the safety and integrity…
-
The Register: Anthropic’s Claude vulnerable to ‘emotional manipulation’
Source URL: https://www.theregister.com/2024/10/12/anthropics_claude_vulnerable_to_emotional/
Summary: AI model safety only goes so far. Anthropic’s Claude 3.5 Sonnet, despite its reputation as one of the better behaved generative AI models, can still be convinced to emit racist hate speech and malware.…