safety – Page 47 – Experimental News Clipping Site

Hacker News: Narrow finetuning can produce broadly misaligned LLM [pdf]

Feb 25, 2025

—

by

Source URL: https://martins1612.github.io/emergent_misalignment_betley.pdf Source: Hacker News Title: Narrow finetuning can produce broadly misaligned LLM [pdf] Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The document presents findings on the phenomenon of “emergent misalignment” in large language models (LLMs) like GPT-4o when finetuned on specific narrow tasks, particularly the creation of insecure code. The results…

OpenAI : Deep research System Card

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/deep-research-system-card Source: OpenAI Title: Deep research System Card Feedly Summary: This report outlines the safety work carried out prior to releasing deep research including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas. AI Summary and Description:…

Alerts: CISA Releases Two Industrial Control Systems Advisories

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.cisa.gov/news-events/alerts/2025/02/25/cisa-releases-two-industrial-control-systems-advisories Source: Alerts Title: CISA Releases Two Industrial Control Systems Advisories Feedly Summary: CISA released two Industrial Control Systems (ICS) advisories on February 25, 2025. These advisories provide timely information about current security issues, vulnerabilities, and exploits surrounding ICS. ICSA-25-056-01 Rockwell Automation PowerFlex 755 ICSMA-25-030-01 Contec Health CMS8000 Patient Monitor (Update A) CISA…

Schneier on Security: North Korean Hackers Steal $1.5B in Cryptocurrency

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.schneier.com/blog/archives/2025/02/north-korean-hackers-steal-1-5b-in-cryptocurrency.html Source: Schneier on Security Title: North Korean Hackers Steal $1.5B in Cryptocurrency Feedly Summary: It looks like a very sophisticated attack against the Dubai-based exchange Bybit: Bybit officials disclosed the theft of more than 400,000 ethereum and staked ethereum coins just hours after it occurred. The notification said the digital loot had…

OpenAI : Estonia and OpenAI to bring ChatGPT to schools nationwide

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/estonia-schools-and-chatgpt Source: OpenAI Title: Estonia and OpenAI to bring ChatGPT to schools nationwide Feedly Summary: Estonia and OpenAI to bring ChatGPT to schools nationwide. OpenAI will work with the Estonian Government to provide students and teachers in the secondary school system with access to ChatGPT Edu. AI Summary and Description: Yes Summary: The…

The Register: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/02/25/chain_of_thought_jailbreaking/ Source: The Register Title: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit Feedly Summary: Blueprints shared for jail-breaking models that expose their chain-of-thought process Analysis AI models like OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking can mimic human reasoning through a process called chain of thought.……

Hacker News: Grab AI Gateway: Connecting Grabbers to Multiple GenAI Providers

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://engineering.grab.com/grab-ai-gateway Source: Hacker News Title: Grab AI Gateway: Connecting Grabbers to Multiple GenAI Providers Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the implementation and significance of Grab’s AI Gateway, an integrated platform that facilitates access to multiple AI providers for users within the organization. It highlights the gateway’s…

Cloud Blog: Announcing Claude 3.7 Sonnet, Anthropic’s first hybrid reasoning model, is available on Vertex AI

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/anthropics-claude-3-7-sonnet-is-available-on-vertex-ai/ Source: Cloud Blog Title: Announcing Claude 3.7 Sonnet, Anthropic’s first hybrid reasoning model, is available on Vertex AI Feedly Summary: Today, we’re announcing Claude 3.7 Sonnet, Anthropic’s most intelligent model to date and the first hybrid reasoning model on the market, is available in preview on Vertex AI Model Garden. Claude 3.7…

Hacker News: Claude 3.7 Sonnet and Claude Code

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.anthropic.com/news/claude-3-7-sonnet Source: Hacker News Title: Claude 3.7 Sonnet and Claude Code Feedly Summary: Comments AI Summary and Description: Yes Summary: The announcement details the launch of Claude 3.7 Sonnet, a significant advancement in AI models, touted as the first hybrid reasoning model capable of providing both instant responses and longer, more thoughtful outputs.…

Bulletins: Vulnerability Summary for the Week of February 17, 2025

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.cisa.gov/news-events/bulletins/sb25-055 Source: Bulletins Title: Vulnerability Summary for the Week of February 17, 2025 Feedly Summary: High Vulnerabilities PrimaryVendor — Product Description Published CVSS Score Source Info a1post–A1POST.BG Shipping for Woo Cross-Site Request Forgery (CSRF) vulnerability in a1post A1POST.BG Shipping for Woo allows Privilege Escalation. This issue affects A1POST.BG Shipping for Woo: from n/a…

Tag: safety