AI security – Page 89 – Experimental News Clipping Site

Simon Willison’s Weblog: Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025

—

by

Source URL: https://simonwillison.net/2025/Feb/25/emergent-misalignment/ Source: Simon Willison’s Weblog Title: Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs Feedly Summary: In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts…

OpenAI : Deep research System Card

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/deep-research-system-card Source: OpenAI Title: Deep research System Card Feedly Summary: This report outlines the safety work carried out prior to releasing deep research including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas. AI Summary and Description:…

Hacker News: Hard problems that reduce to document ranking

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://noperator.dev/posts/document-ranking-for-complex-problems/ Source: Hacker News Title: Hard problems that reduce to document ranking Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the innovative application of large language models (LLMs) in document ranking, particularly for locating vulnerabilities in code patches. It presents a novel approach to addressing complex security problems by…

Cisco Security Blog: AI Threat Intelligence Roundup: February 2025

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://blogs.cisco.com/security/ai-threat-intelligence-roundup-february-2025 Source: Cisco Security Blog Title: AI Threat Intelligence Roundup: February 2025 Feedly Summary: AI threat research is a fundamental part of Cisco’s approach to AI security. Our roundups highlight new findings from both original and third-party sources. AI Summary and Description: Yes Summary: The text emphasizes Cisco’s commitment to AI security through…

OpenAI : Estonia and OpenAI to bring ChatGPT to schools nationwide

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/estonia-schools-and-chatgpt Source: OpenAI Title: Estonia and OpenAI to bring ChatGPT to schools nationwide Feedly Summary: Estonia and OpenAI to bring ChatGPT to schools nationwide. OpenAI will work with the Estonian Government to provide students and teachers in the secondary school system with access to ChatGPT Edu. AI Summary and Description: Yes Summary: The…

Simon Willison’s Weblog: llm-anthropic 0.14

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/25/llm-anthropic-014/#atom-everything Source: Simon Willison’s Weblog Title: llm-anthropic 0.14 Feedly Summary: llm-anthropic 0.14 Annotated release notes for my new release of LLM. The signature feature is: Support for the new Claude 3.7 Sonnet model, including -o thinking 1 and -o thinking_budget X for extended reasoning mode. #14 I had a couple of attempts at…

The Register: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/02/25/chain_of_thought_jailbreaking/ Source: The Register Title: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit Feedly Summary: Blueprints shared for jail-breaking models that expose their chain-of-thought process Analysis AI models like OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking can mimic human reasoning through a process called chain of thought.……

Wired: Anthropic Launches the World’s First ‘Hybrid Reasoning’ AI Model

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.wired.com/story/anthropic-world-first-hybrid-reasoning-ai-model/ Source: Wired Title: Anthropic Launches the World’s First ‘Hybrid Reasoning’ AI Model Feedly Summary: Claude 3.7, the latest model from Anthropic, can be instructed to engage in a specific amount of reasoning to solve hard problems. AI Summary and Description: Yes Summary: The text discusses Claude 3.7, a new model from Anthropic,…

Hacker News: Claude 3.7 Sonnet and Claude Code

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.anthropic.com/news/claude-3-7-sonnet Source: Hacker News Title: Claude 3.7 Sonnet and Claude Code Feedly Summary: Comments AI Summary and Description: Yes Summary: The announcement details the launch of Claude 3.7 Sonnet, a significant advancement in AI models, touted as the first hybrid reasoning model capable of providing both instant responses and longer, more thoughtful outputs.…

Hacker News: AI cracks superbug problem in two days that took scientists years

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.bbc.com/news/articles/clyz6e9edy3o Source: Hacker News Title: AI cracks superbug problem in two days that took scientists years Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a remarkable achievement where an AI tool developed by Google was able to solve a complex scientific problem relating to antibiotic-resistant superbugs in just two…

Tag: AI security