AI systems – Page 73 – Experimental News Clipping Site

Simon Willison’s Weblog: Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025

—

by

Source URL: https://simonwillison.net/2025/Feb/25/emergent-misalignment/ Source: Simon Willison’s Weblog Title: Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs Feedly Summary: In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts…

Hacker News: Narrow finetuning can produce broadly misaligned LLM [pdf]

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://martins1612.github.io/emergent_misalignment_betley.pdf Source: Hacker News Title: Narrow finetuning can produce broadly misaligned LLM [pdf] Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The document presents findings on the phenomenon of “emergent misalignment” in large language models (LLMs) like GPT-4o when finetuned on specific narrow tasks, particularly the creation of insecure code. The results…

OpenAI : Deep research System Card

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/deep-research-system-card Source: OpenAI Title: Deep research System Card Feedly Summary: This report outlines the safety work carried out prior to releasing deep research including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas. AI Summary and Description:…

Slashdot: Google Makes Gemini Code Assist Free

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://tech.slashdot.org/story/25/02/25/1640216/google-makes-gemini-code-assist-free?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Google Makes Gemini Code Assist Free Feedly Summary: AI Summary and Description: Yes Summary: Google has introduced a free version of its Gemini Code Assist, designed for developers with significantly higher usage limits compared to competitors like GitHub Copilot. This advancement emphasizes the growing trend of AI integration in…

Simon Willison’s Weblog: Leaked Windsurf prompt

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/25/leaked-windsurf-prompt/ Source: Simon Willison’s Weblog Title: Leaked Windsurf prompt Feedly Summary: Leaked Windsurf prompt The Windurf Editor is Codeium’s highly regarded entrant into the fork-of-VS-code AI-enhanced IDE model first pioneered by Cursor (and by VS Code itself). I heard online that it had a quirky system prompt, and was able to replicate that…

Slashdot: Call of Duty Maker Activision Admits To Using AI

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://slashdot.org/story/25/02/25/1614220/call-of-duty-maker-activision-admits-to-using-ai?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Call of Duty Maker Activision Admits To Using AI Feedly Summary: AI Summary and Description: Yes Summary: Activision has confirmed the use of AI-generated content in its games, specifically in the Call of Duty franchise, which aligns with growing trends in the gaming industry where generative AI plays a…

Hacker News: DeepSearcher: A Local open-source Deep Research

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://milvus.io/blog/introduce-deepsearcher-a-local-open-source-deep-research.md Source: Hacker News Title: DeepSearcher: A Local open-source Deep Research Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The provided text outlines the development and functionality of DeepSearcher, an open-source research agent that automates query decomposition, data retrieval, and synthesis of information into detailed reports. It showcases innovations in AI-driven research…

Cisco Security Blog: AI Threat Intelligence Roundup: February 2025

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://blogs.cisco.com/security/ai-threat-intelligence-roundup-february-2025 Source: Cisco Security Blog Title: AI Threat Intelligence Roundup: February 2025 Feedly Summary: AI threat research is a fundamental part of Cisco’s approach to AI security. Our roundups highlight new findings from both original and third-party sources. AI Summary and Description: Yes Summary: The text emphasizes Cisco’s commitment to AI security through…

OpenAI : Estonia and OpenAI to bring ChatGPT to schools nationwide

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/estonia-schools-and-chatgpt Source: OpenAI Title: Estonia and OpenAI to bring ChatGPT to schools nationwide Feedly Summary: Estonia and OpenAI to bring ChatGPT to schools nationwide. OpenAI will work with the Estonian Government to provide students and teachers in the secondary school system with access to ChatGPT Edu. AI Summary and Description: Yes Summary: The…

The Register: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/02/25/chain_of_thought_jailbreaking/ Source: The Register Title: How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit Feedly Summary: Blueprints shared for jail-breaking models that expose their chain-of-thought process Analysis AI models like OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking can mimic human reasoning through a process called chain of thought.……

Tag: AI systems