Tag: misalignment
-
METR updates – METR: [ext, adv] 2025.03.05 Comment on AI Action Plan
Source URL: https://metr.org/METR_ai_action_plan_comment.pdf Source: METR updates – METR Title: [ext, adv] 2025.03.05 Comment on AI Action Plan Feedly Summary: AI Summary and Description: Yes Summary: The text, a comment by METR, discusses key considerations and priority actions for developing an Artificial Intelligence (AI) Action Plan. METR is a research nonprofit focused on AI systems and their risks to public…
-
The Register: The NHS security culture problem is a crisis years in the making
Source URL: https://www.theregister.com/2025/03/10/nhs_security_culture/ Source: The Register Title: The NHS security culture problem is a crisis years in the making Feedly Summary: Insiders say board members must be held accountable and drive positive change from the top down Analysis Walk into any hospital and ask the same question – “Which security system should we invest in?”…
-
Schneier on Security: “Emergent Misalignment” in LLMs
Source URL: https://www.schneier.com/blog/archives/2025/02/emergent-misalignment-in-llms.html Source: Schneier on Security Title: “Emergent Misalignment” in LLMs Feedly Summary: Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model…
-
The Register: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o
Source URL: https://www.theregister.com/2025/02/27/llm_emergent_misalignment_study/ Source: The Register Title: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o Feedly Summary: Model was fine-tuned to write vulnerable software – then suggested enslaving humanity Computer scientists have found that fine-tuning notionally safe large language models to do one thing badly can negatively…