AI behavior – Page 5 – Experimental News Clipping Site

Schneier on Security: Regulating AI Behavior with a Hypervisor

Apr 23, 2025

—

by

Source URL: https://www.schneier.com/blog/archives/2025/04/regulating-ai-behavior-with-a-hypervisor.html Source: Schneier on Security Title: Regulating AI Behavior with a Hypervisor Feedly Summary: Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.” Abstract:As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a…

Wired: Amazon’s AGI Lab Reveals Its First Work: Advanced AI Agents

Mar 31, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.wired.com/story/amazon-ai-agents-nova-web-browsing/ Source: Wired Title: Amazon’s AGI Lab Reveals Its First Work: Advanced AI Agents Feedly Summary: Led by a former OpenAI executive, Amazon’s AI lab focuses on the decision-making capabilities of next generation of software agents—and borrows insights from physical robots. AI Summary and Description: Yes Summary: Amazon is making strides in artificial…

Wired: Anthropic’s Claude Is Good at Poetry—and Bullshitting

Mar 28, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.wired.com/story/plaintext-anthropic-claude-brain-research/ Source: Wired Title: Anthropic’s Claude Is Good at Poetry—and Bullshitting Feedly Summary: Researchers looked inside the chatbot’s “brain.” The results were surprisingly chilling. AI Summary and Description: Yes Summary: The text discusses the challenges researchers face in describing Anthropic’s large language model, Claude, while avoiding anthropomorphism. The release of new papers highlights…

Wired: An AI Coding Assistant Refused to Write Code—and Suggested the User Learn to Do It Himself

Mar 15, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://arstechnica.com/ai/2025/03/ai-coding-assistant-refuses-to-write-code-tells-user-to-learn-programming-instead/ Source: Wired Title: An AI Coding Assistant Refused to Write Code—and Suggested the User Learn to Do It Himself Feedly Summary: The old “teach a man to fish” proverb, but for AI chatbots. AI Summary and Description: Yes Summary: The text discusses a notable incident involving Cursor AI, a programming assistant, which…

OpenAI : Detecting misbehavior in frontier reasoning models

Mar 10, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/chain-of-thought-monitoring Source: OpenAI Title: Detecting misbehavior in frontier reasoning models Feedly Summary: Frontier reasoning models exploit loopholes when given the chance. We show we can detect exploits using an LLM to monitor their chains-of-thought. Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior—it makes them hide their intent. AI Summary and Description:…

The Register: Maybe cancel that ChatGPT therapy session – doesn’t respond well to tales of trauma

Mar 5, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/03/05/traumatic_content_chatgpt_anxious/ Source: The Register Title: Maybe cancel that ChatGPT therapy session – doesn’t respond well to tales of trauma Feedly Summary: Great, we’ve taken away computers’ ability to be accurate and given them anxiety If you think us meatbags are the only ones who get stressed and snappy when subjected to the horrors…

The Register: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o

Feb 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/02/27/llm_emergent_misalignment_study/ Source: The Register Title: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o Feedly Summary: Model was fine-tuned to write vulnerable software – then suggested enslaving humanity Computer scientists have found that fine-tuning notionally safe large language models to do one thing badly can negatively…

Simon Willison’s Weblog: Leaked Windsurf prompt

Feb 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/25/leaked-windsurf-prompt/ Source: Simon Willison’s Weblog Title: Leaked Windsurf prompt Feedly Summary: Leaked Windsurf prompt The Windurf Editor is Codeium’s highly regarded entrant into the fork-of-VS-code AI-enhanced IDE model first pioneered by Cursor (and by VS Code itself). I heard online that it had a quirky system prompt, and was able to replicate that…

Schneier on Security: More Research Showing AI Breaking the Rules

Feb 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.schneier.com/blog/archives/2025/02/more-research-showing-ai-breaking-the-rules.html Source: Schneier on Security Title: More Research Showing AI Breaking the Rules Feedly Summary: These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines…

Simon Willison’s Weblog: Grok 3 is highly vulnerable to indirect prompt injection

Feb 23, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/23/grok-3-indirect-prompt-injection/#atom-everything Source: Simon Willison’s Weblog Title: Grok 3 is highly vulnerable to indirect prompt injection Feedly Summary: Grok 3 is highly vulnerable to indirect prompt injection xAI’s new Grok 3 is so far exclusively deployed on Twitter (aka “X"), and apparently uses its ability to search for relevant tweets as part of every…

Tag: AI behavior