Experimental News Clipping Site

Tag: safety specifications

Simon Willison’s Weblog: Trading Inference-Time Compute for Adversarial Robustness

Jan 22, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jan/22/trading-inference-time-compute/ Source: Simon Willison’s Weblog Title: Trading Inference-Time Compute for Adversarial Robustness Feedly Summary: Trading Inference-Time Compute for Adversarial Robustness Brand new research paper from OpenAI, exploring how inference-scaling “reasoning" models such as o1 might impact the search for improved security with respect to things like prompt injection. We conduct experiments on the…
OpenAI : Deliberative alignment: reasoning enables safer language models

Jan 8, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://openai.com/index/deliberative-alignment Source: OpenAI Title: Deliberative alignment: reasoning enables safer language models Feedly Summary: Deliberative alignment: reasoning enables safer language models Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them. AI Summary and Description: Yes Summary: The text discusses a new alignment strategy…