Tag: safety specifications

  • Simon Willison’s Weblog: Trading Inference-Time Compute for Adversarial Robustness

    Source URL: https://simonwillison.net/2025/Jan/22/trading-inference-time-compute/ Source: Simon Willison’s Weblog Title: Trading Inference-Time Compute for Adversarial Robustness Feedly Summary: Trading Inference-Time Compute for Adversarial Robustness Brand new research paper from OpenAI, exploring how inference-scaling “reasoning" models such as o1 might impact the search for improved security with respect to things like prompt injection. We conduct experiments on the…

  • OpenAI : Deliberative alignment: reasoning enables safer language models

    Source URL: https://openai.com/index/deliberative-alignment Source: OpenAI Title: Deliberative alignment: reasoning enables safer language models Feedly Summary: Deliberative alignment: reasoning enables safer language models Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them. AI Summary and Description: Yes Summary: The text discusses a new alignment strategy…