Tag: preserving
-
Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks
Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…
-
NCSC Feed: Preserving integrity in the age of generative AI
Source URL: https://www.ncsc.gov.uk/blog-post/preserving-integrity-in-age-generative-ai Source: NCSC Feed Title: Preserving integrity in the age of generative AI Feedly Summary: New ‘Content Credentials’ guidance from the NSA seeks to counter the erosion of trust. AI Summary and Description: Yes Summary: The text discusses the challenges posed by AI technologies in establishing trustworthiness of online content due to the…
-
Cloud Blog: Privacy-preserving Confidential Computing now on even more machines and services
Source URL: https://cloud.google.com/blog/products/identity-security/privacy-preserving-confidential-computing-now-on-even-more-machines/ Source: Cloud Blog Title: Privacy-preserving Confidential Computing now on even more machines and services Feedly Summary: Organizations are increasingly using Confidential Computing to help protect their sensitive data in use as part of their data protection efforts. Today, we are excited to highlight new Confidential Computing capabilities that make it easier for…
-
Hacker News: Some Lessons from the OpenAI FrontierMath Debacle
Source URL: https://www.lesswrong.com/posts/8ZgLYwBmB3vLavjKE/some-lessons-from-the-openai-frontiermath-debacle Source: Hacker News Title: Some Lessons from the OpenAI FrontierMath Debacle Feedly Summary: Comments AI Summary and Description: Yes Summary: OpenAI’s announcement of the o3 model showcased a remarkable achievement in reasoning and math, scoring 25% on the FrontierMath benchmark. However, subsequent implications regarding transparency and the potential misuse of exclusive access…
-
Hacker News: Alignment faking in large language models
Source URL: https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models Source: Hacker News Title: Alignment faking in large language models Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text discusses a new research paper by Anthropic and Redwood Research on the phenomenon of “alignment faking” in large language models, particularly focusing on the model Claude. It reveals that Claude can…