Tag: harmful content detection

  • Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/
    Source: Simon Willison’s Weblog
    Title: Constitutional Classifiers: Defending against universal jailbreaks
    Feedly Summary: Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

  • OpenAI: Upgrading the Moderation API with our new multimodal moderation model

    Source URL: https://openai.com/index/upgrading-the-moderation-api-with-our-new-multimodal-moderation-model
    Source: OpenAI
    Title: Upgrading the Moderation API with our new multimodal moderation model
    Feedly Summary: We’re introducing a new model built on GPT-4o that is more accurate at detecting harmful text and images, enabling developers to build more robust moderation systems.
    AI Summary and Description: Yes
    Summary: The introduction of a new…