Tag: Anthropic

  • Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

  • Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

    Source URL: https://www.anthropic.com/research/constitutional-classifiers Source: Hacker News Title: Constitutional Classifiers: Defending against universal jailbreaks Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…

  • Simon Willison’s Weblog: OpenAI reasoning models: Advice on prompting

    Source URL: https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/ Source: Simon Willison’s Weblog Title: OpenAI reasoning models: Advice on prompting Feedly Summary: OpenAI’s documentation for their o1 and o3 “reasoning models” includes some interesting tips on how to best prompt them: Developer messages are the new system messages: Starting with o1-2024-12-17, reasoning models support developer…

  • Simon Willison’s Weblog: llm-anthropic

    Source URL: https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-everything Source: Simon Willison’s Weblog Title: llm-anthropic Feedly Summary: I’ve renamed my llm-claude-3 plugin to llm-anthropic, on the basis that Claude 4 will probably happen at some point so this is a better name for the plugin. If you’re a previous user of llm-claude-3 you can upgrade to the new plugin like…

  • Slashdot: OpenAI Tests Its AI’s Persuasiveness By Comparing It to Reddit Posts

    Source URL: https://slashdot.org/story/25/02/02/0319217/openai-tests-its-ais-persuasiveness-by-comparing-it-to-reddit-posts?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: OpenAI Tests Its AI’s Persuasiveness By Comparing It to Reddit Posts Summary: OpenAI utilized the subreddit r/ChangeMyView to test and evaluate the persuasive capabilities of its AI reasoning models, particularly through a structured process that involves comparing AI-generated responses with human replies.…

  • Hacker News: Anthropic’s CEO says DeepSeek shows US export rules are working

    Source URL: https://techcrunch.com/2025/01/29/anthropics-ceo-says-deepseek-shows-that-u-s-export-rules-are-working-as-intended/ Source: Hacker News Title: Anthropic’s CEO says DeepSeek shows US export rules are working Summary: The text discusses the implications of export controls on AI chips in relation to the performance of the Chinese company DeepSeek compared to U.S. AI firms, particularly Anthropic. Dario…

  • Simon Willison’s Weblog: On DeepSeek and Export Controls

    Source URL: https://simonwillison.net/2025/Jan/29/on-deepseek-and-export-controls/ Source: Simon Willison’s Weblog Title: On DeepSeek and Export Controls Feedly Summary: Anthropic CEO (and previously GPT-2/GPT-3 development lead at OpenAI) Dario Amodei’s essay about DeepSeek includes a lot of interesting background on the last few years of AI development. Dario was one of the authors on…

  • Hacker News: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

    Source URL: https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/ Source: Hacker News Title: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt Summary: The text discusses the creation of a new malware named Nepenthes, designed by a software developer to combat AI web crawlers that ignore “no scraping” directives…

  • Slashdot: Anthropic Builds RAG Directly Into Claude Models With New Citations API

    Source URL: https://slashdot.org/story/25/01/27/2129250/anthropic-builds-rag-directly-into-claude-models-with-new-citations-api Source: Slashdot Title: Anthropic Builds RAG Directly Into Claude Models With New Citations API Summary: Anthropic has introduced a new feature called Citations for its Claude models, enhancing their ability to provide accurate and traceable responses by linking answers directly to source documents. This development…

  • The Register: DeepSeek’s R1 curiously tells El Reg reader: ‘My guidelines are set by OpenAI’

    Source URL: https://www.theregister.com/2025/01/27/deepseek_r1_identity/ Source: The Register Title: DeepSeek’s R1 curiously tells El Reg reader: ‘My guidelines are set by OpenAI’ Feedly Summary: Despite impressive benchmarks, the Chinese-made LLM is not without some interesting issues. DeepSeek’s open source reasoning-capable R1 LLM family boasts impressive benchmark scores – but its erratic responses raise more questions about how…