Anthropic – Page 33 – Experimental News Clipping Site

Simon Willison’s Weblog: Constitutional Classifiers: Defending against universal jailbreaks

Feb 3, 2025

—

by

Source URL: https://simonwillison.net/2025/Feb/3/constitutional-classifiers/ Source: Simon Willison’s Weblog Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Constitutional Classifiers: Defending against universal jailbreaks Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming. From the paper: In particular, we introduce Constitutional Classifiers, a framework…

Hacker News: Constitutional Classifiers: Defending against universal jailbreaks

Feb 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.anthropic.com/research/constitutional-classifiers Source: Hacker News Title: Constitutional Classifiers: Defending against universal jailbreaks Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a novel approach by the Anthropic Safeguards Research Team to defend AI models against jailbreaks through the use of Constitutional Classifiers. This system demonstrates robustness against various jailbreak techniques while…

Simon Willison’s Weblog: OpenAI reasoning models: Advice on prompting

Feb 2, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/ Source: Simon Willison’s Weblog Title: OpenAI reasoning models: Advice on prompting Feedly Summary: OpenAI reasoning models: Advice on prompting OpenAI’s documentation for their o1 and o3 “reasoning models" includes some interesting tips on how to best prompt them: Developer messages are the new system messages: Starting with o1-2024-12-17, reasoning models support developer…

Simon Willison’s Weblog: llm-anthropic

Feb 2, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/2/llm-anthropic/#atom-everything Source: Simon Willison’s Weblog Title: llm-anthropic Feedly Summary: llm-anthropic I’ve renamed my llm-claude-3 plugin to llm-anthropic, on the basis that Claude 4 will probably happen at some point so this is a better name for the plugin. If you’re a previous user of llm-claude-3 you can upgrade to the new plugin like…

Slashdot: OpenAI Tests Its AI’s Persuasiveness By Comparing It to Reddit Posts

Feb 2, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://slashdot.org/story/25/02/02/0319217/openai-tests-its-ais-persuasiveness-by-comparing-it-to-reddit-posts?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: OpenAI Tests Its AI’s Persuasiveness By Comparing It to Reddit Posts Feedly Summary: AI Summary and Description: Yes Summary: OpenAI utilized the subreddit r/ChangeMyView to test and evaluate the persuasive capabilities of its AI reasoning models, particularly through a structured process that involves comparing AI-generated responses with human replies.…

Hacker News: Anthropic’s CEO says DeepSeek shows US export rules are working

Jan 30, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://techcrunch.com/2025/01/29/anthropics-ceo-says-deepseek-shows-that-u-s-export-rules-are-working-as-intended/ Source: Hacker News Title: Anthropic’s CEO says DeepSeek shows US export rules are working Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text discusses the implications of export controls on AI chips in relation to the performance of the Chinese company DeepSeek compared to U.S. AI firms, particularly Anthropic. Dario…

Simon Willison’s Weblog: On DeepSeek and Export Controls

Jan 29, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jan/29/on-deepseek-and-export-controls/ Source: Simon Willison’s Weblog Title: On DeepSeek and Export Controls Feedly Summary: On DeepSeek and Export Controls Anthropic CEO (and previously GPT-2/GPT-3 development lead at OpenAI) Dario Amodei’s essay about DeepSeek includes a lot of interesting background on the last few years of AI development. Dario was one of the authors on…

Hacker News: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

Jan 28, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/ Source: Hacker News Title: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the creation of a new malware named Nepenthes, designed by a software developer to combat AI web crawlers that ignore “no scraping” directives…

Slashdot: Anthropic Builds RAG Directly Into Claude Models With New Citations API

Jan 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://slashdot.org/story/25/01/27/2129250/anthropic-builds-rag-directly-into-claude-models-with-new-citations-api Source: Slashdot Title: Anthropic Builds RAG Directly Into Claude Models With New Citations API Feedly Summary: AI Summary and Description: Yes Summary: Anthropic has introduced a new feature called Citations for its Claude models, enhancing their ability to provide accurate and traceable responses by linking answers directly to source documents. This development…

The Register: DeepSeek’s R1 curiously tells El Reg reader: ‘My guidelines are set by OpenAI’

Jan 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/01/27/deepseek_r1_identity/ Source: The Register Title: DeepSeek’s R1 curiously tells El Reg reader: ‘My guidelines are set by OpenAI’ Feedly Summary: Despite impressive benchmarks, the Chinese-made LLM is not without some interesting issues DeepSeek’s open source reasoning-capable R1 LLM family boasts impressive benchmark scores – but its erratic responses raise more questions about how…

Tag: Anthropic