7 Sonnet – Experimental News Clipping Site

Simon Willison’s Weblog: The last year six months in LLMs, illustrated by pelicans on bicycles

Jun 6, 2025

Source URL: https://simonwillison.net/2025/Jun/6/six-months-in-llms/#atom-everything
Feedly Summary: I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event – here’s my talks from October 2023 and…

Simon Willison’s Weblog: System Card: Claude Opus 4 & Claude Sonnet 4

May 25, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/May/25/claude-4-system-card/#atom-everything
Feedly Summary: Direct link to a PDF on Anthropic’s CDN because they don’t appear to have a landing page anywhere for this document. Anthropic’s system cards are always worth…

Simon Willison’s Weblog: Updated Anthropic model comparison table

May 22, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/May/22/updated-anthropic-models/#atom-everything
Feedly Summary: A few details in here about Claude 4 that I hadn’t spotted elsewhere: The training cut-off date for Claude Opus 4 and Claude Sonnet 4 is March 2025! That’s the most recent cut-off for any…

Slashdot: Asking Chatbots For Short Answers Can Increase Hallucinations, Study Finds

May 13, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://slashdot.org/story/25/05/12/2114214/asking-chatbots-for-short-answers-can-increase-hallucinations-study-finds?utm_source=rss1.0mainlinkanon&utm_medium=feed
Feedly Summary: The research from Giskard highlights a critical concern for AI professionals regarding the trade-off between response length and factual accuracy among leading AI models. This finding is particularly relevant for those…

Simon Willison’s Weblog: Medium is the new large

May 7, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/May/7/medium-is-the-new-large/#atom-everything
Feedly Summary: New model release from Mistral – this time closed source/proprietary. Mistral Medium claims strong benchmark scores similar to GPT-4o and Claude 3.7 Sonnet, but is priced at $0.40/million input and $2/million output – about the…

Simon Willison’s Weblog: Understanding the recent criticism of the Chatbot Arena

Apr 30, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/30/criticism-of-the-chatbot-arena/#atom-everything
Feedly Summary: The Chatbot Arena has become the go-to place for vibes-based evaluation of LLMs over the past two years. The project, originating at UC Berkeley, is home to a large community of model enthusiasts who submit prompts to…

Simon Willison’s Weblog: Watching o3 guess a photo’s location is surreal, dystopian and wildly entertaining

Apr 26, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/26/o3-photo-locations/
Feedly Summary: Watching OpenAI’s new o3 model guess where a photo was taken is one of those moments where decades of science fiction suddenly come to life. It’s a cross between the Enhance Button and…

Slashdot: OpenAI Unveils Coding-Focused GPT-4.1 While Phasing Out GPT-4.5

Apr 14, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://slashdot.org/story/25/04/14/1726250/openai-unveils-coding-focused-gpt-41-while-phasing-out-gpt-45
Feedly Summary: OpenAI’s launch of the GPT-4.1 model family emphasizes enhanced coding capabilities and instruction adherence. The new models expand token context significantly and introduce a tiered pricing strategy, offering a more cost-effective alternative while…

Slashdot: AI Models Still Struggle To Debug Software, Microsoft Study Shows

Apr 11, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://developers.slashdot.org/story/25/04/11/0519242/ai-models-still-struggle-to-debug-software-microsoft-study-shows
Feedly Summary: The study by Microsoft Research highlights the limitations of popular AI models, such as Anthropic’s Claude 3.7 Sonnet and OpenAI’s o3-mini, in successfully debugging software. Despite advancements, AI still falls short…

Slashdot: Anthropic Launches Its Own $200 Monthly Plan

Apr 9, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://slashdot.org/story/25/04/09/203231/anthropic-launches-its-own-200-monthly-plan
Feedly Summary: Anthropic is introducing a premium tier for its AI chatbot Claude, designed for heavy users, which includes various subscription options that enhance usage limits substantially. This move signifies increasing competition in the AI chatbot…

Tag: 7 Sonnet