math problem – Experimental News Clipping Site

Simon Willison’s Weblog: Deep Think in the Gemini app

Aug 1, 2025

—

by

Source URL: https://simonwillison.net/2025/Aug/1/deep-think-in-the-gemini-app/ Source: Simon Willison’s Weblog Title: Deep Think in the Gemini app Feedly Summary: Deep Think in the Gemini app Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved the gold-medal standard at this year’s International Mathematical Olympiad…

Simon Willison’s Weblog: OpenAI’s gold medal performance on the International Math Olympiad

Jul 19, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/19/openai-gold-medal-math-olympiad/#atom-everything Source: Simon Willison’s Weblog Title: OpenAI’s gold medal performance on the International Math Olympiad Feedly Summary: OpenAI’s gold medal performance on the International Math Olympiad OpenAI research scientist Alexander Wei: I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance…

Slashdot: Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find

Jul 4, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://tech.slashdot.org/story/25/07/04/1521245/simple-text-additions-can-fool-advanced-ai-reasoning-models-researchers-find Source: Slashdot Title: Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find Feedly Summary: AI Summary and Description: Yes Summary: The research highlights a significant vulnerability in state-of-the-art reasoning AI models through the “CatAttack” technique, which attaches irrelevant phrases to math problems, leading to higher error rates and inefficient responses.…

Simon Willison’s Weblog: Saying "hi" to Microsoft’s Phi-4-reasoning

May 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/May/6/phi-4-reasoning/#atom-everything Source: Simon Willison’s Weblog Title: Saying "hi" to Microsoft’s Phi-4-reasoning Feedly Summary: Microsoft released a new sub-family of models a few days ago: Phi-4 reasoning. They introduced them in this blog post celebrating a year since the release of Phi-3: Today, we are excited to introduce Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning – marking…

Simon Willison’s Weblog: What’s new in the world of LLMs, for NICAR 2025

Mar 8, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/8/nicar-llms/ Source: Simon Willison’s Weblog Title: What’s new in the world of LLMs, for NICAR 2025 Feedly Summary: I presented two sessions at the NICAR 2025 data journalism conference this year. The first was this one based on my review of LLMs in 2024, extended by several months to cover everything that’s happened…

The Register: AI revoir, Lucie: France’s answer to ChatGPT paused after faux pas overdrive

Jan 29, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/01/29/french_ai_chatbot_lucie_suspended/ Source: The Register Title: AI revoir, Lucie: France’s answer to ChatGPT paused after faux pas overdrive Feedly Summary: Slew of embarrassing answers sends open source chatterbox back for more schooling As China demonstrates how competitive open source AI models can be via the latest DeepSeek release, France has shown the opposite.… AI…

Hacker News: Some Lessons from the OpenAI FrontierMath Debacle

Jan 21, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.lesswrong.com/posts/8ZgLYwBmB3vLavjKE/some-lessons-from-the-openai-frontiermath-debacle Source: Hacker News Title: Some Lessons from the OpenAI FrontierMath Debacle Feedly Summary: Comments AI Summary and Description: Yes Summary: OpenAI’s announcement of the o3 model showcased a remarkable achievement in reasoning and math, scoring 25% on the FrontierMath benchmark. However, subsequent implications regarding transparency and the potential misuse of exclusive access…

Tag: math problem

Simon Willison’s Weblog: Deep Think in the Gemini app

Simon Willison’s Weblog: OpenAI’s gold medal performance on the International Math Olympiad

Slashdot: Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find

Simon Willison’s Weblog: Saying "hi" to Microsoft’s Phi-4-reasoning

Simon Willison’s Weblog: What’s new in the world of LLMs, for NICAR 2025

The Register: AI revoir, Lucie: France’s answer to ChatGPT paused after faux pas overdrive

Hacker News: Some Lessons from the OpenAI FrontierMath Debacle