Tag: 7 Sonnet

  • Simon Willison’s Weblog: Gemini 2.5 Pro Preview pricing

    Source URL: https://simonwillison.net/2025/Apr/4/gemini-25-pro-pricing/ Source: Simon Willison’s Weblog Title: Gemini 2.5 Pro Preview pricing Feedly Summary: Google’s Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding. You can now pay for it! The new…

  • Simon Willison’s Weblog: debug-gym

    Source URL: https://simonwillison.net/2025/Mar/31/debug-gym/#atom-everything Source: Simon Willison’s Weblog Title: debug-gym Feedly Summary: New paper and code from Microsoft Research that experiments with giving LLMs access to the Python debugger. They found that the best models could indeed improve their results by running pdb as a tool. They saw the best results overall from Claude 3.7…

  • Wired: Amazon’s AGI Lab Reveals Its First Work: Advanced AI Agents

    Source URL: https://www.wired.com/story/amazon-ai-agents-nova-web-browsing/ Source: Wired Title: Amazon’s AGI Lab Reveals Its First Work: Advanced AI Agents Feedly Summary: Led by a former OpenAI executive, Amazon’s AI lab focuses on the decision-making capabilities of the next generation of software agents, and borrows insights from physical robots. AI Summary and Description: Yes Summary: Amazon is making strides in artificial…

  • Hacker News: Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison

    Source URL: https://composio.dev/blog/gemini-2-5-pro-vs-claude-3-7-sonnet-coding-comparison/ Source: Hacker News Title: Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the recent launch of Google’s Gemini 2.5 Pro, highlighting its superiority over Claude 3.7 Sonnet in coding capabilities. It emphasizes the advantages of Gemini 2.5 Pro, including…

  • Simon Willison’s Weblog: Claude can now search the web

    Source URL: https://simonwillison.net/2025/Mar/20/claude-can-now-search-the-web/#atom-everything Source: Simon Willison’s Weblog Title: Claude can now search the web Feedly Summary: Claude 3.7 Sonnet on the paid plan now has a web search tool that can be turned on as a global setting. This was sorely needed. ChatGPT, Gemini and Grok all had this…

  • Hacker News: Why Anthropic’s Claude still hasn’t beaten Pokémon

    Source URL: https://arstechnica.com/ai/2025/03/why-anthropics-claude-still-hasnt-beaten-pokemon/ Source: Hacker News Title: Why Anthropic’s Claude still hasn’t beaten Pokémon Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the advancements in artificial intelligence, particularly focusing on the evolving capabilities of models like Anthropic’s Claude, which are on the trajectory towards achieving artificial general intelligence (AGI). The potential…

  • Simon Willison’s Weblog: Putting Gemini 2.5 Pro through its paces

    Source URL: https://simonwillison.net/2025/Mar/25/gemini/ Source: Simon Willison’s Weblog Title: Putting Gemini 2.5 Pro through its paces Feedly Summary: There’s a new release from Google Gemini this morning: the first in the Gemini 2.5 series. Google call it “a thinking model, designed to tackle increasingly complex problems”. It’s already sat at the top of the LM Arena…

  • Slashdot: Google Unveils Gemini 2.5 Pro, Its Latest AI Reasoning Model With Significant Benchmark Gains

    Source URL: https://tech.slashdot.org/story/25/03/25/195227/google-unveils-gemini-25-pro-its-latest-ai-reasoning-model-with-significant-benchmark-gains?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Google Unveils Gemini 2.5 Pro, Its Latest AI Reasoning Model With Significant Benchmark Gains Feedly Summary: AI Summary and Description: Yes Summary: Google DeepMind has launched Gemini 2.5, an advanced AI model notable for its improved reasoning capabilities and coding abilities. This model’s performance exceeds many competitors, highlighting its…

  • Simon Willison’s Weblog: The "think" tool: Enabling Claude to stop and think in complex tool use situations

    Source URL: https://simonwillison.net/2025/Mar/21/the-think-tool/#atom-everything Source: Simon Willison’s Weblog Title: The "think" tool: Enabling Claude to stop and think in complex tool use situations Feedly Summary: Fascinating new prompt engineering trick from Anthropic. They use their standard tool calling mechanism to define a…
