Hacker News – Page 3 – Experimental News Clipping Site

Simon Willison’s Weblog: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

Jul 21, 2025

Source URL: https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything
Source: Simon Willison’s Weblog
Title: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Feedly Summary: OpenAI beat them to the punch in terms of publicity by publishing their…

Simon Willison’s Weblog: Coding with LLMs in the summer of 2025 (an update)

Jul 21, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/21/coding-with-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: Coding with LLMs in the summer of 2025 (an update)
Feedly Summary: Salvatore Sanfilippo describes his current AI-assisted development workflow. He’s all-in on LLMs for code review, exploratory prototyping, pair-design and writing “part of the code under…

Simon Willison’s Weblog: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

Jul 12, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/12/ai-open-source-productivity/#atom-everything
Source: Simon Willison’s Weblog
Title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Feedly Summary: METR – for Model Evaluation & Threat Research – are a non-profit research institute founded by Beth Barnes, a former alignment researcher at…

Simon Willison’s Weblog: moonshotai/Kimi-K2-Instruct

Jul 11, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-everything
Source: Simon Willison’s Weblog
Title: moonshotai/Kimi-K2-Instruct
Feedly Summary: Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon. My HuggingFace storage calculator says the repository is 958.52 GB. It’s a…

Simon Willison’s Weblog: I Shipped a macOS App Built Entirely by Claude Code

Jul 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/6/macos-app-built-entirely-by-claude-code/#atom-everything
Source: Simon Willison’s Weblog
Title: I Shipped a macOS App Built Entirely by Claude Code
Feedly Summary: Indragie Karunaratne has “been building software for the Mac since 2008”, but recently decided to try Claude Code to build a side project: Context, a…

Simon Willison’s Weblog: Frequently Asked Questions (And Answers) About AI Evals

Jul 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/3/faqs-about-ai-evals/#atom-everything
Source: Simon Willison’s Weblog
Title: Frequently Asked Questions (And Answers) About AI Evals
Feedly Summary: Hamel Husain and Shreya Shankar have been running a paid, cohort-based course on AI Evals For Engineers & PMs over the past few months. Here Hamel collects answers to…

Simon Willison’s Weblog: AbsenceBench: Language Models Can’t Tell What’s Missing

Jun 20, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything
Source: Simon Willison’s Weblog
Title: AbsenceBench: Language Models Can’t Tell What’s Missing
Feedly Summary: Here’s another interesting result to file under the “jagged frontier” of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing “Needle…

Simon Willison’s Weblog: How OpenElections Uses LLMs

Jun 19, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: How OpenElections Uses LLMs
Feedly Summary: The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in…

Simon Willison’s Weblog: Quoting Workaccount2 on Hacker News

Jun 18, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/18/context-rot/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Workaccount2 on Hacker News
Feedly Summary: They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot…

Simon Willison’s Weblog: OpenAI slams court order to save all ChatGPT logs, including deleted chats

Jun 5, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/5/openai-court-order/#atom-everything
Source: Simon Willison’s Weblog
Title: OpenAI slams court order to save all ChatGPT logs, including deleted chats
Feedly Summary: This is very worrying. The New York Times v OpenAI lawsuit, now in its 17th month, includes accusations that OpenAI’s models…

Tag: Hacker News