Tag: Hacker News
-
Simon Willison’s Weblog: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Source URL: https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything
Source: Simon Willison’s Weblog
Title: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Feedly Summary: OpenAI beat them to the punch in terms of publicity by publishing their…
-
Simon Willison’s Weblog: Coding with LLMs in the summer of 2025 (an update)
Source URL: https://simonwillison.net/2025/Jul/21/coding-with-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: Coding with LLMs in the summer of 2025 (an update)
Feedly Summary: Salvatore Sanfilippo describes his current AI-assisted development workflow. He’s all-in on LLMs for code review, exploratory prototyping, pair-design and writing “part of the code under…
-
Simon Willison’s Weblog: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Source URL: https://simonwillison.net/2025/Jul/12/ai-open-source-productivity/#atom-everything
Source: Simon Willison’s Weblog
Title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Feedly Summary: METR – for Model Evaluation & Threat Research – are a non-profit research institute founded by Beth Barnes, a former alignment researcher at…
-
Simon Willison’s Weblog: moonshotai/Kimi-K2-Instruct
Source URL: https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-everything
Source: Simon Willison’s Weblog
Title: moonshotai/Kimi-K2-Instruct
Feedly Summary: Colossal new open weights model release today from Moonshot AI, a two-year-old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon. My HuggingFace storage calculator says the repository is 958.52 GB. It’s a…
-
Simon Willison’s Weblog: I Shipped a macOS App Built Entirely by Claude Code
Source URL: https://simonwillison.net/2025/Jul/6/macos-app-built-entirely-by-claude-code/#atom-everything
Source: Simon Willison’s Weblog
Title: I Shipped a macOS App Built Entirely by Claude Code
Feedly Summary: Indragie Karunaratne has “been building software for the Mac since 2008”, but recently decided to try Claude Code to build a side project: Context, a…
-
Simon Willison’s Weblog: Frequently Asked Questions (And Answers) About AI Evals
Source URL: https://simonwillison.net/2025/Jul/3/faqs-about-ai-evals/#atom-everything
Source: Simon Willison’s Weblog
Title: Frequently Asked Questions (And Answers) About AI Evals
Feedly Summary: Hamel Husain and Shreya Shankar have been running a paid, cohort-based course on AI Evals For Engineers & PMs over the past few months. Here Hamel collects answers to…
-
Simon Willison’s Weblog: AbsenceBench: Language Models Can’t Tell What’s Missing
Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything
Source: Simon Willison’s Weblog
Title: AbsenceBench: Language Models Can’t Tell What’s Missing
Feedly Summary: Here’s another interesting result to file under the “jagged frontier” of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing “Needle…
-
Simon Willison’s Weblog: How OpenElections Uses LLMs
Source URL: https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: How OpenElections Uses LLMs
Feedly Summary: The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in…
-
Simon Willison’s Weblog: Quoting Workaccount2 on Hacker News
Source URL: https://simonwillison.net/2025/Jun/18/context-rot/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Workaccount2 on Hacker News
Feedly Summary: They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot…
-
Simon Willison’s Weblog: OpenAI slams court order to save all ChatGPT logs, including deleted chats
Source URL: https://simonwillison.net/2025/Jun/5/openai-court-order/#atom-everything
Source: Simon Willison’s Weblog
Title: OpenAI slams court order to save all ChatGPT logs, including deleted chats
Feedly Summary: This is very worrying. The New York Times v OpenAI lawsuit, now in its 17th month, includes accusations that OpenAI’s models…