Simon Willison’s Weblog: Weeknotes: Starting 2025 a little slow

Jan 5, 2025

—

Source URL: https://simonwillison.net/2025/Jan/4/weeknotes/#atom-everything
Source: Simon Willison’s Weblog
Title: Weeknotes: Starting 2025 a little slow

Feedly Summary: I published my review of 2024 in LLMs and then got into a fight with most of the internet over the phone microphone targeted ads conspiracy theory.
In my last weeknotes I talked about how December in LLMs has been a lot. That was on December 20th, and it turned out there were at least three big new LLM stories still to come before the end of the year:

OpenAI announced initial benchmarks for their o3 reasoning model, which I covered in a live blog for the last day of their mixed-quality 12 days of OpenAI series. o3 is genuinely impressive.
Alibaba’s Qwen released their QvQ visual reasoning model, which I ran locally using mlx-vlm. It’s the o1/o3 style trick applied to image prompting and it runs on my laptop.
DeepSeek – the other big open license Chinese AI lab – shocked everyone by releasing DeepSeek v3 on Christmas day, an open model that compares favorably to the very best closed models and was trained for just $5.6m, 11x less than Meta’s best Llama 3 model, Llama 3.1 405B.

For the second year running I published my review of LLM developments over the past year on December 31st. I’d estimate this took at least four hours of computer time to write and another two of miscellaneous note taking over the past few weeks, but that’s likely an under-estimate.
It went over really well. I’ve had a ton of great feedback about it, both from people who wanted to catch up and from people who have been following the space closely. I even got fireballed!
I’ve had a slower start to 2025 than I had intended. A challenge with writing online is that, like code, writing requires maintenance: any time I drop a popular article I feel obliged to track and participate in any resulting conversations.
Then just as the chatter about my 2024 review started to fade, the Apple Siri microphone settlement story broke and I couldn’t resist publishing “I still don’t think companies serve you ads based on spying through your microphone”.
Trying to talk people out of believing that conspiracy theory is my toxic trait. I know there’s no point even trying, but I can’t drag myself away.
I think my New Year’s resolution should probably be to spend less time arguing with people on the internet!
Anyway: January is here, and I’m determined to use it to make progress on both Datasette 1.0 and the paid launch of Datasette Cloud.

Blog entries

I still don’t think companies serve you ads based on spying through your microphone
Ending a year long posting streak
Things we learned about LLMs in 2024
Trying out QvQ – Qwen’s new visual reasoning model
My approach to running a link blog
Live blog: the 12th day of OpenAI – “Early evals for OpenAI o3”

TILs

Calculating the size of all LFS files in a repo – 2024-12-25

Named Entity Resolution with dslim/distilbert-NER – 2024-12-24

Tags: ai, datasette, weeknotes, openai, generative-ai, llms, qwen, deepseek

AI Summary and Description: Yes

Summary: The text reviews late-2024 developments in Large Language Models (LLMs), including OpenAI’s o3 benchmarks, Qwen’s QvQ visual reasoning model, and DeepSeek v3. It also covers the reception of the author’s review of 2024 in LLMs, his argument against the theory that phone microphones are used to target ads, and his plans for Datasette in January 2025.

Detailed Description:
The text encapsulates several key developments and events related to LLMs and AI technologies that unfolded towards the end of 2024. Here are the notable points:

– **OpenAI’s o3 Reasoning Model**: OpenAI announced initial benchmarks for its o3 reasoning model on the last day of its “12 days of OpenAI” series. The author covered the announcement in a live blog and calls o3 genuinely impressive.

– **Alibaba’s Qwen and QvQ Visual Reasoning Model**: Alibaba’s Qwen team released QvQ, which applies the o1/o3-style reasoning approach to image prompts. The author ran it locally on a laptop using mlx-vlm.

– **DeepSeek v3 Release**: DeepSeek, a Chinese AI lab that publishes open-license models, launched DeepSeek v3 on Christmas Day. The author reports that it compares favorably to the very best closed models and was trained for just $5.6m, 11x less than Meta’s Llama 3.1 405B.

– **Community Engagement**: The author’s review of 2024 in LLMs, published on December 31st, drew a lot of positive feedback, both from readers who wanted to catch up and from those who follow the space closely.

– **Ongoing Conversations**: After the Apple Siri microphone settlement story broke, the author published a post arguing that companies do not serve ads based on spying through phone microphones. He admits he finds it hard to stop arguing against this conspiracy theory and suggests a resolution to spend less time arguing with people on the internet.

– **Future Aspirations**: The author plans to use January to make progress on both Datasette 1.0 and the paid launch of Datasette Cloud.

**Significance**:
– The developments described are symptomatic of a rapidly evolving AI landscape, relevant for professionals in AI, cloud computing, and infrastructure security. Understanding these advancements can help security professionals anticipate emerging risks and compliance challenges brought about by new technologies.

– The discourse on microphone surveillance and targeted advertising reflects broader concerns around privacy and data security. The author’s position is that the microphone-targeting theory is unfounded, which illustrates how persistent such misconceptions can be.

**Bullet Points**:
– OpenAI’s introduction of the o3 reasoning model with promising benchmarks.
– Alibaba’s QvQ visual reasoning model and user experience.
– The cost-effective training of DeepSeek v3 compared to other models.
– Positive community feedback on the author’s LLM developments review.
– Ongoing debates about privacy and the microphone-targeted ads conspiracy theory.
– Future developmental plans for Datasette and its cloud services.

This text serves as both a chronicle of AI advancements and a reflection on the critical conversations surrounding the use of AI technologies in society.
