Tag: tokens
-
Simon Willison’s Weblog: Trying out QvQ – Qwen’s new visual reasoning model
Source URL: https://simonwillison.net/2024/Dec/24/qvq/#atom-everything Source: Simon Willison’s Weblog Title: Trying out QvQ – Qwen’s new visual reasoning model Feedly Summary: I thought we were done for major model releases in 2024, but apparently not: Alibaba’s Qwen team just dropped the Apache2 2 licensed QvQ-72B-Preview, “an experimental research model focusing on enhancing visual reasoning capabilities". Their blog…
-
Simon Willison’s Weblog: Finally, a Replacement for BERT: Introducing ModernBERT
Source URL: https://simonwillison.net/2024/Dec/24/modernbert/ Source: Simon Willison’s Weblog Title: Finally, a Replacement for BERT: Introducing ModernBERT Feedly Summary: Finally, a Replacement for BERT: Introducing ModernBERT BERT was an early language model released by Google in October 2018. Unlike modern LLMs it wasn’t designed for generating text. BERT was trained for masked token prediction and was generally…
-
Hacker News: Show HN: Llama 3.3 70B Sparse Autoencoders with API access
Source URL: https://www.goodfire.ai/papers/mapping-latent-spaces-llama/ Source: Hacker News Title: Show HN: Llama 3.3 70B Sparse Autoencoders with API access Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text discusses innovative advancements made with the Llama 3.3 70B model, particularly the development and release of sparse autoencoders (SAEs) for interpretability and feature steering. These tools enhance…
-
Hacker News: Offline Reinforcement Learning for LLM Multi-Step Reasoning
Source URL: https://arxiv.org/abs/2412.16145 Source: Hacker News Title: Offline Reinforcement Learning for LLM Multi-Step Reasoning Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the development of a novel offline reinforcement learning method, OREO, aimed at improving the multi-step reasoning abilities of large language models (LLMs). This has significant implications in AI security…
-
Simon Willison’s Weblog: OpenAI O3 breakthrough high score on ARC-AGI-PUB
Source URL: https://simonwillison.net/2024/Dec/20/openai-o3-breakthrough/#atom-everything Source: Simon Willison’s Weblog Title: OpenAI O3 breakthrough high score on ARC-AGI-PUB Feedly Summary: OpenAI O3 breakthrough high score on ARC-AGI-PUB François Chollet is the co-founder of the ARC Prize and had advanced access to today’s o3 results. His article here is the most insightful coverage I’ve seen of o3, going beyond…
-
Simon Willison’s Weblog: Gemini 2.0 Flash "Thinking mode"
Source URL: https://simonwillison.net/2024/Dec/19/gemini-thinking-mode/#atom-everything Source: Simon Willison’s Weblog Title: Gemini 2.0 Flash "Thinking mode" Feedly Summary: Those new model releases just keep on flowing. Today it’s Google’s snappily named gemini-2.0-flash-thinking-exp, their first entrant into the o1-style inference scaling class of models. I posted about a great essay about the significance of these just this morning. From…