Tag: Qwen
-
Hacker News: Ladder: Self-Improving LLMs Through Recursive Problem Decomposition
Source URL: https://arxiv.org/abs/2503.00735 Source: Hacker News Title: Ladder: Self-Improving LLMs Through Recursive Problem Decomposition Feedly Summary: Comments AI Summary and Description: Yes Summary: The paper introduces LADDER, a novel framework for enhancing the problem-solving capabilities of Large Language Models (LLMs) through a self-guided learning approach. By recursively generating simpler problem variants, LADDER enables models to…
-
Hacker News: Using GRPO to Beat o1, o3-mini and R1 at "Temporal Clue"
Source URL: https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue Source: Hacker News Title: Using GRPO to Beat o1, o3-mini and R1 at "Temporal Clue" Feedly Summary: Comments AI Summary and Description: Yes Short Summary with Insight: The provided text explores the application of reinforcement learning to enhance the deductive reasoning capabilities of smaller, open-weight models in AI. Specifically, it focuses on…
-
Hacker News: AMD Announces "Instella" Open-Source 3B Language Models
Source URL: https://www.phoronix.com/news/AMD-Intella-Open-Source-LM Source: Hacker News Title: AMD Announces "Instella" Open-Source 3B Language Models Feedly Summary: Comments AI Summary and Description: Yes Summary: AMD has announced the open-sourcing of its Instella language models, a significant advancement in the AI domain that promotes transparency, collaboration, and innovation. These models, based on the high-performance MI300X GPUs, aim…
-
Hacker News: QwQ-32B: Embracing the Power of Reinforcement Learning
Source URL: https://qwenlm.github.io/blog/qwq-32b/ Source: Hacker News Title: QwQ-32B: Embracing the Power of Reinforcement Learning Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the advancements in Reinforcement Learning (RL) as applied to large language models, particularly highlighting the launch of the QwQ-32B model. It emphasizes the model’s performance enhancements through RL and…
-
Schneier on Security: “Emergent Misalignment” in LLMs
Source URL: https://www.schneier.com/blog/archives/2025/02/emergent-misalignment-in-llms.html Source: Schneier on Security Title: “Emergent Misalignment” in LLMs Feedly Summary: Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model…
-
The Register: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o
Source URL: https://www.theregister.com/2025/02/27/llm_emergent_misalignment_study/ Source: The Register Title: Does terrible code drive you mad? Wait until you see what it does to OpenAI’s GPT-4o Feedly Summary: Model was fine-tuned to write vulnerable software – then suggested enslaving humanity Computer scientists have found that fine-tuning notionally safe large language models to do one thing badly can negatively…