Tag: model evaluation
-
Hacker News: Every Flop Counts: Scaling a 300B LLM Without Premium GPUs
Source URL: https://arxiv.org/abs/2503.05139
Source: Hacker News
Summary: This technical report presents advancements in training large-scale Mixture-of-Experts (MoE) language models, namely Ling-Lite and Ling-Plus, highlighting their efficiency and comparable performance to industry benchmarks while significantly reducing training…
-
Hacker News: Replicating Deepseek-R1 for $4500: RL Boosts 1.5B Model Beyond o1-preview
Source URL: https://github.com/agentica-project/deepscaler
Source: Hacker News
Summary: The text describes the release of DeepScaleR, an open-source project aimed at democratizing reinforcement learning (RL) for large language models (LLMs). It highlights the project’s capabilities, training methodologies, and…
-
Hacker News: Bolt: Bootstrap Long Chain-of-Thought in LLMs Without Distillation [pdf]
Source URL: https://arxiv.org/abs/2502.03860
Source: Hacker News
Summary: The paper introduces BOLT, a method designed to enhance the reasoning capabilities of large language models (LLMs) by generating long chains of thought (LongCoT) without relying on knowledge distillation. The…
-
Hacker News: DeepSeek R1 Is Now Available on Azure AI Foundry and GitHub
Source URL: https://azure.microsoft.com/en-us/blog/deepseek-r1-is-now-available-on-azure-ai-foundry-and-github/
Source: Hacker News
Summary: The text discusses the availability of DeepSeek R1 in the Azure AI Foundry model catalog, emphasizing the model’s integration into a trusted and scalable platform for businesses. It…