Tag: model evaluation

  • The Register: Meta accused of Llama 4 bait-and-switch to juice AI benchmark rank

    Source URL: https://www.theregister.com/2025/04/08/meta_llama4_cheating/ Source: The Register Title: Meta accused of Llama 4 bait-and-switch to juice AI benchmark rank Feedly Summary: Did Facebook giant rizz up LLM to win over human voters? It appears so Meta submitted a specially crafted, non-public variant of its Llama 4 AI model to an online benchmark that may have unfairly…

  • Simon Willison’s Weblog: Quoting lmarena.ai

    Source URL: https://simonwillison.net/2025/Apr/8/lmaren/#atom-everything Source: Simon Willison’s Weblog Title: Quoting lmarena.ai Feedly Summary: We’ve seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we’re releasing 2,000+ head-to-head battle results for public review. […] In addition, we’re also adding the HF version of Llama-4-Maverick to Arena, with leaderboard results…

  • Hacker News: Every Flop Counts: Scaling a 300B LLM Without Premium GPUs

    Source URL: https://arxiv.org/abs/2503.05139 Source: Hacker News Title: Every Flop Counts: Scaling a 300B LLM Without Premium GPUs Feedly Summary: Comments AI Summary and Description: Yes Summary: This technical report presents advancements in training large-scale Mixture-of-Experts (MoE) language models, namely Ling-Lite and Ling-Plus, highlighting their efficiency and comparable performance to industry benchmarks while significantly reducing training…

  • AWS News Blog: DeepSeek-R1 now available as a fully managed serverless model in Amazon Bedrock

    Source URL: https://aws.amazon.com/blogs/aws/deepseek-r1-now-available-as-a-fully-managed-serverless-model-in-amazon-bedrock/ Source: AWS News Blog Title: DeepSeek-R1 now available as a fully managed serverless model in Amazon Bedrock Feedly Summary: DeepSeek-R1 is now available as a fully managed model in Amazon Bedrock, freeing up your teams to focus on strategic initiatives instead of managing infrastructure complexities. AI Summary and Description: Yes Summary: The…

  • Hacker News: Replicating Deepseek-R1 for $4500: RL Boosts 1.5B Model Beyond o1-preview

    Source URL: https://github.com/agentica-project/deepscaler Source: Hacker News Title: Replicating Deepseek-R1 for $4500: RL Boosts 1.5B Model Beyond o1-preview Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text describes the release of DeepScaleR, an open-source project aimed at democratizing reinforcement learning (RL) for large language models (LLMs). It highlights the project’s capabilities, training methodologies, and…

  • Hacker News: Bolt: Bootstrap Long Chain-of-Thought in LLMs Without Distillation [pdf]

    Source URL: https://arxiv.org/abs/2502.03860 Source: Hacker News Title: Bolt: Bootstrap Long Chain-of-Thought in LLMs Without Distillation [pdf] Feedly Summary: Comments AI Summary and Description: Yes Summary: The paper introduces BOLT, a method designed to enhance the reasoning capabilities of large language models (LLMs) by generating long chains of thought (LongCoT) without relying on knowledge distillation. The…

  • Hacker News: DeepSeek R1 Is Now Available on Azure AI Foundry and GitHub

    Source URL: https://azure.microsoft.com/en-us/blog/deepseek-r1-is-now-available-on-azure-ai-foundry-and-github/ Source: Hacker News Title: DeepSeek R1 Is Now Available on Azure AI Foundry and GitHub Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text discusses the availability of DeepSeek R1 in the Azure AI Foundry model catalog, emphasizing the model’s integration into a trusted and scalable platform for businesses. It…