reinforcement learning – Page 2 – Experimental News Clipping Site

Simon Willison’s Weblog: GLM-4.5: Reasoning, Coding, and Agentic Abililties

Jul 28, 2025

Source URL: https://simonwillison.net/2025/Jul/28/glm-45/#atom-everything
Source: Simon Willison’s Weblog
Title: GLM-4.5: Reasoning, Coding, and Agentic Abililties
Feedly Summary: Another day, another significant new open weight model release from a Chinese frontier AI lab. This time it’s Z.ai – who rebranded (at least in English) from Zhipu AI a few months ago.…

Simon Willison’s Weblog: Qwen3-Coder: Agentic Coding in the World

Jul 22, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/22/qwen3-coder/
Source: Simon Willison’s Weblog
Title: Qwen3-Coder: Agentic Coding in the World
Feedly Summary: It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger: Today, we’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder…

Cloud Blog: 25+ top gen AI how-to guides for enterprise

Jul 22, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/top-gen-ai-how-to-guides-for-enterprise/
Source: Cloud Blog
Title: 25+ top gen AI how-to guides for enterprise
Feedly Summary: The best way to learn AI is by building. From finding quick ways to deploy open models to building complex, multi-agentic systems, it’s easy to feel overwhelmed by the sheer volume of resources out there. To that end,…

Simon Willison’s Weblog: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

Jul 21, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything
Source: Simon Willison’s Weblog
Title: Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Feedly Summary: OpenAI beat them to the punch in terms of publicity by publishing their…

Simon Willison’s Weblog: OpenAI’s gold medal performance on the International Math Olympiad

Jul 19, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/19/openai-gold-medal-math-olympiad/#atom-everything
Source: Simon Willison’s Weblog
Title: OpenAI’s gold medal performance on the International Math Olympiad
Feedly Summary: OpenAI research scientist Alexander Wei: I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance…

AWS News Blog: Announcing Amazon Nova customization in Amazon SageMaker AI

Jul 16, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://aws.amazon.com/blogs/aws/announcing-amazon-nova-customization-in-amazon-sagemaker-ai/
Source: AWS News Blog
Title: Announcing Amazon Nova customization in Amazon SageMaker AI
Feedly Summary: AWS now enables extensive customization of Amazon Nova foundation models through SageMaker AI with techniques including continued pre-training, supervised fine-tuning, direct preference optimization, reinforcement learning from human feedback, and model distillation to better address domain-specific requirements across…

OpenAI: Addendum to o3 and o4-mini system card: Codex

May 16, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://openai.com/index/o3-o4-mini-codex-system-card-addendum
Source: OpenAI
Title: Addendum to o3 and o4-mini system card: Codex
Feedly Summary: Codex is a cloud-based coding agent. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. codex-1 was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that…

Simon Willison’s Weblog: Expanding on what we missed with sycophancy

May 2, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/May/2/what-we-missed-with-sycophancy/
Source: Simon Willison’s Weblog
Title: Expanding on what we missed with sycophancy
Feedly Summary: I criticized OpenAI’s initial post about their recent ChatGPT sycophancy rollback as being “relatively thin” so I’m delighted that they have followed it with a much more in-depth explanation of what…

Cloud Blog: Diving into the technology behind Google’s AI-era global network

Apr 22, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://cloud.google.com/blog/products/networking/google-global-network-technology-deep-dive/
Source: Cloud Blog
Title: Diving into the technology behind Google’s AI-era global network
Feedly Summary: The unprecedented growth and unique challenges of AI applications are driving fundamental architectural changes to Google’s next-generation global network. The AI era brings an explosive surge in demand for network capacity, with novel traffic patterns characteristic of…

Simon Willison’s Weblog: Quoting Andriy Burkov

Apr 6, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/6/andriy-burkov/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Andriy Burkov
Feedly Summary: […] The disappointing releases of both GPT-4.5 and Llama 4 have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits. Reinforcement learning is limited only to domains where a reward can…

Tag: reinforcement learning