Tag: model quantization

  • Simon Willison’s Weblog: Qwen3-4B Instruct and Thinking

    Source URL: https://simonwillison.net/2025/Aug/6/qwen3-4b-instruct-and-thinking/ Source: Simon Willison’s Weblog Title: Qwen3-4B Instruct and Thinking Feedly Summary: Qwen3-4B Instruct and Thinking Yet another interesting model from Qwen—these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 context length, which Qwen suggest is essential…

  • Simon Willison’s Weblog: Introducing Gemma 3n: The developer guide

    Source URL: https://simonwillison.net/2025/Jun/26/gemma-3n/ Source: Simon Willison’s Weblog Title: Introducing Gemma 3n: The developer guide Feedly Summary: Introducing Gemma 3n: The developer guide Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered with a focus…

  • Hacker News: Fast LLM Inference From Scratch (using CUDA)

    Source URL: https://andrewkchan.dev/posts/yalm.html Source: Hacker News Title: Fast LLM Inference From Scratch (using CUDA) Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text provides a comprehensive overview of implementing a low-level LLM (Large Language Model) inference engine using C++ and CUDA. It details various optimization techniques to enhance inference performance on both CPU…

  • Hacker News: Tencent drops a 389B MoE model(Open-source and free for commercial use))

    Source URL: https://github.com/Tencent/Tencent-Hunyuan-Large Source: Hacker News Title: Tencent drops a 389B MoE model(Open-source and free for commercial use)) Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text introduces the Hunyuan-Large model, the largest open-source Transformer-based Mixture of Experts (MoE) model, developed by Tencent, which boasts 389 billion parameters, optimizing performance while managing resource…

  • Hacker News: An embarrassingly simple approach to recover unlearned knowledge for LLMs

    Source URL: https://arxiv.org/abs/2410.16454 Source: Hacker News Title: An embarrassingly simple approach to recover unlearned knowledge for LLMs Feedly Summary: Comments AI Summary and Description: Yes Summary: The text focuses on the challenge of “unlearning” in large language models (LLMs), specifically addressing the effectiveness of current unlearning methods in truly erasing unwanted knowledge. It highlights a…

  • Hacker News: Show HN: Client Side anti-RAG solution

    Source URL: https://ai.unturf.com/#client-side Source: Hacker News Title: Show HN: Client Side anti-RAG solution Feedly Summary: Comments AI Summary and Description: Yes Summary: The text describes the deployment and usage of the Hermes AI model, highlighting an open-source AI service that facilitates user interaction via Python and Node.js examples. The mention of open-source principles, infrastructure setup,…

  • Hacker News: Lm.rs Minimal CPU LLM inference in Rust with no dependency

    Source URL: https://github.com/samuel-vitorino/lm.rs Source: Hacker News Title: Lm.rs Minimal CPU LLM inference in Rust with no dependency Feedly Summary: Comments AI Summary and Description: Yes Summary: The provided text pertains to the development and utilization of a Rust-based application for running inference on Large Language Models (LLMs), particularly the LLama 3.2 models. It discusses technical…