Tag: quantized

  • Docker: Tool Calling with Local LLMs: A Practical Evaluation

    Source URL: https://www.docker.com/blog/local-llm-tool-calling-a-practical-evaluation/ Source: Docker Title: Tool Calling with Local LLMs: A Practical Evaluation Feedly Summary: When building GenAI and agentic applications, one of the most pressing and persistent questions is: “Which local model should I use for tool calling?” We kept hearing again and again,…
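
    The post evaluates local models on OpenAI-style tool calling. As a rough sketch of what such a call looks like (not Docker's own harness; the endpoint port, model name, and tool schema below are illustrative assumptions), a request against a local OpenAI-compatible server might be:

        # Hypothetical local endpoint and model name; substitute your own.
        curl -s http://localhost:8080/v1/chat/completions \
          -H "Content-Type: application/json" \
          -d '{
            "model": "qwen3",
            "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
            "tools": [{
              "type": "function",
              "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"]
                }
              }
            }]
          }'

    A model that handles tool calling well should answer with a tool_calls entry naming get_weather rather than free-text prose.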

  • Simon Willison’s Weblog: Quoting Ted Sanders

    Source URL: https://simonwillison.net/2025/Jun/11/ted-sanders/#atom-everything Source: Simon Willison’s Weblog Title: Quoting Ted Sanders Feedly Summary: [on the cheaper o3] Not quantized. Weights are the same. If we did change the model, we’d release it as a new model with a new name in the API (e.g., o3-turbo-2025-06-10). It would be very annoying to API customers if we…

  • Simon Willison’s Weblog: Trying out llama.cpp’s new vision support

    Source URL: https://simonwillison.net/2025/May/10/llama-cpp-vision/#atom-everything Source: Simon Willison’s Weblog Title: Trying out llama.cpp’s new vision support Feedly Summary: This llama.cpp server vision support via libmtmd pull request – via Hacker News – was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented on this page, but the…
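
    The merged PR adds a libmtmd-based multimodal path to llama.cpp. A minimal sketch of trying it from the command line, assuming a build that ships the new multimodal CLI (the model repo and flags here are my assumptions, not taken from the post):

        # llama-mtmd-cli is the multimodal CLI added alongside libmtmd;
        # -hf fetches a GGUF model (and its mmproj projector) from Hugging Face.
        llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF \
          --image photo.jpg \
          -p "Describe this image in one sentence."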

  • Simon Willison’s Weblog: Qwen3-8B

    Source URL: https://simonwillison.net/2025/May/2/qwen3-8b/#atom-everything Source: Simon Willison’s Weblog Title: Qwen3-8B Feedly Summary: Having tried a few of the Qwen 3 models now my favorite is a bit of a surprise to me: I’m really enjoying Qwen3-8B. I’ve been running prompts through the MLX 4bit quantized version, mlx-community/Qwen3-8B-4bit. I’m using llm-mlx like this: llm install llm-mlx llm…
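
    The quoted commands are cut off; a hedged reconstruction of a typical llm-mlx workflow (the download-model step is my assumption about the plugin's usage, not the post's exact text):

        # Install the MLX plugin for the llm CLI, fetch the 4-bit quantized
        # model, then run a prompt through it.
        llm install llm-mlx
        llm mlx download-model mlx-community/Qwen3-8B-4bit
        llm -m mlx-community/Qwen3-8B-4bit 'Summarize quantization in one sentence.'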

  • Slashdot: DeepSeek-V3 Now Runs At 20 Tokens Per Second On Mac Studio

    Source URL: https://apple.slashdot.org/story/25/03/25/2054214/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio Source: Slashdot Title: DeepSeek-V3 Now Runs At 20 Tokens Per Second On Mac Studio Feedly Summary: The text discusses the launch of DeepSeek’s new large language model, DeepSeek-V3-0324, highlighting its unique deployment strategy and implications for the AI industry. Its compatibility with consumer-grade hardware and open-source…
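
    The reported 20 tokens/second came from Apple Silicon running a quantized build. As a sketch of how one might reproduce that kind of setup with mlx-lm (the package commands are real; the exact model repo name and the 4-bit quantization are assumptions on my part):

        # Assumes Apple Silicon with very large unified memory; a 4-bit
        # quantization of a ~671B-parameter model is still hundreds of GB.
        pip install mlx-lm
        mlx_lm.generate --model mlx-community/DeepSeek-V3-0324-4bit \
          --prompt "Hello" --max-tokens 100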

  • Simon Willison’s Weblog: Mistral Small 3.1

    Source URL: https://simonwillison.net/2025/Mar/17/mistral-small-31/#atom-everything Source: Simon Willison’s Weblog Title: Mistral Small 3.1 Feedly Summary: Mistral Small 3.1 Mistral Small 3 came out in January and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it’s multi-modal (images) and has an increased 128,000 token context length,…

  • The Register: DeepSeek-R1-beating perf in a 32B package? El Reg digs its claws into Alibaba’s QwQ

    Source URL: https://www.theregister.com/2025/03/16/qwq_hands_on_review/ Source: The Register Title: DeepSeek-R1-beating perf in a 32B package? El Reg digs its claws into Alibaba’s QwQ Feedly Summary: How to tame its hypersensitive hyperparameters and get it running on your PC. Hands on: How much can reinforcement learning – and a bit of extra verification – improve large language models,…
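
    The “hypersensitive hyperparameters” are QwQ’s sampler settings; Qwen’s model card recommends temperature 0.6, top-p 0.95, and top-k between 20 and 40. A sketch of applying those values in an interactive Ollama session (Ollama is one option here, not necessarily The Register’s exact setup):

        ollama run qwq
        # Inside the session, override the defaults per Qwen's guidance:
        >>> /set parameter temperature 0.6
        >>> /set parameter top_p 0.95
        >>> /set parameter top_k 30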

  • Scott Logic: There is more than one way to do GenAI

    Source URL: https://blog.scottlogic.com/2025/02/20/there-is-more-than-one-way-to-do-genai.html Source: Scott Logic Title: There is more than one way to do GenAI Feedly Summary: AI doesn’t have to be brute-forced in massive data centres. Europe isn’t necessarily behind in the AI arms race. In fact, the UK and Europe’s constraints and focus on more than just economic return and speculation might…