Tag: quantized

  • Docker: Tool Calling with Local LLMs: A Practical Evaluation

    Source URL: https://www.docker.com/blog/local-llm-tool-calling-a-practical-evaluation/
    Summary: Which local model should I use for tool calling? When building GenAI and agentic applications, one of the most pressing and persistent questions is: “Which local model should I use for tool calling?” We kept hearing again and again,…
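
    The post compares local models on how reliably they emit structured tool calls. As a concrete illustration (not code from the post), here is a minimal tool-calling sketch against a local OpenAI-compatible endpoint; the base URL, model name, and get_weather tool are all placeholder assumptions:

      # Minimal tool-calling sketch against a local OpenAI-compatible
      # endpoint. The base_url, model name, and tool definition are
      # placeholders, not values from the Docker post.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

      tools = [{
          "type": "function",
          "function": {
              "name": "get_weather",  # hypothetical tool for illustration
              "description": "Return the current weather for a city.",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }]

      resp = client.chat.completions.create(
          model="qwen2.5:7b",  # placeholder local model name
          messages=[{"role": "user", "content": "What's the weather in Paris?"}],
          tools=tools,
      )

      # A model that handles tool calling well should return a structured
      # call here rather than free text.
      print(resp.choices[0].message.tool_calls)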

  • Simon Willison’s Weblog: Quoting Ted Sanders

    Source URL: https://simonwillison.net/2025/Jun/11/ted-sanders/#atom-everything
    Summary: [on the cheaper o3] Not quantized. Weights are the same. If we did change the model, we’d release it as a new model with a new name in the API (e.g., o3-turbo-2025-06-10). It would be very annoying to API customers if we…

  • Simon Willison’s Weblog: Trying out llama.cpp’s new vision support

    Source URL: https://simonwillison.net/2025/May/10/llama-cpp-vision/#atom-everything
    Summary: This llama.cpp server vision support via libmtmd pull request – via Hacker News – was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented on this page, but the…
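
    For context, the merged PR means llama-server can accept image input through its OpenAI-compatible chat API. A hedged sketch, assuming a server is already running a vision model; the model repo, port, and file name are assumptions, not taken from the post:

      # Assumes a llama.cpp build that includes the libmtmd vision support,
      # with a server already running, e.g.:
      #   llama-server -hf ggml-org/gemma-3-4b-it-GGUF
      # The repo and port here are assumptions.
      import base64
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

      with open("photo.jpg", "rb") as f:
          b64 = base64.b64encode(f.read()).decode()

      resp = client.chat.completions.create(
          model="local",  # llama-server serves whichever model it loaded
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "Describe this image."},
                  {"type": "image_url",
                   "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
              ],
          }],
      )
      print(resp.choices[0].message.content)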

  • Simon Willison’s Weblog: Qwen3-8B

    Source URL: https://simonwillison.net/2025/May/2/qwen3-8b/#atom-everything
    Summary: Having tried a few of the Qwen 3 models now, my favorite is a bit of a surprise to me: I’m really enjoying Qwen3-8B. I’ve been running prompts through the MLX 4bit quantized version, mlx-community/Qwen3-8B-4bit. I’m using llm-mlx like this: llm install llm-mlx llm…
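
    The llm-mlx plugin builds on Apple's mlx-lm library, so the same 4-bit checkpoint can also be driven directly from Python. A sketch assuming Apple Silicon and `pip install mlx-lm`; the prompt is illustrative:

      # Drive the same 4-bit quantized checkpoint via mlx-lm, the library
      # the llm-mlx plugin builds on. Requires Apple Silicon.
      from mlx_lm import load, generate

      model, tokenizer = load("mlx-community/Qwen3-8B-4bit")

      # Qwen3 is a chat model, so apply its chat template first.
      messages = [{"role": "user", "content": "What is 4-bit quantization?"}]
      prompt = tokenizer.apply_chat_template(
          messages, tokenize=False, add_generation_prompt=True
      )

      print(generate(model, tokenizer, prompt=prompt, max_tokens=256))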

  • Slashdot: DeepSeek-V3 Now Runs At 20 Tokens Per Second On Mac Studio

    Source URL: https://apple.slashdot.org/story/25/03/25/2054214/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Summary: The text discusses the launch of DeepSeek’s new large language model, DeepSeek-V3-0324, highlighting its unique deployment strategy and implications for the AI industry. Its compatibility with consumer-grade hardware and open-source…
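
    A back-of-envelope calculation shows why the headline figure is plausible: DeepSeek-V3 is a mixture-of-experts model with 671B total parameters but only ~37B active per token (its published figures), so once quantized the weights fit in a 512 GB Mac Studio and per-token memory traffic stays modest. The 4-bit width below is an assumption about the quantization used:

      # Rough arithmetic only; parameter counts are DeepSeek's published
      # specs, the 4-bit width is an assumed quantization level.
      total_params = 671e9    # DeepSeek-V3 total parameters
      active_params = 37e9    # parameters active per token (MoE routing)
      bits_per_weight = 4.0   # assumed quantization width

      weights_gb = total_params * bits_per_weight / 8 / 1e9     # ~336 GB
      per_token_gb = active_params * bits_per_weight / 8 / 1e9  # ~19 GB

      # At the M3 Ultra's ~800 GB/s memory bandwidth, reading ~19 GB of
      # weights per token bounds decode speed near 800 / 19 ≈ 40 tokens/s,
      # so an observed 20 tokens/s is in the right ballpark.
      print(f"{weights_gb:.0f} GB of weights, {per_token_gb:.1f} GB per token")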

  • Simon Willison’s Weblog: Mistral Small 3.1

    Source URL: https://simonwillison.net/2025/Mar/17/mistral-small-31/#atom-everything
    Summary: Mistral Small 3 came out in January and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it’s multi-modal (images) and has an increased 128,000 token context length,…
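
    As a hedged illustration of the multi-modal side (not code from the post), Mistral’s own API exposes the model with image input along these lines; the model alias and image URL are placeholder assumptions:

      # Sketch using the mistralai Python SDK (>= 1.0); the model alias
      # and image URL are placeholder assumptions.
      import os
      from mistralai import Mistral

      client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

      resp = client.chat.complete(
          model="mistral-small-latest",  # assumed alias for Small 3.1
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is in this image?"},
                  {"type": "image_url",
                   "image_url": "https://example.com/photo.jpg"},
              ],
          }],
      )
      print(resp.choices[0].message.content)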

  • The Register: DeepSeek-R1-beating perf in a 32B package? El Reg digs its claws into Alibaba’s QwQ

    Source URL: https://www.theregister.com/2025/03/16/qwq_hands_on_review/
    Summary: How to tame its hypersensitive hyperparameters and get it running on your PC. Hands on: How much can reinforcement learning – and a bit of extra verification – improve large language models,…
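
    The “hypersensitive hyperparameters” are QwQ’s sampling settings: Qwen’s model card recommends avoiding greedy decoding and using temperature 0.6 with top-p 0.95 (top-k around 20-40). A sketch applying them through an OpenAI-compatible local endpoint; the URL and model tag are placeholders:

      # QwQ degrades badly under greedy decoding; these sampling values
      # follow Qwen's published recommendations for QwQ-32B. The endpoint
      # and model tag are placeholders for your local runtime.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

      resp = client.chat.completions.create(
          model="qwq:32b",  # placeholder local model tag
          messages=[{"role": "user",
                     "content": "How many r's are in strawberry?"}],
          temperature=0.6,
          top_p=0.95,
          extra_body={"top_k": 40},  # top_k is not a standard OpenAI param
      )
      print(resp.choices[0].message.content)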