Tag: ollama
-
Simon Willison’s Weblog: Saying "hi" to Microsoft’s Phi-4-reasoning
Source URL: https://simonwillison.net/2025/May/6/phi-4-reasoning/#atom-everything Source: Simon Willison’s Weblog Title: Saying "hi" to Microsoft’s Phi-4-reasoning Feedly Summary: Microsoft released a new sub-family of models a few days ago: Phi-4 reasoning. They introduced them in this blog post celebrating a year since the release of Phi-3: Today, we are excited to introduce Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning – marking…
-
Simon Willison’s Weblog: Qwen 3 offers a case study in how to effectively release a model
Source URL: https://simonwillison.net/2025/Apr/29/qwen-3/ Source: Simon Willison’s Weblog Title: Qwen 3 offers a case study in how to effectively release a model Feedly Summary: Alibaba’s Qwen team released the hotly anticipated Qwen 3 model family today. The Qwen models are already some of the best open weight models – Apache 2.0 licensed and with a variety…
-
The Register: <em>El Reg’s</em> essential guide to deploying LLMs in production
Source URL: https://www.theregister.com/2025/04/22/llm_production_guide/ Source: The Register Title: <em>El Reg’s</em> essential guide to deploying LLMs in production Feedly Summary: Running GenAI models is easy. Scaling them to thousands of users, not so much Hands On You can spin up a chatbot with Llama.cpp or Ollama in minutes, but scaling large language models to handle real workloads…
-
Simon Willison’s Weblog: Gemma 3 QAT Models
Source URL: https://simonwillison.net/2025/Apr/19/gemma-3-qat-models/ Source: Simon Willison’s Weblog Title: Gemma 3 QAT Models Feedly Summary: Gemma 3 QAT Models Interesting release from Google, as a follow-up to Gemma 3 from last month: To make Gemma 3 even more accessible, we are announcing new versions optimized with Quantization-Aware Training (QAT) that dramatically reduces memory requirements while maintaining…
-
Simon Willison’s Weblog: An LLM Query Understanding Service
Source URL: https://simonwillison.net/2025/Apr/9/an-llm-query-understanding-service/#atom-everything Source: Simon Willison’s Weblog Title: An LLM Query Understanding Service Feedly Summary: An LLM Query Understanding Service Doug Turnbull recently wrote about how all search is structured now: Many times, even a small open source LLM will be able to turn a search query into reasonable structure at relatively low cost. In…
-
Cloud Blog: Introducing Firebase Studio and agentic developer tools to build with Gemini
Source URL: https://cloud.google.com/blog/products/application-development/firebase-studio-lets-you-build-full-stack-ai-apps-with-gemini/ Source: Cloud Blog Title: Introducing Firebase Studio and agentic developer tools to build with Gemini Feedly Summary: Millions of developers use Firebase to engage their users, powering over 70 billion instances of apps every day, everywhere — from mobile devices and web browsers, to embedded platforms and agentic experiences. But full-stack development…
-
Simon Willison’s Weblog: Mistral Small 3.1 on Ollama
Source URL: https://simonwillison.net/2025/Apr/8/mistral-small-31-on-ollama/#atom-everything Source: Simon Willison’s Weblog Title: Mistral Small 3.1 on Ollama Feedly Summary: Mistral Small 3.1 on Ollama Mistral Small 3.1 (previously) is now available through Ollama, providing an easy way to run this multi-modal (vision) model on a Mac (and other platforms, though I haven’t tried them myself yet). I had to…
-
Simon Willison’s Weblog: Function calling with Gemma
Source URL: https://simonwillison.net/2025/Mar/26/function-calling-with-gemma/#atom-everything Source: Simon Willison’s Weblog Title: Function calling with Gemma Feedly Summary: Function calling with Gemma Google’s Gemma 3 model (the 27B variant is particularly capable, I’ve been trying it out via Ollama) supports function calling exclusively through prompt engineering. The official documentation describes two recommended prompts – both of them suggest that…
-
Simon Willison’s Weblog: Mistral Small 3.1
Source URL: https://simonwillison.net/2025/Mar/17/mistral-small-31/#atom-everything Source: Simon Willison’s Weblog Title: Mistral Small 3.1 Feedly Summary: Mistral Small 3.1 Mistral Small 3 came out in January and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it’s multi-modal (images) and has an increased 128,000 token context length,…
-
Simon Willison’s Weblog: Notes on Google’s Gemma 3
Source URL: https://simonwillison.net/2025/Mar/12/gemma-3/ Source: Simon Willison’s Weblog Title: Notes on Google’s Gemma 3 Feedly Summary: Google’s Gemma team released an impressive new model today (under their not-open-source Gemma license). Gemma 3 comes in four sizes – 1B, 4B, 12B, and 27B – and while 1B is text-only the larger three models are all multi-modal for…