smaller models – Experimental News Clipping Site

Simon Willison’s Weblog: Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines

Oct 4, 2025

Source URL: https://simonwillison.net/2025/Oct/4/drew-on-dspy/#atom-everything
Source: Simon Willison’s Weblog
Title: Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
Feedly Summary: I’ve had trouble getting my head around DSPy in the past. This half hour talk by Drew…

Docker: Fine-Tuning Local Models with Docker Offload and Unsloth

Oct 2, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://www.docker.com/blog/fine-tuning-models-with-offload-and-unsloth/
Source: Docker
Title: Fine-Tuning Local Models with Docker Offload and Unsloth
Feedly Summary: I’ve been experimenting with local models for a while now, and the progress in making them accessible has been exciting. Initial experiences are often fantastic; many models, like Gemma 3 270M, are lightweight enough to run on common hardware.…

Cloud Blog: Gemini and OSS text embeddings are now in BigQuery ML

Sep 16, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://cloud.google.com/blog/products/data-analytics/use-gemini-and-open-source-text-embedding-models-in-bigquery/
Source: Cloud Blog
Title: Gemini and OSS text embeddings are now in BigQuery ML
Feedly Summary: High-quality text embeddings are the engine for modern AI applications like semantic search, classification, and retrieval-augmented generation (RAG). But when it comes to picking a model to generate these embeddings, we know one size doesn’t fit…

Simon Willison’s Weblog: gpt-5 and gpt-5-mini rate limit updates

Sep 12, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Sep/12/gpt-5-rate-limits/#atom-everything
Source: Simon Willison’s Weblog
Title: gpt-5 and gpt-5-mini rate limit updates
Feedly Summary: OpenAI have increased the rate limits for their two main GPT-5 models. These look significant:
gpt-5
Tier 1: 30K → 500K TPM (1.5M batch)
Tier 2: 450K → 1M (3M batch)
Tier 3:…

Simon Willison’s Weblog: Introducing EmbeddingGemma

Sep 4, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Sep/4/embedding-gemma/#atom-everything
Source: Simon Willison’s Weblog
Title: Introducing EmbeddingGemma
Feedly Summary: Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google: Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is small enough to run on less than 200MB of RAM with…

The Register: Little LLM on the RAM: Google’s Gemma 270M hits the scene

Aug 15, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://www.theregister.com/2025/08/15/little_llm_on_the_ram/
Source: The Register
Title: Little LLM on the RAM: Google’s Gemma 270M hits the scene
Feedly Summary: A tiny model trained on trillions of tokens, ready for specialized tasks. Google has unveiled a pint-sized new addition to its “open” large language model lineup: Gemma 3 270M.…
AI Summary and Description: Yes
Summary:…

Simon Willison’s Weblog: Qwen3-4B-Thinking: "This is art – pelicans don’t ride bikes!"

Aug 11, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Aug/10/qwen3-4b/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen3-4B-Thinking: "This is art – pelicans don’t ride bikes!"
Feedly Summary: I’ve fallen a few days behind keeping up with Qwen. They released two new 4B models last week: Qwen3-4B-Instruct-2507 and its thinking equivalent Qwen3-4B-Thinking-2507. These are relatively tiny models that punch way above their weight. I’ve…

The Register: How OpenAI used a new data type to cut inference costs by 75%

Aug 10, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://www.theregister.com/2025/08/10/openai_mxfp4/
Source: The Register
Title: How OpenAI used a new data type to cut inference costs by 75%
Feedly Summary: Decision to use MXFP4 makes models smaller, faster, and more importantly, cheaper for everyone involved. Analysis: Whether or not OpenAI’s new open weights models are any good is still up for debate, but…

Docker: Remocal and Minimum Viable Models: Why Right-Sized Models Beat API Overkill

Aug 9, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://www.docker.com/blog/remocal-minimum-viable-models-ai/
Source: Docker
Title: Remocal and Minimum Viable Models: Why Right-Sized Models Beat API Overkill
Feedly Summary: A practical approach to escaping the expensive, slow world of API-dependent AI. The $20K Monthly Reality Check: You built a simple sentiment analyzer for customer reviews. It works great. Except it costs $847/month in API calls…

Simon Willison’s Weblog: Qwen3-4B Instruct and Thinking

Aug 6, 2025, by Kurt Seifried, in Uncategorized

Source URL: https://simonwillison.net/2025/Aug/6/qwen3-4b-instruct-and-thinking/
Source: Simon Willison’s Weblog
Title: Qwen3-4B Instruct and Thinking
Feedly Summary: Yet another interesting model from Qwen; these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 context length, which Qwen suggest is essential…

Tag: smaller models