Tag: model capabilities

  • Slashdot: ChatGPT Models Are Surprisingly Good At Geoguessing

    Source URL: https://yro.slashdot.org/story/25/04/17/1941258/chatgpt-models-are-surprisingly-good-at-geoguessing
    Summary: The text discusses a concerning trend related to the use of OpenAI’s new models, o3 and o4-mini, for deducing locations from images, raising potential privacy issues. The models’ advanced image analysis capabilities combined with…
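
    To make the capability concrete, here is a minimal sketch of the kind of image-to-location prompt involved, using the OpenAI Python SDK's chat completions API. The model name, image URL, and prompt wording are illustrative assumptions, not details taken from the article.

    ```python
    # Minimal sketch: ask a vision-capable OpenAI model to guess where a
    # photo was taken. Model choice, image URL, and prompt are assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o4-mini",  # assumption: one of the vision-capable models discussed
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Guess where this photo was taken and explain "
                             "the visual clues you used."},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/street-photo.jpg"}},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)
    ```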

  • Simon Willison’s Weblog: Quoting James Betker

    Source URL: https://simonwillison.net/2025/Apr/16/james-betker/#atom-everything
    Summary: I work for OpenAI. […] o4-mini is actually a considerably better vision model than o3, despite the benchmarks. Similar to how o3-mini-high was a much better coding model than o1. I would recommend using o4-mini-high over o3 for any task involving vision.…

  • Simon Willison’s Weblog: GPT-4.1: Three new million token input models from OpenAI, including their cheapest model yet

    Source URL: https://simonwillison.net/2025/Apr/14/gpt-4-1/
    Summary: OpenAI introduced three new models this morning: GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. These are API-only models right now, not available through the ChatGPT interface (though you can try them out…
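
    Since these models are API-only for now, the way to try them is a direct API call. A minimal sketch with the OpenAI Python SDK; the prompt is illustrative.

    ```python
    # Minimal sketch: call GPT-4.1 via the API, since it is not available
    # in the ChatGPT interface yet. The prompt is just an illustration.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4.1",  # the launch also included gpt-4.1-mini and gpt-4.1-nano
        messages=[
            {"role": "user",
             "content": "Summarize the GPT-4.1 launch in one sentence."}
        ],
    )
    print(response.choices[0].message.content)
    ```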

  • Simon Willison’s Weblog: Quoting Andriy Burkov

    Source URL: https://simonwillison.net/2025/Apr/6/andriy-burkov/#atom-everything
    Summary: […] The disappointing releases of both GPT-4.5 and Llama 4 have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits. Reinforcement learning is limited only to domains where a reward can…

  • Docker: Run LLMs Locally with Docker: A Quickstart Guide to Model Runner

    Source URL: https://www.docker.com/blog/run-llms-locally/
    Summary: AI is quickly becoming a core part of modern applications, but running large language models (LLMs) locally can still be a pain. Between picking the right model, navigating hardware quirks, and optimizing for performance, it’s easy…
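
    A minimal sketch of talking to a model served by Docker Model Runner through its OpenAI-compatible endpoint. The port, URL path, and model tag below are from memory of the quickstart and should be checked against the current Docker docs; it also assumes host-side TCP access is enabled and the model has already been pulled (e.g. with `docker model pull ai/smollm2`).

    ```python
    # Minimal sketch: query a local model served by Docker Model Runner via
    # its OpenAI-compatible API. Port (12434), path, and model tag are
    # assumptions to verify against the current Docker documentation.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:12434/engines/llama.cpp/v1",
        api_key="not-needed",  # the local endpoint does not check API keys
    )

    response = client.chat.completions.create(
        model="ai/smollm2",
        messages=[{"role": "user", "content": "Say hello from a local model."}],
    )
    print(response.choices[0].message.content)
    ```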

  • Simon Willison’s Weblog: Gemini 2.5 Pro Preview pricing

    Source URL: https://simonwillison.net/2025/Apr/4/gemini-25-pro-pricing/
    Summary: Google’s Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding. You can now pay for it! The new…
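
    A minimal sketch of using the now-paid preview for OCR, one of the tasks the post highlights, via the google-genai Python SDK. The model id and file name are assumptions; check Google's docs for the current identifiers.

    ```python
    # Minimal sketch: OCR a scanned page with Gemini 2.5 Pro Preview.
    # The model id below is an assumption based on Google's preview naming.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    with open("scanned-page.png", "rb") as f:
        image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

    response = client.models.generate_content(
        model="gemini-2.5-pro-preview-03-25",
        contents=[image, "Transcribe all text in this image exactly."],
    )
    print(response.text)
    ```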

  • Hacker News: Tao: Using test-time compute to train efficient LLMs without labeled data

    Source URL: https://www.databricks.com/blog/tao-using-test-time-compute-train-efficient-llms-without-labeled-data
    Summary: The text introduces a new model tuning method for large language models (LLMs) called Test-time Adaptive Optimization (TAO) that enhances model quality without requiring large amounts of labeled…
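
    The summary cuts off, but the general pattern it gestures at (spend test-time compute generating candidate responses to unlabeled prompts, score them with a reward model, then train on the preferred outputs) can be sketched loosely. This is an illustration of that pattern only, not Databricks's actual TAO recipe; generate() and reward() stand in for a base LLM and a reward model.

    ```python
    # Loose sketch of the pattern behind methods like TAO: use test-time
    # compute (best-of-n sampling) plus a reward model to create training
    # signal from unlabeled prompts. Illustrative assumptions throughout.
    from typing import Callable

    def build_tuning_pairs(
        prompts: list[str],
        generate: Callable[[str], str],       # base model: prompt -> response
        reward: Callable[[str, str], float],  # reward model: (prompt, response) -> score
        n_samples: int = 8,
    ) -> list[tuple[str, str]]:
        """For each unlabeled prompt, sample n candidates and keep the
        highest-scoring one as a synthetic training target."""
        pairs = []
        for prompt in prompts:
            candidates = [generate(prompt) for _ in range(n_samples)]
            best = max(candidates, key=lambda r: reward(prompt, r))
            pairs.append((prompt, best))
        return pairs

    # The resulting (prompt, best_response) pairs would then feed a
    # fine-tuning or RL step, so the deployed model no longer needs the
    # extra test-time compute.
    ```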

  • Simon Willison’s Weblog: Putting Gemini 2.5 Pro through its paces

    Source URL: https://simonwillison.net/2025/Mar/25/gemini/
    Summary: There’s a new release from Google Gemini this morning: the first in the Gemini 2.5 series. Google call it “a thinking model, designed to tackle increasingly complex problems”. It’s already sat at the top of the LM Arena…

  • Simon Willison’s Weblog: Mistral Small 3.1

    Source URL: https://simonwillison.net/2025/Mar/17/mistral-small-31/#atom-everything
    Summary: Mistral Small 3 came out in January and was a notable, genuinely excellent local model that used an Apache 2.0 license. Mistral Small 3.1 offers a significant improvement: it’s multi-modal (images) and has an increased 128,000 token context length,…
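
    A minimal sketch of exercising the new image support locally. It assumes the model is available through Ollama under the tag "mistral-small3.1" (an assumption to verify) and that the ollama Python package is installed.

    ```python
    # Minimal sketch: send an image to a locally running Mistral Small 3.1.
    # The Ollama model tag and file name are assumptions.
    import ollama

    response = ollama.chat(
        model="mistral-small3.1",
        messages=[{
            "role": "user",
            "content": "Describe this image in one sentence.",
            "images": ["photo.jpg"],  # path to a local image file
        }],
    )
    print(response["message"]["content"])
    ```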

  • Simon Willison’s Weblog: Quoting Ai2

    Source URL: https://simonwillison.net/2025/Mar/13/ai2/#atom-everything
    Summary: Today we release OLMo 2 32B, the most capable and largest model in the OLMo 2 family, scaling up the OLMo 2 training recipe used for our 7B and 13B models released in November. It is trained up to 6T tokens and post-trained…
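
    A minimal sketch of loading the instruct variant with Hugging Face transformers. The repo id is an assumption based on Ai2's naming (check huggingface.co/allenai), and a 32B model needs serious GPU memory, so this is illustrative rather than something to run casually.

    ```python
    # Minimal sketch: load and prompt OLMo 2 32B Instruct via transformers.
    # The repo id is an assumed Hugging Face identifier; verify before use.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/OLMo-2-0325-32B-Instruct"  # assumption
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer("What is the OLMo project?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```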