benchmark – Page 13 – Experimental News Clipping Site

Simon Willison’s Weblog: OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth

Jun 9, 2025

—

by

Source URL: https://simonwillison.net/2025/Jun/9/openai-revenue/#atom-everything Source: Simon Willison’s Weblog Title: OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth Feedly Summary: OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth Noteworthy because OpenAI revenue is a useful indicator of the direction of the generative AI industry in general, and frequently comes…

Slashdot: Apple Researchers Challenge AI Reasoning Claims With Controlled Puzzle Tests

Jun 9, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://apple.slashdot.org/story/25/06/09/1151210/apple-researchers-challenge-ai-reasoning-claims-with-controlled-puzzle-tests?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Apple Researchers Challenge AI Reasoning Claims With Controlled Puzzle Tests Feedly Summary: AI Summary and Description: Yes Summary: Apple researchers have discovered that advanced reasoning AI models, including OpenAI’s o3-mini and Gemini, exhibit a performance collapse at higher complexity levels in puzzle-solving tasks. This finding challenges existing assumptions about…

Cloud Blog: Unlock 66% better price-performance with new M4 VMs for memory-intensive workloads

Jun 9, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/compute/m4-vms-are-designed-for-memory-intensive-workloads-like-sap/ Source: Cloud Blog Title: Unlock 66% better price-performance with new M4 VMs for memory-intensive workloads Feedly Summary: Today, we’re excited to announce the general availability of the memory-optimized machine series: Compute Engine M4, our most performant memory-optimized VM with under 6TB of memory. The M4 family is designed for workloads like SAP…

Simon Willison’s Weblog: Qwen3 Embedding

Jun 8, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/8/qwen3-embedding/#atom-everything Source: Simon Willison’s Weblog Title: Qwen3 Embedding Feedly Summary: Qwen3 Embedding New family of embedding models from Qwen, in three sizes: 0.6B, 4B, 8B – and two categories: Text Embedding and Text Reranking. The full collection can be browsed on Hugging Face. The smallest available model is the 0.6B Q8 one, which…

Simon Willison’s Weblog: The last year six months in LLMs, illustrated by pelicans on bicycles

Jun 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/6/six-months-in-llms/#atom-everything Source: Simon Willison’s Weblog Title: The last year six months in LLMs, illustrated by pelicans on bicycles Feedly Summary: I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event – here’s my talks from October 2023 and…

Cloud Blog: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes

Jun 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploying-llama4-and-deepseek-on-ai-hypercomputer/ Source: Cloud Blog Title: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes Feedly Summary: The pace of innovation in open-source AI is breathtaking, with models like Meta’s Llama4 and DeepSeek AI’s DeepSeek. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and…

Cloud Blog: How Alpian is redefining private banking for the digital age with gen AI

Jun 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/topics/financial-services/how-alpian-is-redefining-private-banking-for-the-digital-age-with-gen-ai/ Source: Cloud Blog Title: How Alpian is redefining private banking for the digital age with gen AI Feedly Summary: As the first fully cloud-native private bank in Switzerland, Alpian stands at the forefront of digital innovation in the financial services sector. With its unique model blending personal wealth management and digital convenience,…

Simon Willison’s Weblog: Shisa V2 405B: Japan’s Highest Performing LLM

Jun 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/3/shisa-v2/ Source: Simon Willison’s Weblog Title: Shisa V2 405B: Japan’s Highest Performing LLM Feedly Summary: Shisa V2 405B: Japan’s Highest Performing LLM Leonard Lin and Adam Lensenmayer have been working on Shisa for a while. They describe their latest release as “Japan’s Highest Performing LLM". Shisa V2 405B is the highest-performing LLM ever…

The Register: IBM Watson zombie brand shuffles forward with new AI lab in NYC

Jun 2, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/06/02/ibm_acquires_seek_ai/ Source: The Register Title: IBM Watson zombie brand shuffles forward with new AI lab in NYC Feedly Summary: Unsurprisingly, it’s all about agents, the buzzword du jour IBM on Monday unveiled watsonx AI Labs, a New York City hub where startups, researchers, and IBM engineers are expected to co-create agentic AI tools…

Simon Willison’s Weblog: How often do LLMs snitch? Recreating Theo’s SnitchBench with LLM

May 31, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/May/31/snitchbench-with-llm/#atom-everything Source: Simon Willison’s Weblog Title: How often do LLMs snitch? Recreating Theo’s SnitchBench with LLM Feedly Summary: A fun new benchmark just dropped! Inspired by the Claude 4 system card – which showed that Claude 4 might just rat you out to the authorities if you told it to “take initiative" in…

Tag: benchmark