Tag: duckdb
-
Tomasz Tunguz: The SQL Gap
Source URL: https://www.tomtunguz.com/spider-2-benchmark-trends/
Source: Tomasz Tunguz
Title: The SQL Gap
Feedly Summary: GPT-5 achieves 94.6% accuracy on AIME 2025, suggesting near-human mathematical reasoning. Yet ask it to query your database, and success rates plummet to the teens. The Spider 2.0 benchmarks reveal a yawning gap in AI capabilities. Spider 2.0 is a comprehensive text-to-SQL benchmark…
-
Simon Willison’s Weblog: S1: The $6 R1 Competitor?
Source URL: https://simonwillison.net/2025/Feb/5/s1-the-6-r1-competitor/
Source: Simon Willison’s Weblog
Title: S1: The $6 R1 Competitor?
Feedly Summary: Tim Kellogg shares his notes on a new paper, s1: Simple test-time scaling, which describes an inference-scaling model fine-tuned on top of Qwen2.5-32B-Instruct for just $6 – the cost for 26 minutes on 16 NVIDIA…