Tag: Spider
-
Tomasz Tunguz: The SQL Gap
Source URL: https://www.tomtunguz.com/spider-2-benchmark-trends/ Source: Tomasz Tunguz Title: The SQL Gap Feedly Summary: GPT-5 achieves 94.6% accuracy on AIME 2025, suggesting near-human mathematical reasoning. Yet ask it to query your database, and success rates plummet to the teens. The Spider 2.0 benchmarks reveal a yawning gap in AI capabilities. Spider 2.0 is a comprehensive text-to-SQL benchmark…