Experimental News Clipping Site

Tag: data contamination

The Register: Search-capable AI agents may cheat on benchmark tests

Aug 23, 2025, by Kurt Seifried in Uncategorized

Source URL: https://www.theregister.com/2025/08/23/searchcapable_ai_agents_may_cheat/
Source: The Register
Title: Search-capable AI agents may cheat on benchmark tests
Feedly Summary: Data contamination can make models seem more capable than they really are. Researchers with Scale AI have found that search-based AI models may cheat on benchmark tests by fetching the answers directly from online sources rather than deriving…

Hacker News: Can We Trust AI Benchmarks? A Review of Current Issues in AI Evaluation

Feb 15, 2025, by Kurt Seifried in Uncategorized

Source URL: https://arxiv.org/abs/2502.06559
Source: Hacker News
Title: Can We Trust AI Benchmarks? A Review of Current Issues in AI Evaluation
AI Summary and Description: Yes
Summary: This paper critically examines the current practices of AI benchmarking, which are crucial for evaluating AI model performance, safety, and compliance. It highlights significant shortcomings in…