Tag: evaluations

  • The Register: Brits end probe into Microsoft’s $13B bankrolling of OpenAI

    Source URL: https://www.theregister.com/2025/03/05/cma_microsoft_openai/ Source: The Register Title: Brits end probe into Microsoft’s $13B bankrolling of OpenAI Feedly Summary: Redmond doesn’t have total control over GPT maker so we lack authority, say monopoly cops The UK’s investigation into competition concerns arising from Microsoft’s $13 billion investment in OpenAI has reached a conclusion, albeit an anticlimactic one…

  • Google Online Security Blog: New AI-Powered Scam Detection Features to Help Protect You on Android

    Source URL: http://security.googleblog.com/2025/03/new-ai-powered-scam-detection-features.html Source: Google Online Security Blog Title: New AI-Powered Scam Detection Features to Help Protect You on Android Feedly Summary: AI Summary and Description: Yes Summary: The text discusses Google’s launch of AI-driven scam detection features for calls and text messages aimed at combating the rising sophistication of scams and fraud. With scammers…

  • Hacker News: GPT-4.5: "Not a frontier model"?

    Source URL: https://www.interconnects.ai/p/gpt-45-not-a-frontier-model Source: Hacker News Title: GPT-4.5: "Not a frontier model"? Feedly Summary: Comments AI Summary and Description: Yes Summary: The text highlights the release of OpenAI’s GPT-4.5 and analyzes its capabilities, implications, and performance compared to previous models. It discusses the model’s scale, pricing, and the evolving landscape of AI scaling, presenting insights…

  • Hacker News: 3x Improvement with Infinite Retrieval: Attention Enhanced LLMs in Long-Context

    Source URL: https://arxiv.org/abs/2502.12962 Source: Hacker News Title: 3x Improvement with Infinite Retrieval: Attention Enhanced LLMs in Long-Context Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a novel approach called InfiniRetri, which enhances long-context processing capabilities of Large Language Models (LLMs) by utilizing their own attention mechanisms for improved retrieval accuracy. This…

  • Simon Willison’s Weblog: Deep research System Card

    Source URL: https://simonwillison.net/2025/Feb/25/deep-research-system-card/#atom-everything Source: Simon Willison’s Weblog Title: Deep research System Card Feedly Summary: Deep research System Card OpenAI are rolling out their Deep research “agentic" research tool to their $20/month ChatGPT Plus users today, who get 10 queries a month. $200/month ChatGPT Pro gets 120 uses. Deep research is the best version of this…

  • OpenAI : Deep research System Card

    Source URL: https://openai.com/index/deep-research-system-card Source: OpenAI Title: Deep research System Card Feedly Summary: This report outlines the safety work carried out prior to releasing deep research including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas. AI Summary and Description:…

  • Simon Willison’s Weblog: Leaked Windsurf prompt

    Source URL: https://simonwillison.net/2025/Feb/25/leaked-windsurf-prompt/ Source: Simon Willison’s Weblog Title: Leaked Windsurf prompt Feedly Summary: Leaked Windsurf prompt The Windurf Editor is Codeium’s highly regarded entrant into the fork-of-VS-code AI-enhanced IDE model first pioneered by Cursor (and by VS Code itself). I heard online that it had a quirky system prompt, and was able to replicate that…