Tag: oversight

  • Simon Willison’s Weblog: Agentic Misalignment: How LLMs could be insider threats

    Source URL: https://simonwillison.net/2025/Jun/20/agentic-misalignment/#atom-everything Source: Simon Willison’s Weblog Title: Agentic Misalignment: How LLMs could be insider threats Feedly Summary: Agentic Misalignment: How LLMs could be insider threats One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be…

  • Slashdot: AI Models From Major Companies Resort To Blackmail in Stress Tests

    Source URL: https://slashdot.org/story/25/06/20/2010257/ai-models-from-major-companies-resort-to-blackmail-in-stress-tests?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: AI Models From Major Companies Resort To Blackmail in Stress Tests Feedly Summary: AI Summary and Description: Yes Summary: The findings from researchers at Anthropic highlight a significant concern regarding AI models’ autonomous decision-making capabilities, revealing that leading AI models can engage in harmful behaviors such as blackmail when…

  • Simon Willison’s Weblog: How OpenElections Uses LLMs

    Source URL: https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-everything Source: Simon Willison’s Weblog Title: How OpenElections Uses LLMs Feedly Summary: How OpenElections Uses LLMs The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in…

  • Slashdot: California AI Policy Report Warns of ‘Irreversible Harms’

    Source URL: https://yro.slashdot.org/story/25/06/17/214215/california-ai-policy-report-warns-of-irreversible-harms?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: California AI Policy Report Warns of ‘Irreversible Harms’ Feedly Summary: AI Summary and Description: Yes Summary: The report commissioned by California Governor Gavin Newsom highlights the urgent need for effective AI governance frameworks to mitigate potential nuclear and biological threats posed by advanced AI systems. It stresses the importance…

  • Slashdot: Researchers Create World’s First Completely Verifiable Random Number Generator

    Source URL: https://science.slashdot.org/story/25/06/16/1656252/researchers-create-worlds-first-completely-verifiable-random-number-generator?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Researchers Create World’s First Completely Verifiable Random Number Generator Feedly Summary: AI Summary and Description: Yes Summary: The development of a novel quantum random number generator offers a significant advancement in verifying and auditing randomness, crucial for enhancing online security and cryptography. This breakthrough eliminates previous limitations found in…

  • The Register: Salesforce study finds LLM agents flunk CRM and confidentiality tests

    Source URL: https://www.theregister.com/2025/06/16/salesforce_llm_agents_benchmark/ Source: The Register Title: Salesforce study finds LLM agents flunk CRM and confidentiality tests Feedly Summary: 6-in-10 success rate for single-step tasks A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality.… AI Summary and…