Tag: safety

  • Simon Willison’s Weblog: GPT-5 has a hidden system prompt

    Source URL: https://simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/#atom-everything Source: Simon Willison’s Weblog Title: GPT-5 has a hidden system prompt Feedly Summary: GPT-5 has a hidden system prompt It looks like GPT-5 when accessed via the OpenAI API may have its own hidden system prompt, independent from the system prompt you can specify in an API call. At the very least…

  • Slashdot: Co-Founder of xAI Departs the Company

    Source URL: https://slashdot.org/story/25/08/14/0414234/co-founder-of-xai-departs-the-company?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Co-Founder of xAI Departs the Company Feedly Summary: AI Summary and Description: Yes Summary: Igor Babuschkin, co-founder of xAI, is departing to launch Babuschkin Ventures, a VC firm aimed at supporting AI safety and startups that promote human advancement. His experience includes significant roles at both xAI and leading…

  • The Register: Stock in the Channel pulls website amid cyberattack

    Source URL: https://www.theregister.com/2025/08/14/stock_in_the_channel_pulls/ Source: The Register Title: Stock in the Channel pulls website amid cyberattack Feedly Summary: Intruders accessed important systems but tells customers their data is safe A UK-based multinational that provides tech stock availability tools is telling customers that its website outage is due to a cyber attack.… AI Summary and Description: Yes…

  • Wired: OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs

    Source URL: https://www.wired.com/story/openai-gpt5-safety/ Source: Wired Title: OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs Feedly Summary: The new version of ChatGPT explains why it won’t generate rule-breaking outputs. WIRED’s initial analysis found that some guardrails were easy to circumvent. AI Summary and Description: Yes Summary: The text discusses a new version of…

  • Microsoft Security Blog: Dow’s 125-year legacy: Innovating with AI to secure a long future

    Source URL: https://www.microsoft.com/en-us/security/blog/2025/08/12/dows-125-year-legacy-innovating-with-ai-to-secure-a-long-future/ Source: Microsoft Security Blog Title: Dow’s 125-year legacy: Innovating with AI to secure a long future Feedly Summary: Microsoft recently spoke with Mario Ferket, Chief Information Security Officer for Dow, about the company’s approach to AI in security. The post Dow’s 125-year legacy: Innovating with AI to secure a long future appeared…

  • The Register: VS Code previews chat checkpoints for unpicking careless talk

    Source URL: https://www.theregister.com/2025/08/12/vs_code_previews_chat_checkpoints/ Source: The Register Title: VS Code previews chat checkpoints for unpicking careless talk Feedly Summary: Microsoft’s AI-centric code editor and IDE adds the ability to rollback misguided AI prompts The Microsoft Visual Studio Code (VS Code) team has rolled out version 1.103 with new features including GitHub Copilot chat checkpoints.… AI Summary…

  • Slashdot: LLMs’ ‘Simulated Reasoning’ Abilities Are a ‘Brittle Mirage,’ Researchers Find

    Source URL: https://slashdot.org/story/25/08/11/2253229/llms-simulated-reasoning-abilities-are-a-brittle-mirage-researchers-find?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: LLMs’ ‘Simulated Reasoning’ Abilities Are a ‘Brittle Mirage,’ Researchers Find Feedly Summary: AI Summary and Description: Yes Summary: Recent investigations into chain-of-thought reasoning models in AI reveal limitations in their logical reasoning capabilities, suggesting they operate more as pattern-matchers than true reasoners. The findings raise crucial concerns for industries…

  • Simon Willison’s Weblog: Chromium Docs: The Rule Of 2

    Source URL: https://simonwillison.net/2025/Aug/11/the-rule-of-2/ Source: Simon Willison’s Weblog Title: Chromium Docs: The Rule Of 2 Feedly Summary: Chromium Docs: The Rule Of 2 Alex Russell pointed me to this principle in the Chromium security documentation as similar to my description of the lethal trifecta. First added in 2019, the Chromium guideline states: When you write code…