Source URL: https://simonwillison.net/2025/Feb/25/deep-research-system-card/#atom-everything
Source: Simon Willison’s Weblog
Title: Deep research System Card
Feedly Summary: Deep research System Card
OpenAI are rolling out their Deep research “agentic” research tool to their $20/month ChatGPT Plus users today, who get 10 queries a month. $200/month ChatGPT Pro gets 120 uses.
Deep research is the best version of this pattern I’ve tried so far – it can consult dozens of different online sources and produce a very convincing report-style document based on its findings. I’ve had some great results.
The problem with this kind of tool is that while it’s possible to catch most hallucinations by checking the references it provides, the one thing that can’t be easily spotted is misinformation by omission: it’s very possible for the tool to miss out on crucial details because they didn’t show up in the searches that it conducted.
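As an illustration of that reference-checking step, here is a minimal Python sketch (not from the original post) that spot-checks whether each claimed quote actually appears at its cited URL. The citation pairs and the helper name are hypothetical, and, as the post notes, this kind of check can only flag unsupported citations; it cannot detect what the tool's searches omitted:

```python
# A minimal sketch of citation spot-checking, assuming you have already
# extracted (claimed_quote, source_url) pairs from a generated report.
import requests

def check_citation(claimed_quote: str, source_url: str, timeout: int = 10) -> bool:
    """Return True if the claimed quote appears verbatim in the cited page."""
    try:
        response = requests.get(source_url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        return False  # unreachable source: treat as unverified
    # Naive check: verbatim substring match against the raw page body.
    # Real pages would need HTML-to-text extraction and fuzzy matching.
    return claimed_quote in response.text

# Hypothetical extracted citations from a generated report.
citations = [
    ("example claimed quote", "https://example.com/source"),
]
for quote, url in citations:
    status = "ok" if check_citation(quote, url) else "NEEDS MANUAL REVIEW"
    print(f"{status}: {url}")
```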
Hallucinations are also still possible though. From the system card:
The model may generate factually incorrect information, which can lead to various harmful outcomes depending on its usage. Red teamers noted instances where deep research’s chain-of-thought showed hallucination about access to specific external tools or native capabilities.
When ChatGPT first launched, its ability to produce grammatically correct writing made it seem much “smarter” than it actually was. Deep research has an even more advanced form of this effect, where producing a multi-page document with headings, citations, and confident arguments can give the misleading impression of a PhD-level research assistant.
It’s absolutely worth spending some time exploring.
There’s a slightly unsettling note in the section about chemical and biological threats:
Several of our biology evaluations indicate our models are on the cusp of being able to meaningfully help novices create known biological threats, which would cross our high risk threshold. We expect current trends of rapidly increasing capability to continue, and for models to cross this threshold in the near future. In preparation, we are intensifying our investments in safeguards.
Tags: ai, ai-agents, openai, chatgpt, generative-ai, llms, ethics
AI Summary and Description: Yes
Summary: The text discusses OpenAI’s rollout of its Deep research “agentic” tool to ChatGPT users, which can consult dozens of online sources to generate convincing research-style documents. It also raises significant concerns about misinformation, both hallucinated and omitted, and about ethical implications, particularly in relation to biological threats.
Detailed Description: The introduction of OpenAI’s Deep research tool illustrates both the potential and the risks of generative AI. The tool gives users a genuinely useful facility for in-depth research and report writing, but its use raises several issues of particular relevance to security and compliance professionals.
– **Tool Description:**
– **Deep Research Tool:** Available to $20/month ChatGPT Plus users (10 queries per month) and $200/month ChatGPT Pro users (120 per month).
– **Functionality:** Allows querying multiple online sources to generate detailed reports.
– **User Experience:** The author reports strong results and regards the tool as the best version of this pattern he has tried so far.
– **Concerns about Misinformation:**
– **Hallucinations:** The model can still generate factually incorrect information, and presents it confidently enough to mislead users.
– **Misrepresentation:** Structured, citation-rich documents create a facade of expertise that can lead users to overestimate the tool’s reliability.
– **Misinformation by Omission:** Crucial details that never surfaced in the tool’s searches are silently absent, and such gaps are far harder to spot than hallucinated claims.
– **Ethical Implications:**
– **Biological and Chemical Threats:** OpenAI’s own evaluations indicate its models are on the cusp of meaningfully helping novices create known biological threats, which would cross the company’s high-risk threshold.
– **Investment in Safeguards:** OpenAI expects capabilities to keep increasing rapidly and says it is intensifying its investments in safeguards in preparation.
Tools like this call for increased vigilance and ethical scrutiny from security and compliance professionals, particularly around misinformation, the integrity of generated content, and potential misuse in sensitive areas such as biotechnology. As generative AI capabilities continue to advance, so does the need for sound regulatory compliance and risk management strategies.