Simon Willison’s Weblog: How ProPublica Uses AI Responsibly in Its Investigations

Source URL: https://simonwillison.net/2025/Mar/14/propublica-ai/
Source: Simon Willison’s Weblog
Title: How ProPublica Uses AI Responsibly in Its Investigations

Feedly Summary:
Charles Ornstein describes how ProPublica used an LLM to help analyze data for their recent story “A Study of Mint Plants. A Device to Stop Bleeding. This Is the Scientific Research Ted Cruz Calls ‘Woke.’” by Agnel Philip and Lisa Song.
They ran ~3,400 grant descriptions through a prompt that included the following:

As an investigative journalist, I am looking for the following information

woke_description: A short description (at maximum a paragraph) on why this grant is being singled out for promoting “woke” ideology, Diversity, Equity, and Inclusion (DEI) or advanced neo-Marxist class warfare propaganda. Leave this blank if it’s unclear.
why_flagged: Look at the "STATUS", "SOCIAL JUSTICE CATEGORY", "RACE CATEGORY", "GENDER CATEGORY" and "ENVIRONMENTAL JUSTICE CATEGORY" fields. If it’s filled out, it means that the author of this document believed the grant was promoting DEI ideology in that way. Analyze the "AWARD DESCRIPTIONS" field and see if you can figure out why the author may have flagged it in this way. Write it in a way that is thorough and easy to understand with only one description per type and award.
citation_for_flag: Extract a very concise text quoting the passage of "AWARDS DESCRIPTIONS" that backs up the "why_flagged" data.

This was only the first step in the analysis of the data:

Of course, members of our staff reviewed and confirmed every detail before we published our story, and we called all the named people and agencies seeking comment, which remains a must-do even in the world of AI.
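The first step, the per-grant extraction prompt quoted earlier, might be sketched roughly as follows. Only the three field names come from the prompt in the post. The function names, the request for JSON output, and the sample record are assumptions made for illustration; the post does not say which model was used or what format its responses took, so no model call is shown.

```python
import json

# Field names taken from the prompt quoted in the post.
FIELDS = ("woke_description", "why_flagged", "citation_for_flag")


def build_prompt(instructions: str, grant: dict) -> str:
    """Combine the fixed instructions with one grant record (hypothetical helper)."""
    record = "\n".join(f"{key}: {value}" for key, value in grant.items())
    # Asking for JSON is an assumption; it just makes the reply easy to parse.
    output_format = "Respond with a JSON object with the keys: " + ", ".join(FIELDS)
    return f"{instructions}\n\n{output_format}\n\n{record}"


def parse_response(raw: str) -> dict:
    """Keep only the expected fields; missing or null values become empty strings."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: str(data.get(name) or "").strip() for name in FIELDS}
```

Running one record per prompt keeps each response tied to a single grant, which makes the later human review of every row more straightforward.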

I think journalists are particularly well positioned to take advantage of LLMs in this way, because a big part of journalism is about deriving the truth from multiple unreliable sources of information. Journalists are deeply familiar with fact-checking, which is a critical skill if you’re going to report with the assistance of these powerful but unreliable models.
Agnel Philip:

The tech holds a ton of promise in lead generation and pointing us in the right direction. But in my experience, it still needs a lot of human supervision and vetting. If used correctly, it can both really speed up the process of understanding large sets of information, and if you’re creative with your prompts and critically read the output, it can help uncover things that you may not have thought of.
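One mechanical check that could support, though never replace, the human vetting described here is confirming that each citation_for_flag value really is a quote from the grant's award description. The sketch below is an illustration only: the post does not say ProPublica ran such a check, and the function names and row layout are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_is_verbatim(citation: str, award_description: str) -> bool:
    """True if the citation, minus surrounding quote marks, appears in the description."""
    quote = normalize(citation).strip("\"'“” ")
    return bool(quote) and quote in normalize(award_description)


def unverified_citations(rows: list) -> list:
    """Return the rows whose citation cannot be found in their source text."""
    return [
        row for row in rows
        if not citation_is_verbatim(row["citation_for_flag"], row["AWARD DESCRIPTIONS"])
    ]
```

Rows returned by `unverified_citations` are candidates for closer reading. A row that passes still needs review, because a genuine quote can be paired with a wrong explanation.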

Tags: prompt-engineering, structured-extraction, generative-ai, ai, data-journalism, llms, journalism, ethics

AI Summary and Description: Yes

Summary: The text describes how ProPublica used a large language model (LLM) to help analyze roughly 3,400 grant descriptions for an investigation. The case shows that an LLM can speed up data analysis and lead generation, provided that staff verify every detail of its output before publication.

Detailed Description:

The provided text outlines how ProPublica utilized AI technology, specifically a large language model, in their investigative reporting. Key aspects of this process include:

– **Use of LLM**: ProPublica ran approximately 3,400 grant descriptions through an LLM prompt to extract the same set of fields from each one.
– **Prompt Utilization**: The prompt, written from the perspective of an investigative journalist, asked for three named fields per grant:
  – woke_description: a short description of why the grant was singled out as promoting “woke” ideology, left blank if unclear.
  – why_flagged: an explanation, based on the status and category fields and the award description, of why the document's author may have flagged the grant.
  – citation_for_flag: a very concise quote from the award description that backs up why_flagged.

– **Verification and Human Oversight**: The LLM pass was only the first step. Staff reviewed and confirmed every detail before publication and called all the named people and agencies for comment, which ProPublica describes as a must-do even when AI is involved.

– **Journalistic Advantages**: According to Agnel Philip, the technology is promising for:
  – Lead generation and pointing reporters in the right direction.
  – Speeding up the process of understanding large sets of information.
  – Uncovering things reporters may not have thought of, given creative prompts and a critical reading of the output.

– **Cautions**: The text describes these models as powerful but unreliable. Their output still needs a lot of human supervision, vetting, and fact-checking.

Overall, the post is a useful case study for professionals in AI, journalism, and ethics, showing how an LLM can be combined with established verification practices.

– **Key Takeaways for Professionals**:
  – AI can augment investigative efforts but should always be complemented by human scrutiny.
  – Crafting effective prompts is essential in guiding AI towards useful outputs.
  – Maintaining ethical standards and practices in AI application is critical, especially in sensitive areas like journalism.