Slashdot: AI Has Already Run Out of Training Data, Goldman’s Data Chief Says

Oct 2, 2025

—

Source URL: https://slashdot.org/story/25/10/02/191224/ai-has-already-run-out-of-training-data-goldmans-data-chief-says?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Title: AI Has Already Run Out of Training Data, Goldman’s Data Chief Says


Summary: Goldman Sachs’ chief data officer argues that AI has run out of training data, which constrains how developers build new AI systems. The text presents synthetic data and proprietary datasets as potential solutions and emphasizes the importance of understanding data within its business context.

Detailed Description: The remarks by Neema Raphael, Goldman Sachs’ chief data officer, matter to professionals engaged in AI development and deployment. Key points include:

– **Training Data Shortage**: Raphael asserts that AI has already run out of training data, which directly shapes how developers approach building new AI systems.

– **Synthetic Data Use**: To compensate for the lack of fresh data, developers are turning to synthetic data. Because it is machine-generated, its supply is theoretically unlimited, but it raises concerns about the quality of the resulting output.

– **Impact of Proprietary Datasets**: Proprietary corporate data can increase the value of AI tools, provided it is understood and normalized within its business context.

– **Business Context**: Understanding the context of the data is crucial. Without that understanding, the utility of AI solutions is limited, so data attributes need to align with business objectives.

– **Optimism in Data Utilization**: Despite concerns about data shortages, Raphael remains optimistic that more insight can be derived from existing data sources, suggesting that current datasets still hold untapped value.

This discussion is relevant not just to data scientists and AI developers but also to security and compliance professionals, who must handle data governance, data privacy, and regulatory compliance when working with AI-driven technologies. Understanding the limits and capabilities of available data is essential for building secure and compliant AI frameworks.
