Tomasz Tunguz: Data & AI Infrastructure Are Fusing

Oct 2, 2025

—

Source URL: https://www.tomtunguz.com/data–ai-infrastructure-are-fusing/
Source: Tomasz Tunguz
Title: Data & AI Infrastructure Are Fusing

Feedly Summary:

AI breaks the data stack.
Most enterprises spent the past decade building sophisticated data stacks. ETL pipelines move data into warehouses. Transformation layers clean data for analytics. BI tools surface insights to users.
This architecture worked for traditional analytics.
But AI demands something different. It needs continuous feedback loops. It requires real-time embeddings & context retrieval.
Consider a customer at an ATM withdrawing pocket money. The AI agent on their mobile app needs to know about that $40 transaction within seconds. Data accuracy & speed aren’t optional.
Netflix rebuilt their entire recommendation infrastructure to support real-time model updates [1]. Stripe created unified pipelines where payment data flows into fraud models within milliseconds [2].
The modern AI stack requires a fundamentally different architecture. Data flows from diverse systems into vector databases, where embeddings & high-dimensional data live alongside traditional structured data. Context databases store the institutional knowledge that informs AI decisions.
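To make "embeddings living alongside structured data" concrete, here is a minimal in-memory sketch in Python. It is not the API of any particular vector database: the `Record` fields, the three-dimensional toy embeddings, and the `search` helper are all assumptions made for illustration. It filters on a structured column first, then ranks the remaining rows by cosine similarity.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    embedding: list[float]  # toy 3-d vector; real embeddings have hundreds of dimensions
    account_id: str         # structured column stored next to the vector
    amount_usd: float       # structured column stored next to the vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query, account_id, k=2):
    """Filter by the structured column, then rank by vector similarity."""
    candidates = [r for r in records if r.account_id == account_id]
    ranked = sorted(candidates, key=lambda r: cosine(r.embedding, query), reverse=True)
    return [r.doc_id for r in ranked[:k]]


# Hypothetical rows, invented for this sketch.
RECORDS = [
    Record("atm-withdrawal", [0.9, 0.1, 0.0], "acct-1", 40.0),
    Record("card-purchase", [0.1, 0.9, 0.0], "acct-1", 12.5),
    Record("wire-transfer", [0.8, 0.2, 0.1], "acct-2", 900.0),
]

if __name__ == "__main__":
    print(search(RECORDS, [1.0, 0.0, 0.0], "acct-1", k=1))
```

A production system would swap the linear scan for an approximate nearest-neighbor index, but the shape of the query (structured filter plus similarity ranking) stays the same.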
AI systems consume this data, then enter experimentation loops. GEPA & DSPy enable evolutionary optimization across multiple quality dimensions. Evaluations measure performance. Reinforcement learning trains agents to navigate complex enterprise environments.
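One generic way to optimize across multiple quality dimensions is to keep only the candidates that no other candidate beats on every metric (a Pareto front), then mutate those survivors in the next round. The Python sketch below shows just that selection step. It is not the API or the exact algorithm of GEPA or DSPy, and the candidate names, metric names, and scores are hypothetical placeholders rather than measured results.

```python
def dominates(a, b):
    """True if scores `a` are at least as good as `b` on every metric and better on one."""
    return all(a[m] >= b[m] for m in a) and any(a[m] > b[m] for m in a)


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical evaluation scores (higher is better on both metrics).
CANDIDATES = {
    "prompt_a": {"accuracy": 0.82, "brevity": 0.60},
    "prompt_b": {"accuracy": 0.78, "brevity": 0.90},
    "prompt_c": {"accuracy": 0.75, "brevity": 0.55},
}

if __name__ == "__main__":
    print(sorted(pareto_front(CANDIDATES)))
```

Here `prompt_c` drops out because `prompt_a` matches or beats it on both metrics, while `prompt_a` and `prompt_b` both survive because each wins on a different dimension.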
Underpinning everything is an observability layer. The entire system needs accurate data, and it needs it fast. That’s why data observability will also fuse with AI observability to give data engineers & AI engineers an end-to-end understanding of the health of their pipelines.
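As a rough illustration of what fused observability could look like, the Python sketch below puts data checks and AI checks into one list and reports failures from both layers together. The check names, values, and thresholds are assumptions invented for this example, not figures from the article or from any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Check:
    layer: str              # "data" or "ai"
    name: str
    value: float            # latest observed value
    limit: float            # threshold the value is compared against
    higher_is_better: bool

    def ok(self):
        """Pass if the value is on the right side of the limit."""
        if self.higher_is_better:
            return self.value >= self.limit
        return self.value <= self.limit


def failing(checks):
    """Return 'layer:name' for every failing check, across both layers."""
    return [f"{c.layer}:{c.name}" for c in checks if not c.ok()]


# Hypothetical readings and thresholds.
CHECKS = [
    Check("data", "freshness_seconds", 3.0, 5.0, higher_is_better=False),
    Check("data", "null_rate", 0.08, 0.01, higher_is_better=False),
    Check("ai", "eval_pass_rate", 0.93, 0.90, higher_is_better=True),
    Check("ai", "retrieval_latency_ms", 420.0, 250.0, higher_is_better=False),
]

if __name__ == "__main__":
    print(failing(CHECKS))
```

Keeping both layers in one report lets an engineer see, for instance, that a spike in null values upstream coincides with slower retrieval downstream.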
Data & AI infrastructure aren’t converging. They’ve already fused.

References

[1] Netflix Technology Blog. (2025, August). “From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix.” https://netflixtechblog.com/from-facts-metrics-to-media-machine-learning-evolving-the-data-engineering-function-at-netflix-6dcc91058d8d

[2] Stripe. (2025). “How We Built It: Stripe Radar.” https://stripe.com/blog/how-we-built-it-stripe-radar

AI Summary and Description: Yes

Summary: The text discusses how traditional data stacks are insufficient for AI applications, emphasizing the need for real-time data processing and a different architectural framework. It highlights examples from companies like Netflix and Stripe, which have adapted their systems to meet AI’s demands.

Detailed Description:
The text is relevant to AI and cloud infrastructure professionals because it describes how data management is changing to meet the needs of AI systems. The major points are:

– **Transition from Traditional Stacks**:
– Traditional data architectures (ETL pipelines, data warehouses, BI tools) were effective for conventional analytics. However, the onset of AI requires new capabilities that these architectures cannot support.

– **Real-Time Data Requirements**:
– AI applications demand continuous feedback and real-time data processing. For example, an AI agent assisting a customer needs swift access to transaction data to provide accurate insights.

– **Examples of Adaptation**:
– **Netflix**: Overhauled its recommendation infrastructure to accommodate real-time model updates, demonstrating the necessity for agile data systems.
– **Stripe**: Developed unified data pipelines enabling rapid data flow into fraud detection models, illustrating the requirement for speed and agility.

– **Modern AI Stack**:
– In the new architecture, data flows from diverse systems into vector databases, where embeddings and other high-dimensional data sit alongside traditional structured data. Context databases store the institutional knowledge that informs AI decisions.

– **Experimentation and Optimization**:
– AI systems enter experimentation loops: tools like GEPA and DSPy enable evolutionary optimization across multiple quality dimensions, evaluations measure performance, and reinforcement learning trains agents to navigate complex enterprise environments.

– **Observability Layer**:
– A crucial element for handling modern data and AI infrastructure is improved observability, ensuring accurate data and system health across pipelines. Merging data observability with AI observability is suggested as essential for comprehensive management.

– **Convergence of Data and AI**:
– The text concludes by stating that data and AI infrastructures are not just converging; they have fundamentally merged, requiring new strategies for management and operation.

This analysis brings forth critical insights for security and compliance professionals, as understanding these architectural shifts can influence approaches to data governance, real-time data handling, and the integration of AI into existing frameworks.
