Source URL: https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit/
Source: Cloud Blog
Title: AI infrastructure is hot. New power distribution and liquid cooling infrastructure can help
Feedly Summary: AI is fundamentally transforming the compute landscape, demanding unprecedented advances in data center infrastructure. At Google, we believe that physical infrastructure — the power, cooling, and mechanical systems that underpin everything — isn’t just important, but critical to AI’s continued scaling.
We have a long-standing partnership with the Open Compute Project (OCP) that has been instrumental in driving industry collaboration and open innovation in infrastructure. At the 2025 OCP EMEA Summit today, we discussed the power delivery transformation from 48 volts direct current (VDC) to the new +/-400 VDC, which will enable IT racks to scale from 100 kilowatts up to 1 megawatt. We also shared that we’ll contribute our fifth-generation cooling distribution unit, Project Deschutes, to OCP, helping to accelerate adoption of liquid cooling industry-wide.
Transforming power delivery with 1 MW per IT rack
Google has a long history of advancing data center power delivery. Almost 10 years ago, we championed the adoption of 48 VDC inside the IT rack to significantly increase power distribution efficiency and reduce losses compared with typical 12 VDC solutions. The industry responded to our call to collaborate on this technology, and the resulting architecture has worked well, scaling from 10-kilowatt to 100-kilowatt IT racks.
The AI era requires even greater power delivery capabilities for two distinct reasons. The first is simply that ML workloads will require more than 500 kW per IT rack before 2030. The second is the densification of the IT rack itself, where every millimeter of space is used for tightly interconnected “xPUs” (e.g., GPUs, TPUs, and CPUs). Together, these demands call for a much higher-voltage DC power distribution solution, with power components and battery backup moved outside of the IT rack.
We are excited to introduce +/-400 VDC power delivery that can support up to 1 MW per rack. This is about much more than simply increasing power delivery capacity: selecting 400 VDC as the nominal voltage allows us to leverage the supply chain established by electric vehicles (EVs), yielding greater economies of scale, more efficient manufacturing, and improved quality. As part of the Mt Diablo project, we are collaborating with Meta and Microsoft at OCP to standardize the electrical and mechanical interfaces, and the 0.5 specification draft will be available for industry feedback in May.
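To make the voltage argument concrete, here is a minimal sketch (illustrative comparison only, not Google's design data) of how bus current and relative conduction loss scale with distribution voltage at a 1 MW rack load. The 1 MW target, the 48 VDC baseline, and the +/-400 VDC (800 V pole-to-pole) bus come from the article; the same-conductor assumption is ours.

```python
# A minimal sketch (illustrative only) of why distribution voltage matters at
# 1 MW per rack: for a fixed power draw, bus current scales as I = P / V, and
# conduction (I^2 * R) loss in the same conductor therefore falls with the
# square of the voltage ratio.

RACK_POWER_W = 1_000_000  # 1 MW per-rack target cited in the article

buses = {
    "12 VDC": 12,
    "48 VDC": 48,
    "+/-400 VDC": 800,  # bipolar bus: 800 V pole to pole
}

baseline_v = buses["48 VDC"]
for label, volts in buses.items():
    current_a = RACK_POWER_W / volts
    # Relative I^2 * R loss in the same conductor, normalized to the 48 VDC case.
    relative_loss = (baseline_v / volts) ** 2
    print(f"{label:>11}: {current_a:>8,.0f} A "
          f"(conduction loss ~{relative_loss:.3g}x the 48 VDC case)")
```

At 1 MW, a 48 V bus would have to carry more than 20,000 A, while the bipolar 400 V bus carries about 1,250 A, which is why the higher voltage is what makes megawatt-class racks practical.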
The first embodiment of this work is an AC-to-DC sidecar power rack that disaggregates power components from the IT rack. This solution improves end-to-end efficiency by ~3% while enabling the entire IT rack to be used for xPUs. Longer term, we are exploring directly distributing higher-voltage DC power within the data center and to the rack, for even greater power density and efficiency.
+/-400 VDC power delivery: AC-to-DC sidecar power rack
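To put the ~3% end-to-end efficiency improvement in perspective, here is a quick worked example. The ~3% figure comes from the article; the fully loaded 1 MW rack and continuous operation are assumptions for illustration only.

```python
# A quick worked example of what a ~3% end-to-end efficiency gain means,
# assuming (for illustration) a fully loaded 1 MW rack running continuously;
# the ~3% figure is from the article, the rest are assumptions.

RACK_POWER_W = 1_000_000
EFFICIENCY_GAIN = 0.03
HOURS_PER_YEAR = 24 * 365

power_saved_kw = RACK_POWER_W * EFFICIENCY_GAIN / 1000
energy_saved_mwh = power_saved_kw * HOURS_PER_YEAR / 1000

print(f"power saved : ~{power_saved_kw:.0f} kW per rack")
print(f"energy saved: ~{energy_saved_mwh:,.0f} MWh per rack per year")
```

Under those assumptions, the gain works out to roughly 30 kW per rack, or on the order of a few hundred MWh per rack per year.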
The liquid cooling imperative
The dramatic increase in chip power consumption, from 100 W chips to accelerators exceeding 1,000 W, has made advanced thermal management essential. Packing more powerful chips into racks also creates significant challenges for cooling density. Liquid cooling has emerged as the clear solution, given its superior thermal and hydraulic properties. Water can transport approximately 4000 times more heat per unit volume than air for a given temperature change, while the thermal conductivity of water is roughly 30 times greater than air.
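As a rough back-of-the-envelope illustration of why those properties matter (standard textbook fluid properties and an assumed 10 K coolant temperature rise, not Google's operating parameters), the sketch below compares the volumetric flow of water versus air needed to carry away 1 MW of heat; the resulting ratio of roughly 3,500x is consistent with the ~4000x per-unit-volume figure above.

```python
# A back-of-the-envelope sketch using standard room-temperature fluid
# properties (not Google's operating parameters): volumetric flow needed to
# remove 1 MW of heat at a 10 K coolant temperature rise, from
# Q = rho * cp * V_dot * dT  =>  V_dot = Q / (rho * cp * dT).

HEAT_LOAD_W = 1_000_000  # assumed rack-scale heat load
DELTA_T_K = 10.0         # assumed coolant temperature rise

coolants = {
    # name: (density in kg/m^3, specific heat in J/(kg*K))
    "water": (997.0, 4186.0),
    "air": (1.2, 1005.0),
}

flows = {}
for name, (rho, cp) in coolants.items():
    flows[name] = HEAT_LOAD_W / (rho * cp * DELTA_T_K)  # m^3/s
    print(f"{name:>5}: {flows[name]:8.3f} m^3/s ({flows[name] * 1000:,.0f} L/s)")

# Ratio of required volumes, i.e. the volumetric heat-capacity advantage of water.
print(f"air needs ~{flows['air'] / flows['water']:,.0f}x the volume of water")
```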
At Google, we’ve deployed liquid cooling at gigawatt scale across more than 2,000 TPU Pods over the past seven years with remarkable uptime, consistently at about 99.999%. Google first used liquid cooling in TPU v3, which was deployed in 2018. Liquid-cooled ML servers have nearly half the geometric volume of their air-cooled counterparts because they replace bulky heatsinks with compact cold plates. This allowed us to double chip density and quadruple the size of our liquid-cooled TPU v3 supercomputer compared to the air-cooled TPU v2 generation.
We’ve continued to refine this technology generation over generation, from TPU v3 and TPU v4 through TPU v5 and, most recently, Ironwood. Our implementation utilizes in-row coolant distribution units (CDUs) with redundant components and uninterruptible power supplies (UPS) for high availability. These CDUs isolate the rack’s liquid loop from the facility loop, providing a controlled, high-performance cooling system delivered via manifolds, flexible hoses, and cold plates attached directly to the high-power chips. In our CDU architecture, named Project Deschutes, the pump and heat exchanger unit is redundant, which has enabled us to consistently achieve fleet-wide CDU availability of ~99.999% since 2020.
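As a simplified illustration of why that redundancy matters (hypothetical failure and repair rates, not Google's fleet data), the sketch below estimates how pairing two pump and heat-exchanger units, assuming independent failures, pushes availability from roughly 99.98% for a single unit to well beyond the 99.999% range for the redundant pair.

```python
# A simplified reliability sketch with hypothetical numbers (not Google's
# fleet data): with a redundant pump/heat-exchanger pair and independent
# failures, the CDU is down only when both units are down at the same time.

MTBF_HOURS = 50_000  # assumed mean time between failures for one unit
MTTR_HOURS = 8       # assumed mean time to repair or swap a failed unit

# Steady-state availability of a single unit.
single = MTBF_HOURS / (MTBF_HOURS + MTTR_HOURS)

# With a 2N redundant pair, unavailability is the product of the two
# (assumed independent) unavailabilities.
redundant = 1 - (1 - single) ** 2

print(f"single unit    : {single:.6%}")
print(f"redundant pair : {redundant:.6%}")
```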
We will contribute the fifth-generation Project Deschutes CDU, currently in development, to OCP later this year. This contribution, including system details, specifications, and best practices, is intended to help accelerate the industry’s adoption of liquid cooling at scale. Our insights are drawn from nearly a decade of designing and deploying liquid cooling across four generations of TPUs, and encompass:
Design for high cooling performance
Manufacturing quality
Reliability and uptime
Deployment velocity
Serviceability and operational excellence
Supply ecosystem advancements
Project Deschutes CDU: 4th gen in deployment, 5th gen in concept
Get ready for the next generation of AI
We’re encouraged by the significant strides the industry has made in power delivery and liquid cooling. However, with the accelerating pace of AI hardware development, it’s clear that we must collectively quicken our pace to prepare data centers for what’s next. We’re particularly excited about the potential for rapid industry adoption of +/-400 VDC, facilitated by the upcoming Mt Diablo specification. We also strongly encourage the industry to adopt the Project Deschutes CDU design and leverage our extensive liquid cooling learnings. By embracing these advancements and fostering deeper collaboration, we believe the most impactful innovations are still ahead.
AI Summary and Description: Yes
**Summary:** The text discusses significant advancements in data center infrastructure essential for supporting AI’s increasing power demands. Google highlights its collaboration with the Open Compute Project to transition to +/-400 VDC power delivery, aiming for improved efficiency and scalability in IT racks, as well as promoting liquid cooling solutions to manage rising chip power consumption effectively.
**Detailed Description:**
The text outlines critical updates from Google regarding advancements in data center infrastructure tailored to meet the evolving needs of AI technologies. Key points include:
– **Power Delivery Transformation:**
– Shift from 48 VDC to +/-400 VDC to enable IT racks to scale from 100 kW to 1 MW.
– This transition is crucial, as machine learning workloads are expected to require more than 500 kW per rack before 2030.
– **Collaboration and Standardization:**
– Google collaborates with major tech firms (Meta and Microsoft) under the Mt Diablo project to standardize power distribution interfaces.
– The 0.5 specification draft, due in May, will be open for industry feedback to help unify these standards.
– **New Cooling Solutions:**
– Introduction of advanced liquid cooling systems to manage the thermal output of increasingly powerful chips.
– Liquid cooling proves to be more effective than traditional air cooling, enhancing chip density and reducing physical space requirements.
– **Enhancements in Design and Efficiency:**
– The AC-to-DC sidecar power rack architecture separates power components from IT racks, freeing the entire rack for xPUs such as GPUs and TPUs.
– Google’s liquid cooling deployments across TPU Pods have maintained CDU availability of around 99.999%.
– **Future Aspirations and Industry Advocacy:**
– Google emphasizes the collective need to hasten the adoption of high-voltage DC systems and liquid cooling to stay ahead in AI infrastructure.
– The sharing of Project Deschutes CDU design and operational insights is aimed at accelerating industry practices and fostering collaboration for future innovations.
**Key Takeaways for Security and Compliance Professionals:**
– The planned increases in power capabilities necessitate rigorous security protocols, as higher-power systems may present increased physical and operational vulnerabilities.
– Robust infrastructure addressing cooling and power reliability is essential for compliance with energy efficiency regulations and standards in cloud computing environments.
– Understanding these innovations marks an important step for organizations that rely on AI capabilities, highlighting the interconnectedness between infrastructure development and security resilience.