Cloud Blog: H4D VMs: Next-generation HPC-optimized VMs

Source URL: https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc/
Source: Cloud Blog
Title: H4D VMs: Next-generation HPC-optimized VMs

Feedly Summary: At Google Cloud Next, we introduced H4D VMs, our latest machine type for high performance computing (HPC). Building upon existing HPC offerings, H4D VMs are designed to address the evolving needs of demanding workloads in industries such as manufacturing, weather forecasting, EDA, and healthcare and life sciences.
H4D VMs are powered by the 5th Generation AMD EPYC™ Processors, offering improved whole-node VM performance of more than 12,000 GFLOPS and improved memory bandwidth of more than 950 GB/s. H4D is also the first of our CPU-based VMs to provide low latency and 200 Gbps of network bandwidth using Cloud Remote Direct Memory Access (RDMA) on Titanium. This powerful combination enables you to efficiently scale your HPC workloads and achieve insights faster.
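
The headline figures above support a quick back-of-the-envelope roofline estimate. The sketch below is an illustration, not a Google-published model: it treats the quoted HPL and STREAM results as the compute and bandwidth ceilings (both are "more than" values, so the results are approximate) and divides by 192 cores, the core count of every H4D shape in the sizing table later in this post. The function names are hypothetical.

```python
# Headline whole-VM figures quoted above. Both are "more than" values and
# come from HPL and STREAM runs rather than theoretical hardware peaks,
# so everything derived from them is approximate.
HPL_GFLOPS = 12_000.0   # double-precision GFLOP/s per VM (OSS-HPL)
STREAM_GBPS = 950.0     # memory bandwidth per VM in GB/s (STREAM Triad)
CORES = 192             # cores in each H4D shape in the sizing table


def ridge_point(gflops: float, gbps: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a roofline model switches
    from bandwidth-bound to compute-bound."""
    return gflops / gbps


def attainable_gflops(intensity: float, gflops: float, gbps: float) -> float:
    """Roofline estimate: the lower of the compute ceiling and
    bandwidth * arithmetic intensity."""
    return min(gflops, gbps * intensity)


if __name__ == "__main__":
    print(f"per-core GFLOP/s : {HPL_GFLOPS / CORES:.1f}")
    print(f"per-core GB/s    : {STREAM_GBPS / CORES:.2f}")
    print(f"ridge point      : {ridge_point(HPL_GFLOPS, STREAM_GBPS):.1f} FLOP/byte")
    # A Triad-like update (a = b + q * c) does 2 FLOPs per 24 bytes of traffic.
    print(f"Triad-like kernel: {attainable_gflops(2 / 24, HPL_GFLOPS, STREAM_GBPS):.1f} GFLOP/s")
```

A ridge point of roughly 12.6 FLOP/byte (12,000 / 950) suggests that kernels doing fewer floating-point operations per byte of memory traffic, such as STREAM-like vector updates, are limited by memory bandwidth rather than compute on this shape.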

Figure 1: VM and core performance, as well as memory bandwidth, for H4D vs. C2D and C3D, showing generational improvement.

For open-source High-Performance Linpack (OSS-HPL), a widely used benchmark for measuring the floating-point computing power of supercomputers, H4D offers 1.8x higher performance per VM and 1.6x higher performance per core compared to C3D. Additionally, H4D offers 5.8x higher performance per VM and 1.7x higher performance per core compared to C2D.
For STREAM Triad, a benchmark to measure memory bandwidth, H4D offers 1.3x higher performance per VM and 1.4x higher performance per core compared to C3D. Additionally, H4D offers 3x higher performance per VM and 1.4x higher performance per core compared to C2D.
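
STREAM Triad times the kernel a[i] = b[i] + q * c[i] over large arrays and, by convention, credits 24 bytes of traffic per element for 8-byte doubles (two reads and one write). The pure-Python sketch below only illustrates that kernel and the byte-counting rule; interpreter overhead dominates, so its result will be far below real memory bandwidth, and actual measurements should use the official C/Fortran STREAM code. The helper names and default array size are assumptions.

```python
import time
from array import array


def triad(a: array, b: array, c: array, q: float) -> None:
    """STREAM Triad kernel: a[i] = b[i] + q * c[i], updated in place."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n: int, elem_size: int = 8) -> int:
    """Bytes STREAM credits to one Triad pass: read b and c, write a."""
    return 3 * n * elem_size


def measure_triad_gbps(n: int = 200_000, q: float = 3.0, trials: int = 5) -> float:
    """Time several passes and keep the fastest, as STREAM reports best rates."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        triad(a, b, c, q)
        best = min(best, time.perf_counter() - start)
    # Sanity check: every element should equal b + q * c (7.0 with defaults).
    assert all(x == b[0] + q * c[0] for x in a)
    return triad_bytes(n) / best / 1e9


if __name__ == "__main__":
    print(f"pure-Python Triad: {measure_triad_gbps():.3f} GB/s (interpreter-bound)")
```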


Improved HPC application performance
H4D VMs deliver strong compute performance and memory bandwidth, significantly outperforming previous generations of AMD-based VMs such as C2D and C3D. The result is faster simulation and analysis, with significant performance gains across a range of HPC applications and benchmarks relative to C2D, a prior-generation AMD-based HPC VM, as illustrated below:

Manufacturing

CFD apps like Siemens™ Simcenter STAR-CCM+™/HIMach show up to 3.6x improvement.
CFD apps like Ansys Fluent/f1_racecar_140 show up to 3.6x improvement.
FEA Explicit apps like Altair Radioss/T10m show up to 3.6x improvement.
CFD apps like OpenFoam/Motorbike_20m show up to 2.9x improvement.
FEA Implicit apps like Ansys Mechanical/gearbox show up to 2.7x improvement.

Healthcare and life sciences

Molecular Dynamics (GROMACS) shows up to 5x improvement.

Weather forecasting

Industry standard benchmark WRFv4 shows up to 3.6x improvement.

Figure 2: Single VM HPC Application performance (speed-up) of H4D, C3D and C2D relative to C2D. Applications ran on single VMs using all cores.
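
Each speedup above is a ratio of runtimes, S = T_C2D / T_H4D, so an H4D job takes 1/S of its C2D runtime. The sketch below turns the listed "up to" values into runtime estimates for a hypothetical 10-hour C2D job; because the figures are best-case single-VM results, treat the output as an upper bound on the savings rather than a prediction for a specific workload.

```python
# Single-VM speedups relative to C2D, as listed above ("up to" values).
SPEEDUP_VS_C2D = {
    "Simcenter STAR-CCM+/HIMach": 3.6,
    "Ansys Fluent/f1_racecar_140": 3.6,
    "Altair Radioss/T10m": 3.6,
    "OpenFoam/Motorbike_20m": 2.9,
    "Ansys Mechanical/gearbox": 2.7,
    "GROMACS": 5.0,
    "WRFv4": 3.6,
}


def runtime_on_h4d(c2d_hours: float, speedup: float) -> float:
    """Speedup S = T_C2D / T_H4D, so T_H4D = T_C2D / S."""
    return c2d_hours / speedup


def percent_time_saved(speedup: float) -> float:
    """Fraction of the C2D runtime saved, as a percentage: 1 - 1/S."""
    return 100.0 * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    hypothetical_c2d_hours = 10.0  # illustrative job length, not from the source
    for app, s in SPEEDUP_VS_C2D.items():
        print(f"{app:30s} {s:.1f}x -> {runtime_on_h4d(hypothetical_c2d_hours, s):.1f} h "
              f"({percent_time_saved(s):.0f}% less time)")
```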

“Our deep collaboration with Google Cloud powers the next generation of cloud-based HPC with the announcement of the new H4D VMs. Google Cloud has leveraged the architectural advances of our 5th Gen AMD EPYC CPUs to create an offering that delivers impressive performance uplift compared to previous generations across a variety of HPC benchmarks. This will empower customers to achieve fast insights and accelerate their most demanding HPC workloads.” – Ram Peddibhotla, corporate vice president, Cloud Business, AMD
Faster HPC with Cloud RDMA on Titanium
H4D’s performance is made possible with Cloud RDMA, a new Titanium offload that’s available for the first time on these VMs. Cloud RDMA is specifically engineered to support HPC workloads that rely heavily on inter-node communication, such as computational fluid dynamics, weather modeling, molecular dynamics, and more. By offloading network processing, Cloud RDMA provides predictable, low-latency, high-bandwidth communication between compute nodes, thus minimizing host CPU bottlenecks. 
Under the hood, Cloud RDMA uses Google’s innovative Falcon hardware transport for reliable, low-latency communication over our Ethernet-based data center networks, effectively resolving the traditional challenges of RDMA over Ethernet while helping to ensure predictable, high performance at scale. 
Cloud RDMA over Falcon speeds up simulations by using additional computational resources more efficiently. For example, smaller CFD problems such as OpenFoam/motorbike_20m and Simcenter STAR-CCM+/HIMach10 have limited inherent parallelism and are typically hard to accelerate; on four VMs, H4D with Cloud RDMA runs them 3.4x and 1.9x faster, respectively, than H4D with TCP.

Figure 3: Left: OpenFoam/Motorbike_20m offers a 3.4x improvement with H4D Cloud RDMA over TCP at four VMs. Right: Simcenter STAR-CCM+/HIMach10 offers a 1.9x improvement with H4D Cloud RDMA over TCP at four VMs.

For larger models, Falcon also helps maintain strong scaling. Using 32 VMs, Falcon achieved a 2.8x speedup over TCP for GROMACS/Lignocellulose and a 1.3x speedup for WRFv4/Conus 2.5km.

Figure 4: Left: GROMACS/Lignocellulose offers a 2.8x improvement with H4D Cloud RDMA over TCP at 32 VMs. Right: WRFv4/Conus 2.5km offers a 1.3x improvement with H4D Cloud RDMA over TCP at 32 VMs.
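
The RDMA-over-TCP speedups can be read through a simple two-term runtime model, which is an assumption of this sketch rather than something the post states: runtime is compute time plus communication time, compute time is the same on both transports, and only communication gets faster. Under that model, a speedup S implies that communication made up at least 1 - 1/S of the TCP runtime. The function name is hypothetical.

```python
# Cloud RDMA vs TCP speedups reported above, keyed by (workload, VM count).
RDMA_SPEEDUP_OVER_TCP = {
    ("OpenFoam/Motorbike_20m", 4): 3.4,
    ("Simcenter STAR-CCM+/HIMach10", 4): 1.9,
    ("GROMACS/Lignocellulose", 32): 2.8,
    ("WRFv4/Conus 2.5km", 32): 1.3,
}


def min_comm_fraction_tcp(speedup: float) -> float:
    """Lower bound on the share of TCP runtime spent communicating.

    Assumed model: T_tcp = C + M_tcp and T_rdma = C + M_rdma, with the same
    compute time C on both transports and M_rdma >= 0. Then
    C <= T_rdma = T_tcp / S, so M_tcp / T_tcp = 1 - C / T_tcp >= 1 - 1 / S.
    """
    if speedup < 1.0:
        raise ValueError("model assumes RDMA is no slower than TCP")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    for (app, vms), s in RDMA_SPEEDUP_OVER_TCP.items():
        frac = min_comm_fraction_tcp(s)
        print(f"{app:30s} {vms:2d} VMs  {s:.1f}x -> comm >= {frac:.0%} of TCP runtime")
```

For example, the 3.4x OpenFoam result implies that, under this model, at least about 71% of the four-VM TCP runtime was spent communicating, while the 1.3x WRF result implies at least about 23% at 32 VMs. The model ignores any overlap of communication with compute, so these are rough indicators rather than a profile.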

Cluster management and scheduling capabilities
H4D VMs will support both Dynamic Workload Scheduler (DWS) and Cluster Director (formerly known as Hypercompute Cluster).
DWS helps schedule HPC workloads for optimal performance and cost-effectiveness, providing resource availability for time-sensitive simulations and flexible HPC jobs.
Cluster Director, which lets you deploy and scale a large, physically-colocated accelerator cluster as a single unit, is now extending its capabilities to HPC environments. Cluster Director simplifies deploying and managing complex HPC clusters on H4D VMs by allowing researchers to easily set up and run large-scale simulations.
VM sizes and regional availability
We offer H4D VMs in both standard and high-memory configurations to cater to diverse workload requirements. We also provide options with local SSD for workloads that demand high-speed storage, such as CPU-based seismic processing and structural mechanics applications (e.g., Abaqus, NASTRAN, Altair OptiStruct and Ansys Mechanical).

| VM | Cores | Memory (GB) | Local SSD |
|---|---|---|---|
| h4d-highmem-192-lssd | 192 | 1,488 | 3.75 TB |
| h4d-standard-192 | 192 | 720 | N/A |
| h4d-highmem-192 | 192 | 1,488 | N/A |

H4D VMs are currently available in us-central1-a (Iowa) and europe-west4-b (Netherlands), with additional regions in progress.
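
The sizing table reduces to a small lookup. The hypothetical helper below (not a Google Cloud API) encodes the three shapes and the two zones named above and returns the smallest-memory shape that meets a job's memory and local SSD needs; the zone list reflects the announcement and will go stale as regions are added.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class H4DShape:
    """One row of the H4D sizing table."""
    name: str
    cores: int
    memory_gb: int
    local_ssd_tb: Optional[float]  # None means no local SSD


# Rows copied from the sizing table above.
H4D_SHAPES = [
    H4DShape("h4d-highmem-192-lssd", 192, 1488, 3.75),
    H4DShape("h4d-standard-192", 192, 720, None),
    H4DShape("h4d-highmem-192", 192, 1488, None),
]

# Zones named in the announcement; more regions are planned, so this will go stale.
H4D_ZONES = {"us-central1-a", "europe-west4-b"}


def pick_h4d_shape(memory_gb: float, needs_local_ssd: bool = False,
                   zone: str = "us-central1-a") -> Optional[H4DShape]:
    """Hypothetical helper: smallest-memory shape that fits, preferring no SSD."""
    if zone not in H4D_ZONES:
        return None
    fits = [s for s in H4D_SHAPES
            if s.memory_gb >= memory_gb
            and (s.local_ssd_tb is not None or not needs_local_ssd)]
    # Sort key: least memory first, then shapes without local SSD first.
    return min(fits, key=lambda s: (s.memory_gb, s.local_ssd_tb is not None), default=None)


if __name__ == "__main__":
    for shape in H4D_SHAPES:
        print(f"{shape.name:22s} {shape.memory_gb / shape.cores:.2f} GB/core")
    print(pick_h4d_shape(1000))
```

Dividing memory by cores gives 3.75 GB per core for h4d-standard-192 (720 / 192) and 7.75 GB per core for the high-memory shapes (1,488 / 192), a handy check when a solver's memory needs are quoted per MPI rank.
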
What our customers and partners are saying

“With the power of Google’s new H4D-based clusters, we are poised to simulate systems approaching a trillion particles, unlocking unprecedented insights into circulatory functions and diseases. This leap in computational capability will dramatically accelerate our pursuit of breakthrough therapeutics, bringing us closer to effective precision therapies for blood vessel damage in heart disease.” – Petros Koumoutsakos, Jr. Professor of Computing in Science and Engineering, Harvard University

“The launch of Google Cloud’s H4D platform marks a significant advancement in engineering simulation. As GCP’s first VM with RDMA over Ethernet, combined with higher memory bandwidth, generous L3 cache, and AVX-512 instruction support, H4D delivers up to 3.6x better performance for Ansys Fluent simulations compared to C2D VMs. This performance boost allows our customers to run simulations faster, explore a wider range of design options, and drive innovation with greater efficiency.” – Wim Slagter, Senior Director of Partner Programs, Ansys

"The generational performance leap achieved with Google H4D VMs, powered by the 5th Generation AMD EPYC™, is truly remarkable. For compute-intensive, highly non-linear simulations, such as car crash analysis, Altair® Radioss® delivers a stunning 3.6x speedup. This breakthrough paves the way for faster and more accurate simulations, which is crucial for our customers in the era of the digital thread!” – Eric Lequiniou, SVP Radioss Development and Altair Solvers HPC

“The latest H4D VMs, powered by 5th Generation AMD EPYC Processors and Cloud RDMA, allow our customers to realize faster time-to-results for their Simcenter STAR-CCM+ simulations. For HIMach10, we’re seeing up to 3.6x performance gains compared to the C2D instance and 1.9x speedup on four H4D Cloud RDMA VMs compared to TCP. Our partnership with Google has been key to achieving these reduced simulation times.” – Lisa Mesaros, Vice President, Simcenter Solution Domains Product Management, Siemens

Want to try it out?
We’re excited to see how H4D VMs will empower you to achieve faster results with your HPC workloads! Sign up for the preview by filling out this form.

AI Summary and Description: Yes

Summary: The announcement of Google Cloud’s H4D VMs introduces a new frontier in high-performance computing (HPC), aimed at enhancing capabilities across various demanding industries such as healthcare, manufacturing, and weather forecasting. These VMs promise significant performance improvements over previous AMD-based models, particularly through advanced technologies like Cloud RDMA, potentially transforming HPC applications and accelerating insight generation for users.

Detailed Description:

– **Introduction of H4D VMs**: Google Cloud has launched H4D VMs, tailored for high-performance computing to meet the demands of intensive workloads in sectors such as manufacturing, healthcare, and weather forecasting.

– **Technological Advances**:
– Powered by 5th Generation AMD EPYC™ processors, H4D VMs deliver strong headline figures:
– Over 12,000 GFLOPS of whole-node VM performance.
– More than 950 GB/s of memory bandwidth.
– Uses Cloud RDMA to provide low latency and 200 Gbps of network bandwidth.

– **Performance Improvements**:
– Benchmarks have demonstrated:
– **OSS-HPL**: 1.8x and 5.8x higher performance per VM versus C3D and C2D, respectively.
– **STREAM Triad**: 1.3x and 3x higher performance per VM versus C3D and C2D, respectively.
– Application results relative to C2D:
– Manufacturing simulations show improvements of up to 3.6x across several applications.
– Healthcare and life sciences: up to 5x improvement in molecular dynamics (GROMACS) simulations.
– Weather forecasting: up to 3.6x improvement on the industry-standard WRFv4 benchmark.

– **Innovative Communication with Cloud RDMA**:
– Facilitates efficient inter-node communication, crucial for complex simulations.
– Uses Google’s Falcon hardware transport to resolve traditional issues with RDMA over Ethernet, ensuring predictability and high performance at scale.

– **Cluster Management Capabilities**:
– Support for Dynamic Workload Scheduler (DWS) and Cluster Director, simplifying the management of HPC clusters and optimizing workload efficiency.

– **Availability and Configuration**:
– H4D VMs are available in standard and high-memory configurations to suit different workload requirements, with a local SSD option for storage-intensive tasks.

– **Client Testimonials and Future Impact**:
– Prominent customers and partners describe H4D’s potential as transformative, highlighting significant performance gains that lead to faster, more accurate simulation results, a vital consideration in engineering and life sciences.

In summary, the launch of H4D VMs represents a significant innovation in cloud computing, particularly for high-performance applications. It is also relevant to security and compliance professionals, who must ensure optimal performance while maintaining robust security protocols within such advanced systems.