The Register: HPE goes Cray for Nvidia’s Blackwell GPUs, crams 224 into a single cabinet

Source URL: https://www.theregister.com/2024/11/13/hpe_cray_ex/
Source: The Register
Title: HPE goes Cray for Nvidia’s Blackwell GPUs, crams 224 into a single cabinet

Feedly Summary: Meanwhile, HPE’s new ProLiant servers offer choice of Gaudi, Hopper, or Instinct acceleration
If you thought Nvidia’s 120 kW NVL72 racks were compute dense with 72 Blackwell accelerators, they have nothing on HPE Cray’s latest EX systems, which will pack more than three times as many GPUs into a single cabinet.…

AI Summary and Description: Yes

Summary: The text discusses HPE Cray’s upcoming EX154n supercomputing platform, which fits up to 224 Nvidia Blackwell GPUs into a single cabinet, more than three times the 72 in Nvidia’s NVL72 rack. A full cabinet is rated at over 10 petaFLOPS of FP64 for HPC and over 4.4 exaFLOPS for AI workloads, and the article also covers the power, cooling, and networking needed to support that density.

Detailed Description: The passage surveys HPE’s new supercomputing and server hardware, with an emphasis on compute density and on the power, cooling, and networking infrastructure behind it. Key points include:

– **HPE Cray EX154n Platform**:
– Supports up to **224 Nvidia Blackwell GPUs**.
– With all 224 GPUs installed, delivers over **10 petaFLOPS** at FP64 for HPC applications, or **over 4.4 exaFLOPS** for lower-precision AI workloads.
– Each accelerator blade carries two **Grace Blackwell Superchips** (GB200), and each superchip pairs two Blackwell GPUs with a 72-core Arm-based Grace CPU.

– **Infrastructure Capabilities**:
– Requires **300 kW** of power per cabinet, 2.5 times the 120 kW of an NVL72 rack.
– Uses **liquid cooling** in a **fanless architecture** to improve efficiency and performance.
– New **Slingshot 400 Ethernet NICs** double network bandwidth from **200 Gbps to 400 Gbps**.

– **Product Availability**:
– The super-dense Blackwell systems are expected to ship in **late 2025**.
– HPE also plans to release **Epyc-based EX4252 Gen 2 compute blades** and upgraded **E2000 storage** systems, which use faster **PCIe 5.0-based NVMe storage** to improve I/O performance.

– **ProLiant Compute Servers**:
– New air-cooled servers with enterprise-grade **iLO management**.
– Support for a range of accelerators, including Intel Gaudi3, Nvidia Hopper GPUs, and upcoming AMD Instinct parts, offering flexibility in processing power and memory configurations.
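
The density, throughput, and power figures quoted above can be cross-checked with some simple arithmetic. The sketch below is illustrative only: it treats "over 10 petaFLOPS" and "over 4.4 exaFLOPS" as lower bounds, assumes two Blackwell GPUs per GB200 superchip (Nvidia's published GB200 configuration), and divides whole-cabinet power evenly across GPUs even though CPUs, networking, and cooling share that budget. The function name `cabinet_breakdown` and the constant names are made up for this example.

```python
# Figures quoted in the summary above.
GPUS_PER_EX_CABINET = 224
GPUS_PER_NVL72_RACK = 72
EX_CABINET_FP64_PFLOPS = 10.0   # "over 10 petaFLOPS", used as a lower bound
EX_CABINET_AI_EXAFLOPS = 4.4    # "over 4.4 exaFLOPS", used as a lower bound
EX_CABINET_POWER_KW = 300.0
NVL72_RACK_POWER_KW = 120.0
SUPERCHIPS_PER_BLADE = 2        # two GB200 superchips per accelerator blade

# Assumption: two Blackwell GPUs per GB200 superchip.
GPUS_PER_SUPERCHIP = 2


def cabinet_breakdown() -> dict:
    """Derive per-GPU and per-blade figures from the cabinet-level numbers."""
    superchips = GPUS_PER_EX_CABINET // GPUS_PER_SUPERCHIP
    blades = superchips // SUPERCHIPS_PER_BLADE
    return {
        "density_ratio_vs_nvl72": GPUS_PER_EX_CABINET / GPUS_PER_NVL72_RACK,
        "superchips": superchips,
        "blades": blades,
        # 1 petaFLOPS = 1,000 teraFLOPS; 1 exaFLOPS = 1,000 petaFLOPS.
        "fp64_tflops_per_gpu": EX_CABINET_FP64_PFLOPS * 1_000 / GPUS_PER_EX_CABINET,
        "ai_pflops_per_gpu": EX_CABINET_AI_EXAFLOPS * 1_000 / GPUS_PER_EX_CABINET,
        # Naive split of total cabinet/rack power across GPUs.
        "ex_kw_per_gpu": EX_CABINET_POWER_KW / GPUS_PER_EX_CABINET,
        "nvl72_kw_per_gpu": NVL72_RACK_POWER_KW / GPUS_PER_NVL72_RACK,
    }


if __name__ == "__main__":
    for name, value in cabinet_breakdown().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions a full cabinet works out to 112 superchips on 56 blades. The per-GPU power split is only a rough comparison, since neither 300 kW nor 120 kW is consumed by the GPUs alone.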

The content outlines developments in HPC and AI hardware and in the power, cooling, and networking infrastructure that supports them, which makes it relevant to professionals working in AI Security, Infrastructure Security, and Cloud Computing. The push toward denser, more compute-intensive systems also raises the bar for efficient resource management in both supercomputing and cloud environments.