Cloud Blog: Google Cloud at GTC: A4 VMs now generally available, A4X VMs in preview

Source URL: https://cloud.google.com/blog/products/compute/google-cloud-goes-to-nvidia-gtc/
Source: Cloud Blog
Title: Google Cloud at GTC: A4 VMs now generally available, A4X VMs in preview

Feedly Summary: At Google Cloud, we’re thrilled to return to NVIDIA’s GTC AI Conference in San Jose, CA, this March 17-21 with our largest presence ever. The annual conference brings together thousands of developers, innovators, and business leaders to experience how AI and accelerated computing are helping humanity solve the most complex challenges. Join us to discover how to build and deploy AI with optimized training and inference, apply AI with real-world solutions, and experience AI with our interactive demos.
After being the first hyperscaler to make both NVIDIA’s HGX B200 and GB200 NVL72 available to customers with A4 and A4X VMs, we’re pleased to announce that A4 VMs are now generally available, and that A4X VMs are in preview with general availability coming soon.

A4X VMs: Accelerated by NVIDIA GB200 NVL72 GPUs, A4X VMs are purpose-built for training and serving the most demanding, extra-large-scale AI workloads — particularly those involving reasoning models, large language models (LLMs) with long context windows, and scenarios that require massive concurrency. This is enabled by unified memory across a large GPU domain and ultra-low-latency GPU-to-GPU connectivity. Each A4X VM contains four GPUs, and an entire 72-GPU system is connected via fifth-generation NVLink to deliver 720 petaflops of FP8 performance. A4X has achieved 860,000 tokens/sec of inference performance on a full NVL72 running Llama 2 70B.
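For rough context, these rack-scale figures work out per GPU as follows (a back-of-the-envelope calculation using only the numbers quoted above):

$$
\frac{720\ \text{petaflops (FP8)}}{72\ \text{GPUs}} = 10\ \text{petaflops per GPU},\qquad
\frac{860{,}000\ \text{tokens/sec}}{72\ \text{GPUs}} \approx 11{,}900\ \text{tokens/sec per GPU}.
$$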

A4 VMs: Built on NVIDIA HGX B200 GPUs, the A4 VM provides excellent performance and versatility for diverse AI model architectures and workloads, including training, fine-tuning, and serving. Each A4 VM contains eight GPUs for a total of 72 petaflops of FP8 performance. A4 offers easy portability from prior generations of Cloud GPUs, enabling a straightforward upgrade path with a 2.2x increase in training performance over A3 Mega (NVIDIA H100 GPUs).
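As a similar back-of-the-envelope check using only the figures quoted in this post, A4’s per-GPU FP8 throughput is in the same range as A4X’s:

$$
\frac{72\ \text{petaflops (FP8)}}{8\ \text{GPUs}} = 9\ \text{petaflops per GPU},
$$

with the quoted 2.2x training speedup measured relative to A3 Mega VMs built on NVIDIA H100 GPUs.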

“We’re excited that we were among the first to test A4 VMs, powered by NVIDIA Blackwell GPUs and Google Cloud’s AI Hypercomputer architecture. The sheer compute and memory advancements, combined with the 3.2 Tbps GPU-to-GPU interconnect via NVLink and the Titanium ML network adapter, are critical for us to train our models. Leveraging the Cluster Director simplifies the deployment and management of our large-scale training workloads. This gives our researchers the speed and flexibility to experiment, iterate, and refine trading models more efficiently.” – Gerard Bernabeu Altayo, Compute Lead, Hudson River Trading
The Google Cloud advantage 
A4 and A4X VMs are part of Google Cloud’s AI Hypercomputer, our supercomputing architecture designed for high performance, reliability, and efficiency for AI workloads. AI Hypercomputer brings together Google Cloud’s workload-optimized hardware, open software, and flexible consumption models to help simplify deployments, improve performance, and optimize costs. A4 and A4X VMs benefit from the following AI Hypercomputer capabilities:

AI-optimized architecture: A4 and A4X VMs are built on servers with our Titanium ML network adapter, which builds on NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience for AI workloads. Combined with our datacenter-wide 4-way rail-aligned network, A4 VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). You can scale to tens of thousands of NVIDIA Blackwell GPUs with our Jupiter network fabric with 13 Petabits/sec of bi-sectional bandwidth.
Simplified deployment with pre-built solutions: For large training workloads, Cluster Director offers dense co-location of accelerator resources, to help ensure host machines are allocated physically close to one another, provisioned as blocks of resources, and interconnected with a dynamic ML network fabric that minimizes network hops and optimizes for the lowest latency.
Scalable infrastructure: With support for up to 65,000 nodes per cluster, Google Kubernetes Engine (GKE) running on AI Hypercomputer is the most scalable Kubernetes service with which to implement a robust, production-ready AI platform. A4 and A4X VMs are natively integrated with GKE. And with integration to other Google Cloud services such as Hyperdisk ML for storage or BigQuery as a data warehouse, GKE facilitates data processing and distributed computing for AI workloads.
Fully-integrated, open software: In addition to supporting CUDA, we work closely with NVIDIA to use XLA to optimize popular frameworks such as PyTorch and JAX (including the reference implementation, MaxText), enabling increased performance from GPU infrastructure. Developers can easily incorporate powerful techniques such as a latency-hiding scheduler to minimize communication overhead (see XLA optimizations, and the sketch after this list).
Flexible consumption models: In addition to the on-demand, committed use discount, and Spot consumption models, we reimagined cloud consumption for the unique needs of AI workloads with Dynamic Workload Scheduler, which offers two modes for different workloads: Flex Start mode for enhanced obtainability and better economics, and Calendar mode for predictable job start times and durations.  Dynamic Workload Scheduler improves your access to AI accelerator resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs, by scheduling all the accelerators needed simultaneously.
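To make the software point above concrete, here is a minimal sketch of how a JAX user might enable XLA’s latency-hiding scheduler on a multi-GPU VM before running a sharded computation. The specific flag name and the toy sharding layout are assumptions based on common XLA/JAX GPU configurations rather than anything stated in this post; consult the XLA optimizations guide for the options supported on your stack.

```python
# Minimal sketch (assumptions noted below): enable XLA's latency-hiding
# scheduler so cross-GPU collectives can overlap with compute, then run a
# simple sharded computation across all local GPUs.
import os

# Assumption: this XLA flag name matches your XLA GPU release; it must be set
# before JAX initializes its backend (i.e., before importing jax).
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_enable_latency_hiding_scheduler=true"
)

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all local devices (e.g., the eight GPUs in an A4 VM).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch dimension across devices; replicate the weight matrix.
batch = 1024 * jax.device_count()
x = jax.device_put(jnp.ones((batch, 4096)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((4096, 4096)), NamedSharding(mesh, P(None, None)))

@jax.jit
def step(x, w):
    # The matmul runs per shard; summing over the sharded batch axis compiles
    # to a cross-device reduction that the scheduler can overlap with compute.
    return jnp.sum(x @ w, axis=0)

print(step(x, w).shape)  # (4096,)
```

The same pattern scales out to multi-host training, where reference implementations such as MaxText manage the device mesh and sharding configuration for you.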


NVIDIA and Google Cloud: Better together 
We’re continuously working together to provide our joint customers with an optimized experience. One of our recent collaborations brings together software innovations to accelerate AI-driven drug discovery: using the NVIDIA BioNeMo Framework and blueprints on GKE, together with PyTorch Lightning, we’re providing ready-to-use reference workflows for domain-specific tasks. The NVIDIA BioNeMo Framework provides an optimized environment for training and fine-tuning biomolecular AI models. Read more here.
Meet Google Cloud 
To connect with Google Cloud, please visit us at booth #914 at NVIDIA GTC, join our expert-led sessions listed below, or email us to set up a private meeting. Whether it’s your first time speaking with Google Cloud or the first time connecting with us at NVIDIA GTC, we’re looking forward to meeting with you. 
Deep dive into AI at expert-led sessions
Join our expert-led sessions to gain in-depth knowledge and develop practical skills in AI development on Google.
Tuesday, March 18
Optimizing the Future of Ads with MoE Models
Time: 2:00 PM – 2:40 PM PDT
Speaker: Tris Warkentin, Director of Product Management, Google DeepMind

Wired for AI: Lessons from Networking 100K+ GPU AI Data Centers and Clouds
Time: 4:00 PM – 5:00 PM PDT
Speakers: Dan Lenoski, VP Networking, Google, and more industry leaders
Wednesday, March 19
Accelerate AI: Enhance Performance and Efficiency Using Google Cloud
Time: 10:00 AM – 10:40 AM PDT
Speakers: Roy Kim, Director, Google Cloud GPUs, and Scott Dietzen, CEO, Augment Code

Optimize Your Workloads for Rack-Scale Interconnected GPU Systems
Time: 3:00 PM – 3:40 PM PDT
Speakers: Jon Olson, Software Engineer, Google Cloud, and Pramod Ramarao, Product Manager, Google
Thursday, March 20
Unlock the Speed of Light for Data Science Workflows With Gemini Coding Assistant
Time: 8:00 AM – 8:40 AM PDT
Speaker: Paige Bailey, Engineering Manager, Developer Relations, Google

Build Next-Generation AI Factories With DOCA-Accelerated Networking
Time: 9:00 AM – 9:40 AM PDT
Speakers: Valas Valancius, Senior Staff Software Engineer, Google Cloud; Ariel Kit, Director, Product Management, NVIDIA; David Wetherall, Distinguished Engineer, Google Cloud

Physical AI for Humanoids: How Google Robotics Uses Simulation to Accelerate Humanoid Robotics Training
Time: 9:00 AM – 9:40 AM PDT
Speaker: Erik Frey, Lead Researcher, Google

Toward Rational Drug Design With AlphaFold 3
Time: 10:00 AM – 10:40 AM PDT
Speakers: Max Jaderberg, Chief AI Officer, Isomorphic Labs (DeepMind), and Sergei Yakneen, Chief Technology Officer, Isomorphic Labs (DeepMind)

AI in Action: Optimize Your AI Infrastructure
Time: 11:00 AM – 11:40 AM PDT
Speakers: Chelsie Czop, Senior Product Manager, Google Cloud; Kshetrajna Raghavan, Machine Learning Engineer, Shopify; Ashwin Kannan, Principal Machine Learning Engineer, Palo Alto Networks; Jia Li, Chief AI Officer, Livex.AI

Horizontal Scaling of LLM Training with JAX
Time: 2:00 PM – 2:40 PM PDT
Speakers: Andi Gavrilescu, Sr. Engineering Manager, Google; Matthew Johnson, Research Scientist, Google; Abhinav Goel, Senior Deep Learning Architect, Google
On-Demand, Virtual Sessions
S74318: Deploy AI and HPC on NVIDIA GPUs With Google
Speakers: Annie Ma-Weaver, HPC Group Product Manager, Google Cloud; Wyatt Gorman, HPC and AI Solutions Manager, Google Cloud; Sam Skillman, HPC Software Engineer, Google Cloud

S74319: Supercharge Large-Scale AI with Google Cloud AI Hypercomputer
Speakers: Rajesh Anantharaman, Product Management Lead, ML Software, Google Cloud, and Deepak Patil, Product Manager, Google Cloud
In addition to our expert-led sessions at NVIDIA GTC, we invite you to join us at the following events onsite (limited space available):

Executive Roundtable, Wednesday, March 19 at 8 AM

DGX Cloud on Google Cloud Roundtable, Thursday, March 20 at 8 AM

Developer Hands On Lab, Thursday, March 20 at 10 AM

AI Summary and Description: Yes

Summary: Google Cloud’s participation in NVIDIA’s GTC AI Conference highlights advancements in AI infrastructure, particularly through the introduction of A4 and A4X VMs optimized for large AI workloads. These innovations underscore the significance of enhanced performance and deployment efficiency in AI, which is crucial for security and compliance professionals focusing on AI security and cloud infrastructure.

Detailed Description:
The text primarily discusses Google Cloud’s significant presence at the NVIDIA GTC AI Conference and the announcement of A4 and A4X Virtual Machines (VMs), which are tailored for advanced AI workloads. The major noteworthy points are:

– **Event Participation**:
  – Google Cloud’s prominent presence at the NVIDIA GTC AI Conference showcases their commitment to AI innovation and collaboration with NVIDIA.
  – The conference gathers developers and industry leaders to explore AI’s potential in solving complex challenges.

– **A4 and A4X Virtual Machines**:
  – **Capabilities**:
    – A4X VMs are designed specifically for high-scale AI workloads, particularly those involving large language models (LLMs).
    – These machines benefit from NVIDIA’s advanced GPU architecture, leveraging unified memory and ultra-low-latency GPU-to-GPU connectivity.
  – **Performance**:
    – A4X VMs can achieve impressive performance benchmarks, such as 860,000 tokens/sec for inference, demonstrating the powerful capability of these VMs to handle demanding AI tasks.
    – A4 VMs enable significant upticks in training performance, improving AI model development efficiency.

– **AI Hypercomputer Architecture**:
  – Google’s AI Hypercomputer combines workload-optimized hardware and software solutions to enhance the deployment of AI workloads effectively.
  – Features such as the Titanium ML network adapter and high-throughput network fabric support high-performance computing, which is essential for both scalability and security in cloud environments.

– **Deployment and Scalability**:
  – The solution offers a simplified deployment process through tools like Cluster Director, which ensures optimal resource allocation for AI tasks.
  – Google Kubernetes Engine (GKE) is highlighted as the most scalable Kubernetes service for production-ready AI environments, essential for organizations looking to deploy robust AI applications.

– **Software and Framework Optimization**:
  – Google Cloud collaborates with NVIDIA to optimize popular AI frameworks, enhancing developers’ capabilities to leverage the software in their AI initiatives efficiently.

– **Flexible Consumption Models**:
  – Mention of innovative consumption models like the Dynamic Workload Scheduler underlines the need for adaptability in cloud services, thereby improving operational cost-effectiveness and resource accessibility for AI projects.

– **Collaborative Initiatives**:
  – The partnership between NVIDIA and Google Cloud aims to advance AI applications in various fields, such as drug discovery, marking their commitment to innovative, real-world applications of AI technology.

This information is highly relevant for security and compliance professionals as it outlines advancements in AI infrastructure that not only improve the efficiency and scalability of AI cloud services but also highlight the importance of secure handling of vast datasets and AI models. It suggests a potential need for strengthened security measures and compliance frameworks to accommodate these evolving technologies in enterprise environments.