Cloud Blog: Networking support for AI workloads

Source URL: https://cloud.google.com/blog/products/networking/cross-cloud-network-solutions-support-for-ai-workloads/
Source: Cloud Blog
Title: Networking support for AI workloads

Feedly Summary: At Google Cloud, we strive to make it easy to deploy AI models onto our infrastructure. In this blog we explore how the Cross-Cloud Network solution supports your AI workloads.
Managed and Unmanaged AI options
Google Cloud provides both managed (Vertex AI) and do-it-yourself (DIY) approaches for running AI workloads. 

Vertex AI: A fully managed machine learning platform. Vertex AI offers both pre-trained Google models and access to third-party models through Model Garden. As a managed service, Vertex AI handles infrastructure management, allowing you to concentrate on training, tuning, and inferencing your AI models.

Custom infrastructure deployments: These deployments use various compute, storage, and networking options depending on the type of workload you are running. AI Hypercomputer is one way to deploy both HPC workloads that may not require GPUs or TPUs, and AI workloads that run on GPUs or TPUs.

Networking for managed AI
With Vertex AI you don’t have to worry about the underlying infrastructure. For network connectivity, the service is accessible by default via public APIs. Enterprises that want private connectivity can choose among Private Service Access, Private Google Access, Private Service Connect endpoints, and Private Service Connect for Google APIs. The right option varies based on the specific Vertex AI service you are using. You can learn more in the Accessing Vertex AI from on-premises and multicloud documentation.


Networking AI infrastructure deployments
Let’s look at a sample case: an organization has data located in another cloud and would like to deploy an AI cluster with GPUs on Google Cloud.
Based on this need, you can analyze the networking requirements across four phases: planning, data ingestion, training, and inference.

Planning: This crucial initial phase involves defining your requirements: the size of the cluster (number of GPUs), the type of GPUs needed, the desired region and zone for deployment, storage, and the anticipated network bandwidth for transfers. This planning informs the subsequent steps. For instance, training large language models such as Llama, which have billions of parameters, requires a significantly larger cluster than fine-tuning smaller models.
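
To make the planning phase concrete, here is a back-of-the-envelope sizing sketch. The 8x memory-overhead multiplier and 80 GiB of GPU memory are illustrative assumptions for this example, not Google Cloud guidance:

```python
import math

def gpus_needed(params_billion: float, gpu_mem_gib: float = 80.0,
                bytes_per_param: int = 2, overhead: float = 8.0) -> int:
    """Rough GPU count needed to hold a model during training.

    The ~8x overhead multiplier covers optimizer state, gradients and
    activations on top of bf16 weights (a rule of thumb, not a spec).
    """
    total_gib = params_billion * 1e9 * bytes_per_param * overhead / 2**30
    return math.ceil(total_gib / gpu_mem_gib)

print(gpus_needed(70))  # a 70B-parameter model needs a sizeable cluster
print(gpus_needed(7))   # fine-tuning a smaller model needs far fewer GPUs
```

Even a rough estimate like this drives the later choices: cluster size determines zone capacity needs, and dataset size determines ingestion bandwidth.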

Data ingestion: Since the data is located in another cloud, you need a high-speed connection so that the data can be accessed directly or transferred to a storage option in Google Cloud. To facilitate this, Cross-Cloud Interconnect offers a direct, high-bandwidth connection with a choice of 10 Gbps or 100 Gbps per link. Alternatively, if the data is located on-premises, you can use Cloud Interconnect.
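
A quick estimate shows why the per-link speed matters for ingestion. The 100 TiB dataset size and 90% efficiency factor are assumptions for illustration; real throughput depends on protocol overhead and transfer parallelism:

```python
def transfer_hours(dataset_tib: float, link_gbps: float,
                   efficiency: float = 0.9) -> float:
    """Hours to move dataset_tib TiB over a link_gbps link at ~90% goodput."""
    bits = dataset_tib * 2**40 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

for gbps in (10, 100):  # the two Cross-Cloud Interconnect link speeds
    print(f"{gbps} Gbps: {transfer_hours(100, gbps):.1f} hours for 100 TiB")
```

At these speeds, a 100 TiB dataset moves in roughly a day over a single 10 Gbps link versus a few hours at 100 Gbps, which is why the planning phase should anticipate transfer bandwidth up front.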

Training: Training workloads demand high-bandwidth, low-latency, lossless cluster networking. You can achieve GPU-to-GPU communication that bypasses the host OS with Remote Direct Memory Access (RDMA). Google Cloud networking supports the RDMA over Converged Ethernet (RoCE) protocol in dedicated VPC networks created with the RDMA network profile. Proximity is important: for best performance, nodes and clusters should be placed as close to each other as possible.
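
As a sketch of why lossless, high-bandwidth networking matters during training: a ring all-reduce exchanges roughly 2*(N-1)/N times the gradient size per GPU on every optimizer step. The 70B parameter count and 2-byte (bf16) gradients below are illustrative assumptions:

```python
def allreduce_gib_per_gpu(params_billion: float, n_gpus: int,
                          bytes_per_grad: int = 2) -> float:
    """GiB each GPU sends per optimizer step in a ring all-reduce."""
    grad_bytes = params_billion * 1e9 * bytes_per_grad
    return grad_bytes * 2 * (n_gpus - 1) / n_gpus / 2**30

print(f"{allreduce_gib_per_gpu(70, 256):.0f} GiB per GPU per step")
```

Hundreds of gibibytes cross the network on every step, so any packet loss or extra latency directly stalls the accelerators, which is the motivation for RoCE-capable, tightly placed clusters.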

Inference: Inference requires low-latency connectivity to endpoints, which can be exposed via connectivity options such as Network Connectivity Center (NCC), Cloud VPN, VPC Network Peering, and Private Service Connect.
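
A simple latency budget illustrates why endpoint placement and connectivity matter for inference: the network round trip is paid before the first streamed token can arrive. The RTT and prefill numbers below are illustrative assumptions, not measurements:

```python
def time_to_first_token_ms(prefill_ms: float, rtt_ms: float) -> float:
    """Perceived latency until the first streamed token reaches the client."""
    return rtt_ms + prefill_ms

print(time_to_first_token_ms(80.0, 2.0))    # endpoint reachable in-region
print(time_to_first_token_ms(80.0, 150.0))  # endpoint across continents
```

With a distant endpoint, the network can add more delay than the model’s own prefill compute, which is why low-latency private connectivity to the serving endpoint is worth engineering.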

In the example above we use:

Cross-Cloud Interconnect to connect to Google Cloud, meeting the high-speed connection requirement

RDMA networking with RoCE, since we want to optimize accelerator-to-accelerator communication based on the requirements from the planning phase.

Google Kubernetes Engine (GKE) as a compute option on which to deploy our cluster.

Learn more
To learn more about networking for AI workloads please explore the following:

Cross-Cloud Network: Accelerating the Enterprise AI Journey with Cross-Cloud Network

Compute: Blackwell is here — new A4 VMs powered by NVIDIA B200 now in preview

Blog: New updates to AI Hypercomputer

Want to ask a question, find out more, or share a thought? Please connect with me on LinkedIn.

AI Summary and Description: Yes

**Summary:** The text outlines how Google Cloud facilitates the deployment of AI models through its infrastructure, focusing particularly on the Cross-Cloud Network solution and various managed and unmanaged options for running AI workloads. Insights into set-up requirements and high-performance networking for AI workloads are particularly valuable for professionals in AI, cloud, and infrastructure security.

**Detailed Description:** The content is rich in details regarding Google Cloud’s offerings for deploying AI models, emphasizing both strategic planning and technical implementations. Key points include:

– **Managed vs. Unmanaged Options:**
– **Managed (Vertex AI):** A fully managed platform that simplifies the deployment of machine learning models by handling infrastructure management. It provides access to pre-trained models and third-party offerings.
– **Custom Infrastructure Deployments:** Involves defining compute, storage, and networking options tailored to specific workload requirements, ensuring flexibility for diverse AI applications.

– **Networking Considerations for Managed AI:**
– **Public and Private Connectivity:** Vertex AI is accessible via public APIs by default, alongside private connectivity options such as Private Service Access and Private Google Access.

– **Key Steps for Setting Up AI Workloads:**
– **Planning:** Critical first phase where organizations define cluster size, GPU requirements, geographic deployment specifics, and network bandwidth needs.
– **Data Ingestion:** Successful data transfer from other clouds or on-premises locations necessitates high-speed connections, highlighting the role of Cross-Cloud Interconnect for direct and high-bandwidth connectivity.
– **Training:** Requires high-bandwidth, low-latency networking; Google Cloud supports advanced networking techniques like Remote Direct Memory Access (RDMA) and RoCE for optimizing data transfer between GPUs.
– **Inference:** Emphasizes the need for low-latency connections to endpoints, detailing various connectivity options available within Google Cloud.

– **Use Case Example:**
– **Connecting to Google Cloud:** The text illustrates using Cross-Cloud Interconnect for speed and RDMA for optimized resource allocation when configuring a GPU cluster on Google Cloud.

– **Further Learning Opportunities:**
– Readers are encouraged to explore related resources on Cross-Cloud Networking, new computing options available, and recent updates to AI deployment technologies.

Overall, this content is significant for professionals engaged in AI and cloud services, providing insights into infrastructure setup, network optimization, and the integration of managed services for effective AI deployments.