Cloud Blog: Accelerate your AI workloads with the Google Cloud Managed Lustre

Source URL: https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/
Source: Cloud Blog
Title: Accelerate your AI workloads with the Google Cloud Managed Lustre

Feedly Summary: Today, we’re making it even easier to achieve breakthrough performance for your AI/ML workloads: Google Cloud Managed Lustre is now GA, and available in four distinct performance tiers that deliver 125 MB/s, 250 MB/s, 500 MB/s, or 1000 MB/s of throughput per TiB of capacity — with the ability to scale up to 8 PB of storage capacity. The Managed Lustre solution is powered by DDN’s EXAScaler, combining DDN’s decades of leadership in high-performance storage with Google Cloud’s expertise in cloud infrastructure.
Managed Lustre provides a POSIX-compliant, parallel file system that delivers consistently high throughput and low latency, essential for:

High-throughput inference: For applications that require near-real-time inference on large datasets, Lustre provides high parallel throughput and sub-millisecond read latency.

Large-scale model training: Accelerate the training cycles of deep learning models by providing rapid access to petabyte-scale datasets. Lustre’s parallel architecture ensures GPUs and TPUs are fed with data, minimizing idle time.

Checkpointing and restarting large models: Save and restore the state of large models during training faster, improving goodput and allowing for more efficient experimentation.

Data preprocessing and feature engineering: Process raw data, extract features, and prepare datasets for training, reducing the time spent on data pipelines.

Scientific simulations and research: Beyond AI/ML, Lustre excels in traditional HPC scenarios like computational fluid dynamics, genomic sequencing, and climate modeling, where massive datasets and high-concurrency access are critical.

Lustre is designed for the highly parallel and random I/O that characterizes many AI/ML training and inference tasks. This parallel processing capability across multiple clients ensures your compute resources are never starved for data.
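To make the access pattern concrete, here is a minimal sketch of the kind of parallel, many-small-file read pattern a training input pipeline might drive against a Lustre mount. The mount point and shard layout are hypothetical, and the thread-pool loader is only an illustration of fanning out concurrent reads so accelerators are not left waiting on I/O:

```python
# Minimal sketch: concurrently reading many small training shards from a
# Lustre client mount. The mount point and file layout are hypothetical;
# the point is the access pattern -- many parallel reads in flight at once.
import glob
from concurrent.futures import ThreadPoolExecutor

LUSTRE_MOUNT = "/mnt/lustre"                      # hypothetical mount point
SHARD_GLOB = f"{LUSTRE_MOUNT}/train_shards/*.bin"  # hypothetical shard layout

def read_shard(path: str) -> bytes:
    """Read one shard; on a parallel file system many of these can proceed at once."""
    with open(path, "rb") as f:
        return f.read()

def load_batch(paths: list[str], workers: int = 32) -> list[bytes]:
    """Fan reads out across a thread pool to keep the data pipeline full."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_shard, paths))

if __name__ == "__main__":
    shard_paths = sorted(glob.glob(SHARD_GLOB))
    batch = load_batch(shard_paths[:256])
    print(f"read {len(batch)} shards, {sum(len(b) for b in batch)} bytes total")
```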

Performance tiers and pricing
Managed Lustre offers flexible pricing and performance tiers designed to meet the diverse needs of your workloads, whether you’re optimizing for capacity or for the highest throughput density.

| Throughput (MB/s per TiB of storage capacity) | Storage pricing (per GiB per month) |
| --- | --- |
| 125 | $0.145 |
| 250 | $0.21 |
| 500 | $0.34 |
| 1000 | $0.60 |

Please see more details at the Managed Lustre pricing page.
Irrespective of the aggregate throughput, all tiers deliver sub-millisecond read latency and high single-stream throughput, and are well suited to parallel access to many small files.
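As a back-of-the-envelope illustration of how throughput and cost scale with capacity under the per-TiB model above, consider a hypothetical 100 TiB instance on the 500 MB/s-per-TiB tier (the instance size is purely illustrative; actual billing may differ, so refer to the pricing page):

```python
# Back-of-the-envelope sizing based on the table above.
# Example only: a 100 TiB instance on the 500 MB/s-per-TiB tier.
GIB_PER_TIB = 1024

capacity_tib = 100            # illustrative instance size
tier_mbps_per_tib = 500       # throughput tier (MB/s per TiB)
price_per_gib_month = 0.34    # list price for that tier, per the table

aggregate_mbps = capacity_tib * tier_mbps_per_tib
monthly_cost = capacity_tib * GIB_PER_TIB * price_per_gib_month

print(f"aggregate throughput: {aggregate_mbps / 1000:.1f} GB/s")     # 50.0 GB/s
print(f"estimated monthly storage cost: ${monthly_cost:,.2f}")        # $34,816.00
```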
Driving innovation together: partnering with DDN
Google Cloud’s Managed Lustre is powered by DDN’s EXAScaler, bringing together two industry leaders in high-performance computing and elastic cloud infrastructure. This partnership represents a joint commitment to simplifying the deployment and management of large-scale AI and HPC workloads in the cloud, thanks to:

Trusted leaders: By combining DDN’s decades of expertise in high-performance Lustre with Google Cloud’s global infrastructure and AI ecosystem, we are delivering a foundational capability that removes storage bottlenecks and helps our customers solve their most complex challenges in AI and HPC.

Fully managed and supported solution: Enjoy the benefits of a fully managed service from Google, with comprehensive support from both Google and DDN, for seamless operations and peace of mind.

Global availability and ecosystem integration: Managed Lustre is now globally accessible in multiple Google Cloud regions and integrates with the broader Google Cloud ecosystem, including Google Kubernetes Engine (GKE) and TPUs.

These benefits caught the attention of one of our largest partners, NVIDIA, which is looking forward to incorporating Managed Lustre into its NVIDIA AI platform.
“Enterprises today demand AI infrastructure that combines accelerated computing with high-performance storage solutions to deliver uncompromising speed, seamless scalability and cost efficiency at scale. Google and DDN’s collaboration on Google Cloud Managed Lustre creates a better-together solution uniquely suited to meet these needs. By integrating DDN’s enterprise-grade data platforms and Google’s global cloud capabilities, organizations can readily access vast amounts of data and unlock the full potential of AI with the NVIDIA AI platform (or NVIDIA accelerated computing platform) on Google Cloud — reducing time-to-insight, maximizing GPU utilization, and lowering total cost of ownership.” – Dave Salvator, Director of Accelerated Computing Products, NVIDIA
Get started today!
Ready to supercharge your AI/ML and HPC workloads? Getting started with Managed Lustre is simple:

Navigate to Managed Lustre in the Google Cloud console.

Provision your Managed Lustre instance, choosing the performance tier and size that best fits your needs.

Connect your compute instances and GKE clusters to your new high-performance file system.
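Once the file system is mounted on a client, workloads use it through standard POSIX paths. For example, here is a minimal checkpointing sketch tied to the use case described earlier; the mount point is hypothetical and PyTorch is just one framework choice, since any POSIX-style write to the mount works the same way:

```python
# Sketch: writing and restoring a training checkpoint on a Lustre mount.
# The mount point is hypothetical; PyTorch is only one possible framework.
import os
import torch
import torch.nn as nn

CKPT_DIR = "/mnt/lustre/checkpoints"   # hypothetical client mount point

model = nn.Linear(1024, 1024)          # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def save_checkpoint(step: int) -> str:
    """Persist model and optimizer state to the parallel file system."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    path = os.path.join(CKPT_DIR, f"step_{step:08d}.pt")
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        path,
    )
    return path

def load_checkpoint(path: str) -> int:
    """Restore model and optimizer state and return the saved step."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]

if __name__ == "__main__":
    ckpt_path = save_checkpoint(step=1000)
    resumed_step = load_checkpoint(ckpt_path)
    print(f"restored checkpoint from {ckpt_path} at step {resumed_step}")
```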

For detailed instructions and documentation, please visit the Managed Lustre documentation. And if needed, reach out to Google Cloud sales specialists.
Watch the Fireside Chat
Don’t miss the opportunity to learn more about the strategic partnership between Google Cloud and DDN, and the unique capabilities of Managed Lustre. Read the official DDN press release here.
Watch the fireside chat with Sameet Agarwal, VP/GM Storage and Sven Oehme, CTO of DDN, here.

AI Summary and Description: Yes

Summary: Google Cloud Managed Lustre, now generally available, offers enhanced storage solutions tailored for AI and ML workloads with four performance tiers allowing scalability and consistent high throughput. This integration with DDN’s technology aims to meet the demands of large-scale data processing for applications like deep learning and HPC.

Detailed Description: Google Cloud has launched its Managed Lustre offering, aimed specifically at improving performance for AI and machine learning (ML) workloads. This new solution is significant for professionals in AI, cloud infrastructure, and high-performance computing (HPC) due to its ability to enhance data throughput and reduce latency.

* **Key Features:**
– **Performance Tiers:** Four distinct tiers (125, 250, 500, and 1000 MB/s per TiB of capacity) provide flexibility for varying workload requirements, supporting storage capacities up to 8 PB.
– **High Throughput Inference:** Enables near-real-time inference on large datasets through parallel throughput and sub-millisecond read latency.
– **Large-Scale Model Training:** Accelerates training cycles of deep learning models by ensuring rapid data access, minimizing GPU and TPU idle time.
– **Checkpointing and Model Restart:** Enhances the efficiency of saving and restoring model states during training, allowing for more streamlined experimentation.
– **Data Preprocessing:** Facilitates quick processing and feature extraction, shortening data pipeline durations.
– **Support for Scientific Research:** Lustre also caters to traditional HPC demands, excelling in areas like computational fluid dynamics and climate modeling.

* **Architecture and Design:**
– Managed Lustre is built for the highly parallel and random I/O often seen in AI training and inference tasks, ensuring consistent data delivery to computing resources.

* **Partnership and Support:**
– The solution is powered by DDN’s EXAScaler, combining DDN’s expertise with Google Cloud’s infrastructure capabilities.
– Offers a fully managed and supported service, easing operations and providing peace of mind for users.

* **Ecosystem Integration:**
– Managed Lustre is integrated with various Google Cloud services such as Google Kubernetes Engine (GKE) and TPUs, enhancing its applicability within existing workflows.

* **Market Response:**
– Major partners, including NVIDIA, have welcomed the offering, which promises to address enterprise demands for fast and scalable AI infrastructure.

Professionals in AI and cloud security should pay attention to how Managed Lustre could influence their data handling strategies, particularly in environments where performance and efficiency are critical.