Source URL: https://cloud.google.com/blog/products/devops-sre/applying-sre-principles-to-your-mlops-pipelines/
Source: Cloud Blog
Title: An SRE’s guide to optimizing ML systems with MLOps pipelines
Feedly Summary: Picture this: you’re a Site Reliability Engineer (SRE) responsible for the systems that power your company’s machine learning (ML) services. What do you do to ensure you have a reliable ML service, how do you know you’re doing it well, and how can you build strong systems to support these services?
As artificial intelligence (AI) becomes more widely available, its features, including ML, will matter more to SREs. That’s because ML is becoming both a part of the infrastructure used in production software systems and an important feature of the software itself.
Abstractly, machine learning relies on its pipelines … and you know how to manage those! So you can begin with pipeline management, then look to other factors that will strengthen your ML services: training, model freshness, and efficiency. In the resources below, we look at some of the ML-specific characteristics of these pipelines that you’ll want to consider in your operations. Then we draw on the experience of Google SREs to show you how to apply your core SRE skills to operating and managing your organization’s machine-learning pipelines.
Training ML models
Training ML models applies the notion of pipelines to specific types of data, often running on specialized hardware. Critical aspects of the pipeline to consider:
how much data you’re ingesting
how fresh this data needs to be
how the system trains and deploys the models
how efficiently the system handles these first three things
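One way to make these four aspects concrete is to track them as a per-run snapshot of the training pipeline. The sketch below is illustrative only; the class, field names, and numbers are hypothetical, not from any Google tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class PipelineSnapshot:
    """Point-in-time view of the four training-pipeline aspects above."""
    rows_ingested: int          # how much data the run ingested
    newest_record: datetime     # timestamp of the freshest input record
    deploy_succeeded: bool      # did the trained model reach serving?
    accelerator_hours: float    # specialized hardware spent on this run

    def freshness(self, now: Optional[datetime] = None) -> timedelta:
        """Age of the newest input record; lower means fresher."""
        now = now or datetime.now(timezone.utc)
        return now - self.newest_record

    def rows_per_accelerator_hour(self) -> float:
        """Rough efficiency ratio: ingested rows per accelerator-hour."""
        return self.rows_ingested / max(self.accelerator_hours, 1e-9)

snap = PipelineSnapshot(
    rows_ingested=5_000_000,
    newest_record=datetime.now(timezone.utc) - timedelta(minutes=30),
    deploy_succeeded=True,
    accelerator_hours=12.5,
)
print(snap.rows_per_accelerator_hour())  # 400000.0
```

Emitting a snapshot like this at the end of every run gives you a time series you can alert on, which is the basis for the SLOs discussed below.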
SREcon22 Europe/Middle East/Africa – SRE and ML: Why It Matters
This keynote presents an SRE perspective on the value of applying reliability principles to the components of machine learning systems. It provides insight into why ML systems matter for products, and how SREs should think about them. The challenges that ML systems present include capacity planning, resource management, and monitoring; other challenges include understanding the cost of ML systems as part of your overall operations environment.
ML freshness and data volume
As with any pipeline-based system, a big part of understanding the system is describing how much data it typically ingests and processes. The Data Processing Pipelines chapter in the SRE Workbook lays out the fundamentals: automate the pipeline’s operation so that it is resilient, and can operate unattended.
You’ll want to develop Service Level Objectives (SLOs) in order to measure the pipeline’s health, especially for data freshness, i.e., how recently the model got the data it’s using to produce an inference for a customer. Understanding freshness provides an important measure of an ML system’s health, as data that becomes stale may lead to lower-quality inferences and sub-optimal outcomes for the user. For some systems, such as weather forecasting, data may need to be very fresh (just minutes or seconds old); for other systems, such as spell-checkers, data freshness can lag on the order of days — or longer! Freshness requirements will vary by product, so it’s important that you know what you’re building and how the audience expects to use it.
In this way, freshness is part of the critical user journey described in the SRE Workbook, capturing one aspect of the customer experience. You can read more about data freshness as a component of pipeline systems in the Google SRE article Reliable Data Processing with Minimal Toil.
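A freshness SLO check can be as simple as comparing the age of the newest ingested data against a per-product target. A minimal sketch, using the article's weather-forecast and spell-checker examples; the target values and function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-product freshness targets, echoing the examples above:
# forecasts need minutes-fresh data, spell-checkers can lag for days.
FRESHNESS_TARGETS = {
    "weather_forecast": timedelta(minutes=5),
    "spell_checker": timedelta(days=7),
}

def freshness_slo_met(product: str, last_ingested: datetime,
                      now: datetime) -> bool:
    """True if the newest data the model has seen is within target."""
    return (now - last_ingested) <= FRESHNESS_TARGETS[product]

now = datetime.now(timezone.utc)
print(freshness_slo_met("weather_forecast", now - timedelta(minutes=2), now))  # True
print(freshness_slo_met("spell_checker", now - timedelta(days=10), now))       # False
```

Measuring the ratio of checks that pass over a rolling window turns this boolean into an SLO compliance figure you can report against an error budget.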
There’s more than freshness to ensuring high-quality data — there’s also how you define the model-training pipeline. A Brief Guide To Running ML Systems in Production gives you the nuts and bolts of this discipline, from using contextual metrics to understand freshness and throughput, to methods for understanding the quality of your input data.
Serving efficiency
The 2021 SRE blog post Efficient Machine Learning Inference provides a valuable resource to learn about improving your model’s performance in a production environment. (And remember, training is never the same as production for ML services!)
Optimizing machine learning inference serving is crucial for real-world deployment. In this article, the authors explore serving multiple models from shared VMs. They cover realistic use cases and how to manage trade-offs between cost, utilization, and latency of model responses. By changing the allocation of models to VMs, and varying the size and shape of those VMs in terms of attached CPU, GPU, and RAM, you can improve the cost effectiveness of model serving.
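The allocation problem behind multi-model serving is essentially bin packing. The sketch below is a deliberately simplified illustration, packing by RAM alone with a first-fit-decreasing heuristic; the model names and sizes are hypothetical, and a real allocator would also weigh GPU, CPU, and latency constraints as the article describes.

```python
# Hypothetical model memory footprints in GiB.
MODELS = {"ranker": 24, "translator": 40, "spellcheck": 8, "vision": 32}
VM_RAM_GIB = 64

def pack_models(models, vm_ram):
    """Greedy first-fit-decreasing assignment of models to shared VMs."""
    vms = []  # list of (free RAM, model names) per VM
    for name, ram in sorted(models.items(), key=lambda kv: -kv[1]):
        for i, (free, names) in enumerate(vms):
            if ram <= free:                      # fits on an existing VM
                vms[i] = (free - ram, names + [name])
                break
        else:                                    # no fit: provision a new VM
            vms.append((vm_ram - ram, [name]))
    return [names for _, names in vms]

print(pack_models(MODELS, VM_RAM_GIB))
# [['translator', 'ranker'], ['vision', 'spellcheck']]
```

Here four models fit on two 64 GiB VMs instead of four dedicated ones; the trade-off is that co-located models now contend for the same machine, which is where the latency and utilization considerations come in.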
Cost efficiency
We mentioned that these AI pipelines often rely on specialized hardware. How do you know you’re using this hardware efficiently? Todd Underwood’s talk from SREcon EMEA 2023 on Artificial Intelligence: What Will It Cost You? gives you a sense of how much this specialized hardware costs to run, and how you can provide incentives for using it efficiently.
Automation for scale
This article from Google’s SRE team outlines strategies for ensuring reliable data processing while minimizing manual effort, or toil. One of the key takeaways: use an existing, standard platform for as much of the pipeline as possible. After all, your business goals should focus on innovations in presenting the data and the ML model, not in the pipeline itself. The article covers automation, monitoring, and incident response, with a focus on using these concepts to build resilient data pipelines. You’ll read best practices for designing data systems that can handle failures gracefully and reduce a team’s operational burden. This article is essential reading for anyone involved in data engineering or operations. Read more about toil in the SRE Workbook: https://sre.google/workbook/eliminating-toil/.
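A common toil-reduction pattern from that article's theme is making pipeline steps self-healing, so transient failures retry automatically instead of paging a human. A minimal sketch with jittered exponential backoff; the function names and parameters are illustrative assumptions, and production pipelines would typically get this from their orchestration platform rather than hand-rolling it.

```python
import logging
import random
import time

def run_with_retries(step, max_attempts=4, base_delay=1.0):
    """Retry a flaky pipeline step with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts:
                logging.error("step failed after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            logging.warning("attempt %d failed (%s); retrying in %.1fs",
                            attempt, exc, delay)
            time.sleep(delay)

# Simulated flaky ingestion step: fails twice, then succeeds.
calls = {"n": 0}
def flaky_ingest():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream not ready")
    return "ingested"

print(run_with_retries(flaky_ingest, base_delay=0.01))  # ingested
```

The point is the division of labor: the pipeline absorbs transient failures on its own, and humans are paged only when the error is persistent and the budget of attempts is exhausted.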
Next steps
Successful ML deployments require careful management and monitoring for systems to be reliable and sustainable. That means taking a holistic approach, including implementing data pipelines, training pathways, model management, and validation, alongside monitoring and accuracy metrics. To go deeper, check out this guide on how to use GKE for your AI orchestration.
**Summary:** The text discusses the role of Site Reliability Engineers (SREs) in managing machine learning (ML) services, emphasizing the importance of pipeline management, data freshness, resource efficiency, and cost-effectiveness. It highlights the need for SREs to integrate ML principles into their operations to ensure reliable and efficient systems, with insights drawn from the practices of Google SREs.
**Detailed Description:**
The content is highly relevant for professionals in AI, cloud, and infrastructure security, as it addresses the critical intersection of ML services and reliability engineering. It underscores the need for a structured approach to managing ML systems, which is vital for achieving operational excellence in tech environments increasingly influenced by AI technologies.
Key Points:
– **Role of SREs**:
– SREs are integral to the reliability of ML services, impacting both infrastructure and software elements.
– **Pipeline Management**:
– Emphasizes the significance of ML pipelines, which require the ingestion, processing, and analysis of data.
– Important factors to manage include:
– Volume of data being ingested.
– The recency (freshness) of data to maintain model accuracy.
– Efficient training and deployment of models.
– **Training ML Models**:
– Discusses specific aspects of training ML models, including:
– Monitoring data ingestion rates.
– Evaluating the freshness of data as a metric for model performance.
– **Challenges in ML Systems**:
– Resource management and capacity planning are highlighted as major challenges.
– The operational cost of ML systems must also be understood as part of resource allocation discussions.
– **Freshness and Quality of Data**:
– Stresses the importance of data freshness in producing accurate model predictions.
– Different applications have varying freshness requirements, impacting user experience.
– **Serving Efficiency**:
– Identifies methods for optimizing ML inference serving, crucial for performance and cost management in production environments.
– Suggests practical adjustments in resource allocation for improving cost-effectiveness.
– **Cost Efficiency and Specialized Hardware**:
– Discusses the financial aspects of running specialized hardware for ML pipelines, critical for long-term operational sustainability.
– **Automation Strategies**:
– Encourages the use of automation to minimize manual toil and to enhance the reliability of data processing workflows.
– Advocates for adopting existing, standard platforms to streamline pipeline operations.
– **Next Steps**:
– To ensure successful ML deployments, there needs to be a comprehensive strategy that covers pipeline implementation, model management, validation, and continuous monitoring.
Moreover, the text suggests further reading and resources, including guides on utilizing Google Kubernetes Engine for AI orchestration, ensuring that readers can delve deeper into the methodologies discussed.
In conclusion, the comprehensive management of ML systems as presented requires a confluence of skills from various operational domains, making the insights valuable not just for SREs but for all professionals involved in cloud computing, AI, and infrastructure security.