Source URL: https://cloud.google.com/blog/products/ai-machine-learning/performance-monitoring-and-alerts-for-gen-ai-models-on-vertex-ai/
Source: Cloud Blog
Title: Introducing built-in performance monitoring for Vertex AI Model Garden
Feedly Summary: Today, we’re announcing built-in performance monitoring and alerts for Gemini and other managed foundation models – right from Vertex AI’s homepage.
Monitoring the performance of generative AI models is crucial when building lightning-fast, reliable, and scalable applications. But understanding the performance of these models has historically had a steep learning curve: you first had to learn which metrics existed, where they were stored, and where to find them in the Cloud Console.
Now, these metrics are available right on Vertex AI’s home page, where you can easily find and understand the health of your models. Cloud Monitoring shows a built-in dashboard providing information about usage, latency, and error rates on your gen AI models. You can also quickly configure an alert if any requests have failed or been delayed.
How it works
If you’re using Vertex AI foundation models, you can find overview metrics for your models on the Dashboard tab in Vertex AI, then click into an out-of-the-box dashboard in Cloud Monitoring to see more detail and customize the view. From there, you can better understand capacity constraints, predict costs, and troubleshoot errors. You can also configure alerts that quickly inform you about failures and their causes.
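The dashboard's headline numbers — query rates, latency, and error rates — are straightforward aggregates over your requests. As an illustrative sketch (not the dashboard's actual implementation), here is how request count, error rate, and latency percentiles can be computed from raw request records:

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_ms: float
    status_code: int  # HTTP status returned by the model endpoint


def percentile(sorted_vals, p):
    """Nearest-rank percentile on a pre-sorted, non-empty list."""
    idx = min(len(sorted_vals) - 1, max(0, round(p / 100 * len(sorted_vals)) - 1))
    return sorted_vals[idx]


def summarize(records):
    """Compute the kinds of aggregates the dashboard surfaces:
    request count, error rate, and p50/p95 latency."""
    if not records:
        return {"count": 0, "error_rate": 0.0, "p50_ms": None, "p95_ms": None}
    latencies = sorted(r.latency_ms for r in records)
    errors = sum(1 for r in records if r.status_code >= 400)
    return {
        "count": len(records),
        "error_rate": errors / len(records),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }
```

In practice you would read these series from Cloud Monitoring rather than compute them client-side; the sketch is only meant to make the dashboard's numbers concrete.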
View Model Observability in Vertex AI
Configure an alert
Let’s say you’re an SRE who is responsible for ensuring the uptime of your company’s new customer service chatbot. You want a dashboard that gives you a bird’s eye view of possible issues with the chatbot, whether they involve slowness, errors, or unexpected usage volume. Instead of hunting for the right metrics and creating a dashboard that displays them, you can now go to the Vertex Dashboard page to view high-level metrics, and click “Show all metrics” to view a detailed, opinionated dashboard with information about query rates, character and token throughput, latency, and errors.
Then, let’s say that you notice that your model returned a 429 error for a number of your requests. This happens when the ML serving region associated with your model runs out of aggregate capacity across customers. You can remediate the issue by purchasing provisioned throughput, switching ML processing locations, or scheduling non-urgent requests for a less busy time using batch requests. You can also quickly turn on a recommended alert that will let you know whenever more than 1% of your requests return 429 errors.
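Alongside the remediations above, a common client-side complement is to retry rate-limited requests with exponential backoff and jitter. Here is a minimal, hedged sketch — `RateLimitError` is a stand-in for whatever 429/resource-exhausted exception your client library raises, and real Vertex AI client libraries ship their own retry settings that you should prefer when available:

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the 429 (resource exhausted) error a client raises."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Retry fn on rate-limit errors with exponential backoff and jitter.

    Illustrative only: delays double each attempt (capped at max_delay),
    and random jitter spreads out retries from many clients.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered wait
```

Backoff smooths over transient capacity shortages, but if 429s are sustained, the capacity-side fixes above (provisioned throughput, a different region, or batch requests) are the real remedy.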
Get started today
If you’re a user of managed gen AI models from Vertex AI Model Garden, check out the “Model Observability” tab in your project’s Vertex Dashboard page. Click “Show all metrics” to find the built-in dashboard. To configure recommended alerts related to your gen AI workloads, check out the Vertex AI Integration in Cloud Monitoring.
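If you prefer to manage alerts as code rather than through the console, the same idea can be expressed as a Cloud Monitoring alert policy. The sketch below builds the policy as a plain dict in the shape of the `projects.alertPolicies` REST resource; the metric type and `response_code` label are assumptions based on Vertex AI's published metrics (check the metrics list for your project), and the console's recommended alert computes the 1%-of-requests ratio for you, whereas this simplified version fires on any 429s:

```python
PROJECT_ID = "my-project"  # hypothetical project ID

# Simplified alert policy: fires when any 429 responses are recorded
# for a publisher model. Metric type and label names are assumptions --
# verify them against Cloud Monitoring's metric list for your project.
alert_policy = {
    "displayName": "Vertex AI 429 responses (sketch)",
    "combiner": "OR",
    "conditions": [{
        "displayName": "429s on publisher model invocations",
        "conditionThreshold": {
            "filter": (
                'metric.type="aiplatform.googleapis.com/'
                'publisher/online_serving/model_invocation_count" '
                'AND metric.labels.response_code="429"'
            ),
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0,
            "duration": "300s",
            "aggregations": [{
                "alignmentPeriod": "300s",
                "perSeriesAligner": "ALIGN_RATE",
            }],
        },
    }],
}

# This dict could be POSTed to the Cloud Monitoring API at
# projects/{PROJECT_ID}/alertPolicies, or adapted for the
# google-cloud-monitoring client library.
```

For most users the recommended alerts in the Vertex AI Integration are the simpler path; a policy-as-code sketch like this mainly helps when alerts are managed in version control.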
AI Summary and Description: Yes
Summary: The announcement introduces new built-in performance monitoring and alerting features for generative AI models managed through Google Cloud’s Vertex AI. It emphasizes the ease of access to performance metrics, which enhances the capability to troubleshoot and maintain model health, contributing significantly to AI application reliability and scalability.
Detailed Description: The text discusses the integration of performance monitoring tools within Google Cloud’s Vertex AI for generative AI models. This development is pivotal for professionals involved in AI, cloud computing, and security, as it allows for greater visibility and oversight of model performance, which is essential in maintaining operational effectiveness and troubleshooting potential issues.
– **Built-in Performance Monitoring**: The new features provide direct access to critical performance metrics from the Vertex AI homepage, enabling users to monitor the health of their AI models easily.
– **Metrics Insights**: Users can view important statistics such as usage rates, latency, error rates, and other relevant operational metrics without navigating away from the Vertex AI interface.
– **Real-time Alerts**: The capability to set alerts for failures or delays allows practitioners to quickly identify and respond to issues, enhancing the reliability of AI-powered applications.
– **User Experience**: The text mentions how past complexities of monitoring have been simplified, helping users who may not have extensive experience with metrics to understand and utilize the dashboard effectively.
– **Use Case Example**: An SRE responsible for a customer service chatbot is provided as a scenario to illustrate how these new features can aid in day-to-day operations—such as managing errors related to request capacity or processing locations.
Overall, this announcement marks a substantial improvement in the observability and management of generative AI models, aligning with industry trends towards optimized performance management and incident response in cloud-based AI solutions. Security and compliance professionals can leverage these insights to ensure robust operational standards are met in their AI deployments.