Source URL: https://cloud.google.com/blog/topics/developers-practitioners/how-to-build-highly-available-multi-regional-services-with-cloud-run/
Source: Cloud Blog
Title: How to Build Highly Available Multi-regional Services with Cloud Run
Feedly Summary: Ever worry about your applications going down just when you need them most? The talk at Cloud Next 2025, Run high-availability multi-region services with Cloud Run, dives deep into building fault-tolerant and reliable applications using Google Cloud’s serverless container platform: Cloud Run.
Google experts Shane Ouchi and Taylor Money, along with Seenuvasan Devasenan from Commerzbank, pull back the curtain on Cloud Run’s built-in resilience and walk you through a real-world scenario with the upcoming Cloud Run feature called Service Health.
Understanding Cloud Run’s Built-In Fault Tolerance
For the Cloud Next 2025 presentation, Shane kicked things off by discussing the baseline resilience of Cloud Run through autoscaling, a decoupled data and control plane, and N+1 zonal redundancy. Let’s break that down, starting with autoscaling.
Autoscaling to Make Sure Capacity Meets Demand
Cloud Run automatically adds and removes instances based on the incoming request load, ensuring that the capacity of a Cloud Run service meets the demand. Shane calls this hyper-elasticity, referring to Cloud Run’s ability to rapidly add container instances. Rapid autoscaling prevents the failure mode where your application doesn’t have enough server instances to handle all requests.
Note: Cloud Run lets you prevent runaway scaling by limiting the maximum number of instances.
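To make the relationship between load, capacity, and the instance limit concrete, here is a minimal sketch in Python. It is a simplified model and not Cloud Run’s actual autoscaling algorithm, which weighs more signals than this. The function name, its parameters, and the sample numbers are all assumptions made for illustration.

```python
import math


def desired_instances(concurrent_requests: int,
                      concurrency_per_instance: int,
                      max_instances: int,
                      min_instances: int = 0) -> int:
    """Simplified capacity model: enough instances for the load, within limits.

    Illustrative only; this is not how Cloud Run computes instance counts.
    """
    if concurrency_per_instance <= 0:
        raise ValueError("concurrency_per_instance must be positive")
    # Instances needed if each one handles a fixed number of concurrent requests.
    needed = math.ceil(concurrent_requests / concurrency_per_instance)
    # The maximum caps runaway scaling; the minimum keeps a floor of instances.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    for load in (0, 150, 800, 20_000):
        count = desired_instances(load, concurrency_per_instance=80, max_instances=100)
        print(f"{load} concurrent requests -> {count} instances")
```

In this model the last sample load would need 250 instances, but the limit holds the service at 100. That is the trade-off behind a maximum: it bounds cost and protects downstream systems, at the price of rejecting or queueing requests beyond that capacity.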
Decoupled Data and Control Planes Increase Resiliency
The control plane in Cloud Run is the part of the system responsible for management operations, such as deploying new revisions, configuring services, and managing infrastructure resources. It’s decoupled from the data plane. The data plane is responsible for receiving incoming user requests, routing them to container instances, and executing the application code. Because the data plane operates independently from the control plane, issues in the control plane typically don’t impact running services.
N+1 Redundancy for Both Control and Data Planes
Cloud Run is a regional service, and Cloud Run provides N+1 zonal redundancy by default. That means if any of the zones in a region experiences failures, the Cloud Run infrastructure has sufficient failover capacity (that’s the “+1”) in the same region to continue serving all workloads. This isolates your application from zone failures.
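The arithmetic behind N+1 can be sketched in a few lines of Python. Cloud Run handles this capacity planning for you, so the sketch only illustrates the idea; the function names and the numbers are hypothetical.

```python
import math


def capacity_per_zone(peak_instances: int, zones: int) -> int:
    """Capacity each zone needs so the other zones cover peak load if one fails."""
    if zones < 2:
        raise ValueError("N+1 zonal redundancy needs at least two zones")
    # With one zone gone, the remaining (zones - 1) zones must carry everything.
    return math.ceil(peak_instances / (zones - 1))


def survives_zone_loss(peak_instances: int, zones: int, per_zone: int) -> bool:
    """True if the capacity left after losing one zone still covers peak load."""
    return per_zone * (zones - 1) >= peak_instances


if __name__ == "__main__":
    per_zone = capacity_per_zone(peak_instances=12, zones=3)
    print("capacity per zone:", per_zone)
    print("survives a zone failure:", survives_zone_loss(12, 3, per_zone))
```

Splitting a peak of 12 instances evenly across three zones (4 each) would leave only 8 after a zone failure. Sizing each zone for 6 keeps the full 12 available, and that spare capacity is the “+1”.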
Container Probes Increase Availability
Cloud Run supports two distinct types of container instance health checks. If you’re concerned with application availability, you should definitely configure a liveness probe to make sure failing instances are shut down.
Startup probe: Confirms that a new instance has successfully started and is ready to receive requests
Liveness probe: Monitors if a running instance remains healthy and able to continue processing requests. This probe is optional, but enabling it allows Cloud Run to automatically remove faulty instances
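On the application side, an HTTP probe needs an endpoint to call. The following is a minimal sketch using only the Python standard library, assuming the probes are configured as HTTP checks. The paths /startup and /healthz, the two flags, and the helper functions are hypothetical names chosen for this example.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Flags the application would set itself; both names are hypothetical.
STARTED = threading.Event()   # set once initialization has finished
HEALTHY = threading.Event()   # cleared when the app detects it is broken

# Hypothetical probe paths; use whatever paths your probes are configured with.
PROBE_FLAGS = {"/startup": STARTED, "/healthz": HEALTHY}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        flag = PROBE_FLAGS.get(self.path)
        if flag is None:
            self.send_error(404)
            return
        ok = flag.is_set()
        body = b"ok" if ok else b"unavailable"
        # 200 signals success; 503 tells the prober the check failed.
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_server(host="127.0.0.1", port=0):
    """Start the health server on a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, path):
    """Send one GET request, as a prober would, and return the HTTP status code."""
    conn = HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print("before startup:", probe(port, "/startup"))
    STARTED.set()
    HEALTHY.set()
    print("after startup:", probe(port, "/startup"), probe(port, "/healthz"))
    server.shutdown()
    server.server_close()
```

The sketch binds to 127.0.0.1 so it can run locally. A container on Cloud Run would instead listen on 0.0.0.0 and on the port given in the PORT environment variable. Keeping the startup and liveness signals separate lets a slow-starting app report “not started yet” without being treated as broken.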
100% Availability is Unrealistic
Some applications are so important that you want them to always be available. While 100% availability is unrealistic, you can make them as fault tolerant as possible. Getting that right depends on your application architecture and on the underlying platforms and services you use. Cloud Run has several features that increase its baseline resilience, but there’s more you can do to make your application more resilient.
Going Beyond Zonal Redundancy
Because Cloud Run is a regional service whose built-in redundancy covers zones within a single region, developers have to actively architect their applications to be resilient against regional outages. Fortunately, Cloud Run already supports multi-regional deployments. Here’s how that works:
Deploy a Cloud Run service to multiple regions, each using the same container image and configuration.
Create a global external application load balancer, with one backend and a Serverless Network Endpoint Group (NEG) per Cloud Run service.
Use a single entrypoint with one global external IP address.
Here’s what that looks like in a diagram:
In case you’re not familiar, a Serverless Network Endpoint Group (NEG) is a load balancer backend configuration resource that points to a Cloud Run service or an App Engine app.
Architecting Applications for Regional Redundancy Can Be Challenging
While deploying in multiple regions is straightforward with Cloud Run, the challenge lies in architecting your application in such a way that individual regional services can fail without losing data or impacting services in other regions.
Data redundancy and replication are hard to get right, although arguably not as hard as they used to be now that multi-regional databases such as Cloud Spanner and the cross-regional replica feature in Cloud SQL exist.
In this post, I won’t dive in deeper, but I recommend reading this excellent paper by Anna Berenberg and Brad Calder: Deployment Archetypes for Cloud Applications.
A Preview of Service Health for Automated Regional Failover
If you set up a multi-regional Cloud Run architecture today, requests are always routed to the region closest to them, but they are not automatically routed away if a Cloud Run service becomes unavailable, as shown in the following illustration:
The upcoming Service Health feature adds automatic failover of traffic from one region to another if a service in one region becomes unavailable.
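The difference between the two behaviors can be modeled with a toy routing function in Python. This is a sketch of the idea only, not how the load balancer is implemented. The class, the function names, and the latency figures are hypothetical, and the region names are just example values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionalBackend:
    region: str
    latency_ms: float        # hypothetical client-to-region latency
    healthy: bool = True     # whether the service in this region can serve


def route_closest(backends: List[RegionalBackend]) -> RegionalBackend:
    """Without health awareness: always pick the closest region."""
    return min(backends, key=lambda b: b.latency_ms)


def route_with_failover(backends: List[RegionalBackend]) -> Optional[RegionalBackend]:
    """With health awareness: pick the closest region that is healthy."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None  # no region can serve the request
    return min(healthy, key=lambda b: b.latency_ms)


if __name__ == "__main__":
    backends = [
        RegionalBackend("europe-west1", latency_ms=20, healthy=False),
        RegionalBackend("us-central1", latency_ms=110),
    ]
    print("closest only: ", route_closest(backends).region)
    print("with failover:", route_with_failover(backends).region)
```

With the nearby region marked unhealthy, the first function still sends the request there, while the second one accepts higher latency in exchange for a working response.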
Enabling Service Health
As of August 2025, Service Health is not yet publicly available (it’s in private preview), but I’m hopeful that’ll change soon. One thing to keep in mind is that the feature might still change before it becomes generally available. You can sign up to get access by filling in this request form.
Once you have access, you can enable Service Health on a multi-regional service in two steps:
Add a container instance readiness probe to each Cloud Run service.
Set minimum instances to 1 on each Cloud Run service.
That’s really all there is to it. No additional load balancer configuration is required.
Readiness Probes Are Coming to Cloud Run
As part of Service Health, readiness probes are introduced to Cloud Run. A readiness probe periodically checks each container instance via HTTP. If a readiness probe fails, Cloud Run stops routing traffic to that instance until the probe succeeds again. In contrast, a failing liveness probe causes Cloud Run to shut down the unhealthy instance.
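The contrast between the two probe types can be written down as a small decision function. This Python sketch only models the behavior described above; the class, the field names, and the function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    name: str
    ready: bool = True   # result of the latest readiness probe
    alive: bool = True   # result of the latest liveness probe


def apply_probe_results(instances: List[Instance]) -> Tuple[List[str], List[str]]:
    """Return (names that receive traffic, names that get shut down)."""
    # A failing liveness probe leads to the instance being shut down.
    to_shut_down = [i.name for i in instances if not i.alive]
    # A failing readiness probe only takes the instance out of rotation.
    routable = [i.name for i in instances if i.alive and i.ready]
    return routable, to_shut_down


if __name__ == "__main__":
    pool = [
        Instance("a"),
        Instance("b", ready=False),   # kept running, but gets no traffic
        Instance("c", alive=False),   # shut down
    ]
    print(apply_probe_results(pool))
```

Instance “b” appears in neither list: it keeps running and can rejoin the rotation as soon as its readiness probe succeeds again.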
Service Health uses the aggregate readiness state of all container instances in a service to determine if the service itself is healthy or not. If a large percentage of the containers is failing, it marks the service as unhealthy and routes traffic to a different region.
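Here is a minimal Python sketch of such an aggregate check. The talk does not specify the actual threshold, so the 50% default below is purely an assumption, as are the function name and the choice to treat a service with no instances as unhealthy.

```python
from typing import Sequence


def service_is_healthy(ready_flags: Sequence[bool],
                       unhealthy_threshold: float = 0.5) -> bool:
    """Aggregate per-instance readiness into one service-level verdict.

    unhealthy_threshold is an assumed value, not a documented one.
    """
    if not ready_flags:
        # No instances means no readiness signal; this sketch counts that as unhealthy.
        return False
    failing = sum(1 for ready in ready_flags if not ready)
    return failing / len(ready_flags) < unhealthy_threshold


if __name__ == "__main__":
    print(service_is_healthy([True, True, True, False]))   # one of four failing
    print(service_is_healthy([False, False, True]))        # two of three failing
```

An aggregate like this needs at least one running instance to produce a signal, which is one plausible reading of why Service Health asks for a minimum of one instance per service.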
A Live Demo at Cloud Next 2025
In a live demo, Taylor deployed the same service to two regions (one near, one far away). He then sent a request via a Global External Application Load Balancer (ALB). The ALB correctly routed the request to the service in the closest region.
After configuring the closest service to flip between failing and healthy every 30 seconds, he demonstrated that the traffic didn’t fail over. That’s the current behavior, so nothing new so far.
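I don’t know how the demo service implemented the flipping, but a time-based toggle like the following Python sketch would produce the same pattern. The function name and parameters are hypothetical.

```python
def is_healthy_at(elapsed_seconds: float,
                  period_seconds: float = 30.0,
                  start_healthy: bool = True) -> bool:
    """Alternate between healthy and failing every period_seconds."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Window 0 covers [0, period), window 1 covers [period, 2 * period), and so on.
    window = int(elapsed_seconds // period_seconds)
    in_even_window = window % 2 == 0
    return in_even_window == start_healthy


if __name__ == "__main__":
    for t in (0, 29, 30, 59, 60):
        print(t, is_healthy_at(t))
```

A health endpoint could call this function with the seconds elapsed since process start and return a success or error status accordingly, which makes failover behavior easy to observe in a test setup.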
The next step in his demo was to enable Service Health by setting minimum instances and adding a readiness probe on each service. To deploy the config changes to the two services, Taylor used a new flag in the Cloud Run gcloud interface: the --regions flag in gcloud run deploy. It’s a great way to deploy the same container image and configuration to multiple regions at the same time.
With the readiness probes in place and minimum instances set, Service Health started detecting service failure and moved over the traffic to the healthy service in the other region. I thought that was a great demo!
Next Steps
In this post, you learned about Cloud Run’s built-in fault tolerance mechanisms, such as autoscaling and zonal redundancy, saw how to architect multi-region services for higher availability, and got a preview of the upcoming Service Health feature for automated regional failover.
While the Service Health feature is still in private preview, you can sign up to get access by filling in this request form.
AI Summary and Description: Yes
Summary: The text discusses the features and benefits of Google Cloud’s Cloud Run, particularly focusing on its built-in fault tolerance and resilience capabilities for building high-availability applications. Key points include autoscaling, the decoupling of control and data planes, and the upcoming Service Health feature that provides automated regional failover. This information is highly relevant to professionals in cloud infrastructure and security.
Detailed Description: The article outlines essential aspects of Google Cloud’s Cloud Run service, which is designed to increase application availability and resilience through several key features:
- **Autoscaling**:
  - Automatically adjusts instance capacity based on incoming request load, ensuring applications can handle varying demand (termed “hyper-elasticity”).
  - Administrators can limit the maximum number of instances to prevent runaway scaling.
- **Decoupled Data and Control Planes**:
  - The control plane handles management operations such as deployments, while the data plane serves requests independently, so control plane issues typically don’t affect running services.
- **N+1 Zonal Redundancy**:
  - By default, Cloud Run offers N+1 redundancy, ensuring failover capacity within the same region to maintain service availability during zone failures.
- **Container Probes for Availability**:
  - **Startup probes** confirm that new instances have started and can receive requests.
  - **Liveness probes** monitor ongoing instance health, enabling automatic removal of faulty instances.
- **Architecting for Multi-Regional Resilience**:
  - Encourages deploying services across multiple regions to prevent downtime during regional outages.
  - Introduces concepts like global external application load balancers and Serverless Network Endpoint Groups.
- **Upcoming Service Health Feature**:
  - In private preview as of August 2025, this feature will automate traffic failover to healthy services in different regions when a service becomes unavailable.
  - Requires the configuration of readiness probes and setting minimum instance counts.
- **Actionable Steps for Implementation**:
  - Enabling Service Health involves adding a readiness probe to each Cloud Run service and setting minimum instances to 1 on each service.
- **Live Demonstration Insights**:
  - A practical demonstration at Cloud Next 2025 illustrated the current behavior of request routing and how the Service Health feature improves upon it.
- **Next Steps for Professionals**:
  - The article concludes by encouraging professionals to consider architecture decisions that enhance fault tolerance and to explore the new Service Health feature when available.
This text is significant for security and infrastructure professionals as it highlights Google Cloud’s approach to enhancing application reliability and resilience, which directly impacts deployment strategies in critical environments.