Cloud Blog: How to Build Highly Available Multi-regional Services with Cloud Run

Sep 4, 2025

—

Source URL: https://cloud.google.com/blog/topics/developers-practitioners/how-to-build-highly-available-multi-regional-services-with-cloud-run/
Source: Cloud Blog
Title: How to Build Highly Available Multi-regional Services with Cloud Run

Feedly Summary: Ever worry about your applications going down just when you need them most? The talk at Cloud Next 2025, Run high-availability multi-region services with Cloud Run, dives deep into building fault-tolerant, reliable applications using Google Cloud’s serverless container platform: Cloud Run.
Google experts Shane Ouchi and Taylor Money, along with Seenuvasan Devasenan from Commerzbank, pull back the curtain on Cloud Run’s built-in resilience and walk you through a real-world scenario with the upcoming Cloud Run feature called Service Health.
Understanding Cloud Run’s Built-In Fault Tolerance
In the Cloud Next 2025 presentation, Shane kicked things off by discussing the baseline resilience of Cloud Run, which comes from autoscaling, decoupled data and control planes, and N+1 zonal redundancy. Let’s break that down, starting with autoscaling.
Autoscaling to Make Sure Capacity Meets Demand
Cloud Run automatically adds and removes instances based on the incoming request load, ensuring that the capacity of a Cloud Run service meets the demand. Shane calls this hyper-elasticity, referring to Cloud Run’s ability to rapidly add container instances. Rapid autoscaling prevents the failure mode where your application doesn’t have enough server instances to handle all requests.
Note: Cloud Run lets you prevent runaway scaling by limiting the maximum number of instances.
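As a rough illustration of how capacity tracks demand under a maximum-instances cap, here is a minimal sketch. It is not Cloud Run's actual autoscaling algorithm: the function name, its parameters, and the use of Little's law to estimate in-flight requests are all assumptions made for this example.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_request_ms: float,
                     concurrency_per_instance: int,
                     max_instances: int) -> int:
    """Estimate the instance count for a load, capped at max_instances.

    Hypothetical model for illustration only, not Cloud Run's real autoscaler.
    """
    # Little's law: requests in flight = arrival rate * time spent per request.
    in_flight = requests_per_second * avg_request_ms / 1000
    wanted = math.ceil(in_flight / concurrency_per_instance)
    # The cap is what prevents runaway scaling.
    return min(wanted, max_instances)


if __name__ == "__main__":
    for rps in (10, 500, 20_000):
        print(rps, instances_needed(rps, avg_request_ms=200,
                                    concurrency_per_instance=80,
                                    max_instances=40))
```

With the cap in place, the highest load in the loop is held at 40 instances even though the uncapped estimate would be higher. The trade-off is that requests beyond that capacity have to wait or are rejected.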
Decoupled Data and Control Planes Increase Resiliency
The control plane in Cloud Run is the part of the system responsible for management operations, such as deploying new revisions, configuring services, and managing infrastructure resources. It’s decoupled from the data plane. The data plane is responsible for receiving incoming user requests, routing them to container instances, and executing the application code. Because the data plane operates independently from the control plane, issues in the control plane typically don’t impact running services.
N+1 Redundancy for Both Control and Data Plane
Cloud Run is a regional service that provides N+1 zonal redundancy by default. That means if any one of the zones in a region fails, the Cloud Run infrastructure has sufficient failover capacity (that’s the “+1”) in the same region to continue serving all workloads. This isolates your application from zone failures.
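The N+1 idea can be expressed as a simple capacity check. The sketch below assumes a hypothetical list of per-zone capacities in arbitrary units. Cloud Run manages this internally and does not expose such numbers, so treat it purely as a way to reason about the concept.

```python
def survives_zone_loss(zone_capacity: list[int], peak_load: int) -> bool:
    """Return True if the remaining zones can carry peak_load after any single zone fails.

    zone_capacity and peak_load use the same hypothetical unit (for example,
    concurrent requests). Expects at least one zone.
    """
    total = sum(zone_capacity)
    # Losing the largest zone is the worst case, so checking it covers all zones.
    return total - max(zone_capacity) >= peak_load


if __name__ == "__main__":
    print(survives_zone_loss([50, 50, 50], peak_load=100))  # two zones left: 100 units
    print(survives_zone_loss([50, 50, 50], peak_load=120))  # two zones left: 100 units
```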
Container Probes Increase Availability
If you’re concerned with application availability, you should definitely configure liveness probes to make sure failing instances are shut down. You can configure two distinct types of container instance health checks on Cloud Run.

Startup probe: Confirms that a new instance has successfully started and is ready to receive requests

Liveness probe: Monitors if a running instance remains healthy and able to continue processing requests. This probe is optional, but enabling it allows Cloud Run to automatically remove faulty instances
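To make the two probe types concrete, here is a minimal sketch of an application that exposes one HTTP endpoint per probe, using only the Python standard library. The paths `/startup` and `/liveness`, the `AppState` class, and the choice of 200 for success and 503 for failure are assumptions for this example. The actual probe paths and thresholds are whatever you configure on the service.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    """Flags the application flips as it finishes starting or detects a fatal fault."""

    def __init__(self) -> None:
        self.started = False
        self.healthy = True


def make_handler(state: AppState):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/startup":      # hypothetical startup probe path
                ok = state.started
            elif self.path == "/liveness":   # hypothetical liveness probe path
                ok = state.started and state.healthy
            else:
                self.send_error(404)
                return
            self.send_response(200 if ok else 503)
            self.end_headers()

        def log_message(self, *args) -> None:
            pass  # keep the example quiet

    return ProbeHandler


def start_probe_server(state: AppState, port: int = 0) -> HTTPServer:
    """Serve the probe endpoints on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int, path: str) -> int:
    """Send one GET to a probe endpoint and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

In a real service the application would set `started` once initialization completes and clear `healthy` when it detects a state it cannot recover from. A failing liveness endpoint is then the signal that lets the platform replace the instance.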

100% Availability is Unrealistic
Some applications are so important that you want them to always be available. While 100% availability is unrealistic, you can make them as fault tolerant as possible. Getting that right depends on your application architecture and on the underlying platforms and services you use. Cloud Run has several features that increase its baseline resilience, but there’s more you can do to make your application more resilient.
Going Beyond Zonal Redundancy
Because Cloud Run is a regional service that provides zonal redundancy only, developers have to actively architect their applications to be resilient against regional outages. Fortunately, Cloud Run already supports multi-regional deployments. Here’s how that works:

Deploy a Cloud Run service to multiple regions, each using the same container image and configuration.

Create a global external application load balancer, with one backend and a Serverless Network Endpoint Group (NEG) per Cloud Run service.

Use a single entrypoint with one global external IP address.

Here’s what that looks like in a diagram:

In case you’re not familiar, a Serverless Network Endpoint Group (NEG) is a load balancer backend configuration resource that points to a Cloud Run service or an App Engine app.
Architecting Applications for Regional Redundancy Can Be Challenging
While deploying in multiple regions is straightforward with Cloud Run, the challenge lies in architecting your application in such a way that individual regional services can fail without losing data or impacting services in other regions.
Data redundancy and replication are hard to get right, although arguably not as hard as they used to be, now that multi-regional databases such as Cloud Spanner and the cross-region replica feature in Cloud SQL exist.
In this post, I won’t dive in deeper, but I recommend reading this excellent paper by Anna Berenberg and Brad Calder: Deployment Archetypes for Cloud Applications.
A Preview of Service Health for Automated Regional Failover
If you set up a multi-regional Cloud Run architecture today, requests are always routed to the region closest to them, but they are not automatically routed away if a Cloud Run service becomes unavailable, as shown in the following illustration:

The upcoming Service Health feature adds automatic failover of traffic from one region to another if a service in one region becomes unavailable:
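The difference between the two routing behaviors can be sketched in a few lines. This is a toy model, not how the load balancer is implemented: the `RegionalBackend` class, the latency figures, and the two region names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionalBackend:
    region: str
    latency_ms: float      # hypothetical client-to-region latency
    healthy: bool = True   # hypothetical per-region health signal


def route_nearest(backends: list[RegionalBackend]) -> str:
    """Behavior without Service Health: always pick the closest region."""
    return min(backends, key=lambda b: b.latency_ms).region


def route_with_failover(backends: list[RegionalBackend]) -> Optional[str]:
    """Behavior described for Service Health: closest region that is healthy."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        return None  # no healthy region left to serve the request
    return min(candidates, key=lambda b: b.latency_ms).region


if __name__ == "__main__":
    backends = [
        RegionalBackend("europe-west1", latency_ms=20, healthy=False),
        RegionalBackend("us-central1", latency_ms=110),
    ]
    print(route_nearest(backends))        # still the unhealthy nearby region
    print(route_with_failover(backends))  # the healthy region farther away
```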

Enabling Service Health
As of August 2025, Service Health is not yet publicly available (it’s in private preview), but I’m hopeful that’ll change soon. One thing to keep in mind is that the feature might still change until it’s generally available. You can sign up to get access by filling in this request form.
Once you have access, you can enable Service Health on a multi-regional service in two steps:

Add a container instance readiness probe to each Cloud Run service.

Set minimum instances to 1 on each Cloud Run service.

That’s really all there is to it. No additional load balancer configuration is required.
Readiness Probes Are Coming to Cloud Run
As part of Service Health, readiness probes are introduced to Cloud Run. A readiness probe periodically checks each container instance via HTTP. If a readiness probe fails, Cloud Run stops routing traffic to that instance until the probe succeeds again. In contrast, a failing liveness probe causes Cloud Run to shut down the unhealthy instance.
Service Health uses the aggregate readiness state of all container instances in a service to determine if the service itself is healthy or not. If a large percentage of the containers is failing, it marks the service as unhealthy and routes traffic to a different region.
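A minimal sketch of that aggregation follows. The post only says "a large percentage", so the 50% threshold, the function name, and the decision to treat a service with no instances as unhealthy are all assumptions made for illustration.

```python
def service_is_healthy(instance_ready: list[bool],
                       min_ready_fraction: float = 0.5) -> bool:
    """Aggregate per-instance readiness into one service-level health signal.

    min_ready_fraction is a hypothetical threshold; the real value used by
    Service Health is not stated in the post.
    """
    if not instance_ready:
        # Assumption: with no instances reporting, there is nothing to call healthy.
        return False
    ready_fraction = sum(instance_ready) / len(instance_ready)
    return ready_fraction >= min_ready_fraction


if __name__ == "__main__":
    print(service_is_healthy([True, True, False, True]))    # 75% ready
    print(service_is_healthy([False, False, False, True]))  # 25% ready
```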
A Live Demo at Cloud Next 2025
In a live demo, Taylor deployed the same service to two regions (one near, one far away). He then sent a request via a Global External Application Load Balancer (ALB). The ALB correctly routed the request to the service in the closest region.
After configuring the closest service to flip between failing and healthy every 30 seconds, he demonstrated that the traffic didn’t fail over. That’s the current behavior – so far nothing new.
The next step in his demo was enabling Service Health by setting minimum instances and adding a readiness probe on each service. To deploy the config changes to the two services, Taylor used a new flag in the Cloud Run gcloud interface: the --regions flag in gcloud run deploy. It’s a great way to deploy the same container image and configuration to multiple regions at the same time.
With the readiness probes in place and minimum instances set, Service Health started detecting service failure and moved over the traffic to the healthy service in the other region. I thought that was a great demo!
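A service that alternates between healthy and failing, like the one in the demo, could be approximated with a time-based toggle behind the readiness endpoint. The demo's actual code is not shown in the post, so the function names and the 200/503 status codes below are assumptions.

```python
def flipping_health(elapsed_seconds: float, period_seconds: float = 30.0) -> bool:
    """Healthy during even-numbered periods since start, failing during odd ones."""
    return int(elapsed_seconds // period_seconds) % 2 == 0


def readiness_status(elapsed_seconds: float) -> int:
    """HTTP status a hypothetical readiness endpoint would return at this moment."""
    return 200 if flipping_health(elapsed_seconds) else 503


if __name__ == "__main__":
    for t in (0, 29, 30, 59, 60):
        print(t, readiness_status(t))
```

In a running service, `elapsed_seconds` would come from something like `time.monotonic()` minus the process start time. Passing it in as a parameter keeps the toggle easy to test.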
Next Steps
In this post, you learned about Cloud Run’s built-in fault tolerance mechanisms, such as autoscaling and zonal redundancy, saw how to architect multi-region services for higher availability, and got a preview of the upcoming Service Health feature for automated regional failover.
While the Service Health feature is still in private preview, you can sign up to get access by filling in this request form.

AI Summary and Description: Yes

Summary: The text discusses the features and benefits of Google Cloud’s Cloud Run, particularly focusing on its built-in fault tolerance and resilience capabilities for building high-availability applications. Key points include autoscaling, the decoupling of control and data planes, and the upcoming Service Health feature that provides automated regional failover. This information is highly relevant to professionals in cloud infrastructure and security.

Detailed Description: The article outlines essential aspects of Google Cloud’s Cloud Run service, which is designed to increase application availability and resilience through several key features:

– **Autoscaling**:
– Automatically adjusts instance capacity based on incoming request load, ensuring applications can handle varying demand (termed “hyper-elasticity”).
– Administrators can set limits to prevent excessive scaling.

– **Decoupled Data and Control Planes**:
– The control plane manages operations like deployments without affecting data handling, which enhances application resilience during control plane issues.

– **N+1 Zonal Redundancy**:
– By default, Cloud Run offers N+1 redundancy, ensuring failover capacity within the same region to maintain service availability during zone failures.

– **Container Probes for Availability**:
– **Startup probes** confirm the readiness of new instances.
– **Liveness probes** monitor ongoing instance health, enabling automatic removal of faulty instances.

– **Architecting for Multi-Regional Resilience**:
– Encourages deploying services across multiple regions to prevent downtime during regional outages.
– Introduces concepts like global external application load balancers and Serverless Network Endpoint Groups.

– **Upcoming Service Health Feature**:
– In private preview as of August 2025, this feature will automate traffic failover to healthy services in different regions when a service becomes unavailable.
– Requires the configuration of readiness probes and setting minimum instance counts.

– **Actionable Steps for Implementation**:
– Enabling Service Health involves adding a readiness probe to each Cloud Run service and setting minimum instances to 1.

– **Live Demonstration Insights**:
– A practical demonstration at Cloud Next 2025 illustrated the current behavior of request routing and how the Service Health feature improves upon it.

– **Next Steps for Professionals**:
– The article concludes by encouraging professionals to consider architecture decisions that enhance fault tolerance and to explore the new Service Health feature when available.

This text is significant for security and infrastructure professionals as it highlights Google Cloud’s approach to enhancing application reliability and resilience, which directly impacts deployment strategies in critical environments.
