Source URL: https://cloud.google.com/blog/products/data-analytics/bigquery-managed-disaster-recovery-adds-soft-failover/
Source: Cloud Blog
Title: Introducing BigQuery soft failover: Greater control for disaster recovery testing
Feedly Summary: Most businesses with mission-critical workloads have a two-fold disaster recovery solution in place that 1) replicates data to a secondary location, and 2) enables failover to that location in the event of an outage. For BigQuery, that solution takes the shape of BigQuery Managed Disaster Recovery. But the risk of data loss while testing a disaster recovery event remains a primary concern. Like traditional “hard failover” solutions, testing forces a difficult choice: promote the secondary immediately and risk losing any data within the Recovery Point Objective (RPO), or delay recovery while you wait for a primary region that may never come back online.
Today, we’re addressing this directly with the introduction of soft failover in BigQuery Managed Disaster Recovery. Soft failover logic promotes the secondary region’s compute and datasets only after replication has been confirmed to be complete, providing you with full control over disaster recovery transitions, and minimizing the risk of data loss during a planned failover.
Figure 1: Comparing hard vs. soft failover

Summary of differences between hard failover and soft failover:

| | Hard failover | Soft failover |
| --- | --- | --- |
| Use case | Unplanned outages, region down | Failover testing; requires both primary and secondary to be available |
| Failover timing | As soon as possible, ignoring any pending replication between primary and secondary; data loss possible | Subject to primary and secondary acquiescing, minimizing potential for data loss |
| RPO/RTO | 15 minutes / 5 minutes* | N/A |

*Supported objective depending on configuration
BigQuery soft failover in action
Imagine a large financial services company, "SecureBank," which uses BigQuery for its mission-critical analytics and reporting. SecureBank requires a reliable Recovery Time Objective (RTO) and a 15-minute Recovery Point Objective (RPO) for its primary BigQuery datasets, as robust disaster recovery is a top priority. They regularly conduct DR drills with BigQuery Managed DR to ensure compliance and readiness for unforeseen outages.
Before the introduction of soft failover in BigQuery Managed DR, SecureBank faced a dilemma over how to perform its DR drills. While BigQuery Managed DR handled the failover of compute and associated datasets, conducting a full "hard failover" drill meant accepting the risk of up to 15 minutes of data loss if replication wasn’t complete when the failover was initiated — or significant operational disruption if they first manually verified data synchronization across regions. This often led to less realistic or more complex drills, consuming valuable engineering time and causing anxiety.
New solution:
With soft failover in BigQuery Managed DR, administrators have several options for failover procedures. Unlike hard failover for unplanned outages, soft failover initiates failover only after all data is replicated to the secondary region, to help guarantee data integrity.
Figure 2: Soft Failover Mode Selection
Figure 3: Disaster recovery reservations
Figure 4: Replication status / Failover details
The BigQuery soft failover feature is available today via the BigQuery UI, DDL, and CLI, providing enterprise-grade control for disaster recovery, confident simulations, and compliance — without risking data loss during testing. Get started today to maintain uptime, prevent data loss, and test scenarios safely.
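As a rough sketch of the DDL path, the flow looks like the following. Project, region, reservation name, and slot count are placeholders, and the option names should be confirmed against the current BigQuery reservation DDL reference; this is not a definitive runbook:

```sql
-- Illustrative sketch only: identifiers are placeholders, and option names
-- should be verified against the BigQuery managed DR documentation.

-- 1. Create a reservation with a secondary location (managed DR requires
--    the ENTERPRISE_PLUS edition).
CREATE RESERVATION `my-project.region-us-central1.dr-reservation`
OPTIONS (
  slot_capacity = 100,
  edition = 'ENTERPRISE_PLUS',
  secondary_location = 'us-east1');

-- 2. To fail over, promote the secondary region to primary. With the new
--    soft failover mode, promotion completes only after replication has
--    been confirmed, rather than immediately as in a hard failover.
ALTER RESERVATION `my-project.region-us-east1.dr-reservation`
SET OPTIONS (is_primary = true);
```

The same promotion can be performed from the BigQuery UI (Figure 2 shows the failover mode selection) or the CLI; the choice between hard and soft failover is made at promotion time.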
AI Summary and Description: Yes
Summary: The text discusses the introduction of soft failover in BigQuery Managed Disaster Recovery, providing a solution that enhances data integrity during disaster recovery processes. This innovation is particularly significant for businesses with mission-critical workloads, such as financial services, as it minimizes data loss while ensuring compliance and readiness for unforeseen outages.
Detailed Description: The content elaborates on two distinct approaches to disaster recovery—hard failover and the newly introduced soft failover—highlighting the importance of data protection and operational efficiency in disaster recovery scenarios.
– **Disaster Recovery Solutions**:
– Businesses typically implement a two-fold disaster recovery solution:
1. Data replication to a secondary location.
2. Failover capability to that secondary location during outages.
– **BigQuery Managed Disaster Recovery**:
– A specific solution for managing disaster recovery within Google’s BigQuery, designed to facilitate the resilience of mission-critical applications.
– **Challenges with Hard Failover**:
– Hard failover necessitates immediate failover action, which can lead to potential data loss if replication between primary and secondary locations is incomplete at the time of failover.
– Causes operational disruptions, making routine recovery drills difficult and complex.
– **Introduction of Soft Failover**:
– Promotes the secondary region’s compute and datasets only after confirming the completion of data replication, which allows greater control over disaster recovery transitions.
– Enhances reliability for companies like SecureBank, reducing the risks associated with data loss during recovery drills and ensuring compliance with their Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
– **Case Study of SecureBank**:
– Emphasizes the critical nature of robust disaster recovery in the financial services sector.
– Illustrates how soft failover provides flexibility and assurance for compliance rehearsals, reducing the anxiety of potential data loss and operational disruption.
– **Conclusion**:
– The soft failover feature in BigQuery Managed DR offers advanced capabilities for disaster recovery control, aligning with enterprise needs to maintain uptime and ensure data integrity during testing scenarios.
– Available via multiple interfaces (UI, DDL, CLI), facilitating ease of access and usability for organizations.
This development in disaster recovery solutions plays a significant role in the broader fields of cloud computing and information security, particularly for professionals aiming to ensure robust data protection strategies in enterprises relying on cloud infrastructure.