Cloud Blog: Toward faster incident resolution at Palo Alto Networks with Personalized Service Health

Source URL: https://cloud.google.com/blog/products/management-tools/personalized-service-health-at-palo-alto-networks/
Source: Cloud Blog
Title: Toward faster incident resolution at Palo Alto Networks with Personalized Service Health

Feedly Summary: Cloud incidents happen. And when they do, it’s incumbent on the cloud service provider to communicate about the incident to impacted customers quickly and effectively — and for the cloud service consumer to use that information effectively, as part of a larger incident management response. 
Google Cloud Personalized Service Health provides businesses with fast, transparent, relevant, and actionable communication about Google Cloud service disruptions, tailored to a specific business at its desired level of granularity. Cybersecurity company Palo Alto Networks is one Google Cloud customer and partner that recently integrated Personalized Service Health signals into the incident workflow for its Google Cloud-based PRISMA Access offering, saving its customers critical minutes during active incidents. 
By programmatically ingesting Personalized Service Health signals into advanced workflow components, Palo Alto can quickly make decisions such as triggering contingency actions to protect business continuity.
Let’s take a closer look at how Palo Alto integrated Personalized Service Health into its operations.

The Personalized Service Health integration
Palo Alto ingests Personalized Service Health logs into its internal AIOps system, which centralizes incident communications for PRISMA Access and applies advanced techniques to classify and distribute signals to the people responsible for responding to a given incident.

Personalized Service Health UI Incident list view

Users of Personalized Service Health can filter incidents by relevance level. Here, “Partially related” means there is an issue anywhere in the world with a product the customer uses. “Related” means the problem has been detected within the data center regions, while “Impacted” means that Google has verified the impact to the customer for specific services.
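To illustrate how a consumer of these signals might apply the same relevance filter programmatically, here is a minimal Python sketch. The event dictionaries, the `relevance` field name, and the `filter_by_relevance` helper are hypothetical and chosen for illustration; they are not taken from Palo Alto's implementation, and the ordering of levels simply follows the description above.

```python
from enum import IntEnum


class Relevance(IntEnum):
    # Ordered from least to most specific, following the article's description.
    PARTIALLY_RELATED = 1
    RELATED = 2
    IMPACTED = 3


def filter_by_relevance(events, minimum):
    """Keep events whose relevance is at least `minimum`; skip unknown labels."""
    kept = []
    for event in events:
        label = event.get("relevance", "")
        if label in Relevance.__members__ and Relevance[label] >= minimum:
            kept.append(event)
    return kept


# Hypothetical sample events, not real Personalized Service Health output.
events = [
    {"id": "evt-1", "relevance": "PARTIALLY_RELATED"},
    {"id": "evt-2", "relevance": "RELATED"},
    {"id": "evt-3", "relevance": "IMPACTED"},
    {"id": "evt-4", "relevance": "SOMETHING_ELSE"},
]

actionable = filter_by_relevance(events, Relevance.RELATED)
```

With this sample data, `actionable` holds only the "Related" and "Impacted" events, which is the kind of narrowing a responder might want before paging anyone.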
While Google is still confirming an incident, Personalized Service Health communicates some of these incidents as ‘PSH Emerging Incident’ to provide customers with early notification. Once Google confirms the incident, these incidents are merged with ‘PSH Confirmed Incidents’. This helps customers respond faster to a specific incident that’s impacting their environment or escalate back to Google, if needed. 
Personalized Service Health distributes updates throughout an active incident, typically every 30 minutes, or sooner if there’s progress to share. These updates are also written to logs, which Palo Alto ingests into AIOps.
Programmatically ingesting and distributing incident communications can accelerate the response to disruptive, unplanned cloud service provider incidents. This is especially true in large-scale organizations such as Palo Alto, where multiple teams are involved in incident response for different applications, workloads and customers.
Fueling the incident lifecycle
Palo Alto further leverages the ingested Personalized Service Health signals in its AIOps platform, which uses machine learning (ML) and analytics to automate IT operations. AIOps harnesses big data from operational appliances to detect and respond to issues instantaneously. AIOps correlates these signals with internally generated alerts to declare an incident that is affecting multiple customers. These AIOps alerts are tied to other incident management tools that assist with managing the incident lifecycle, including communication, regular updates and incident resolution.
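The article does not describe how the correlation is implemented, and the ML side is out of scope here. As a deliberately simplified, rule-based stand-in, the sketch below matches a provider event to internal alerts by region and time window, then flags a mass failure when several distinct customers are affected. All field names, the 15-minute window, and the two-customer threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta


def correlate(psh_event, internal_alerts, window=timedelta(minutes=15)):
    """Return internal alerts in an affected region that fired within
    `window` of the provider event's start time."""
    start = psh_event["start_time"]
    return [
        alert
        for alert in internal_alerts
        if alert["region"] in psh_event["regions"]
        and abs(alert["fired_at"] - start) <= window
    ]


def is_mass_failure(matched_alerts, min_customers=2):
    """Treat the incident as a mass failure when the matched alerts
    span at least `min_customers` distinct customers."""
    return len({alert["customer"] for alert in matched_alerts}) >= min_customers


# Hypothetical sample data.
psh_event = {"regions": {"us-east1"}, "start_time": datetime(2024, 1, 1, 12, 0)}
internal_alerts = [
    {"customer": "customer-a", "region": "us-east1", "fired_at": datetime(2024, 1, 1, 12, 5)},
    {"customer": "customer-b", "region": "us-east1", "fired_at": datetime(2024, 1, 1, 11, 50)},
    {"customer": "customer-c", "region": "europe-west1", "fired_at": datetime(2024, 1, 1, 12, 1)},
    {"customer": "customer-d", "region": "us-east1", "fired_at": datetime(2024, 1, 1, 14, 0)},
]

matched = correlate(psh_event, internal_alerts)
mass_failure = is_mass_failure(matched)
```

Here the alerts for `customer-a` and `customer-b` match, while the other two are excluded for being in a different region or outside the time window. A production system would likely use richer signals than region and time alone.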

In addition, a data enrichment pipeline takes Personalized Service Health incidents, adds Palo Alto’s related information, and publishes the events to Pub/Sub. AIOps then consumes the incident data from Pub/Sub, processes it, correlates it with related event signals, and notifies subscribed channels.
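A rough sketch of such an enrichment step might look like the following. The incident fields, the project-to-customer lookup, and the function names are hypothetical. To keep the example self-contained, the publisher is passed in as a plain callable; in a real pipeline it could wrap the Pub/Sub client library's publish call, which is not shown here.

```python
import json


def enrich(incident, customer_by_project):
    """Return a copy of the incident with the owning customer attached."""
    enriched = dict(incident)
    enriched["customer"] = customer_by_project.get(incident["project_id"], "unknown")
    return enriched


def publish_enriched(incidents, customer_by_project, publish):
    """Enrich each incident, serialize it to UTF-8 JSON bytes, and hand it
    to `publish`. Returns the number of messages published."""
    count = 0
    for incident in incidents:
        payload = json.dumps(enrich(incident, customer_by_project), sort_keys=True)
        publish(payload.encode("utf-8"))
        count += 1
    return count


# Hypothetical sample data; `sent.append` stands in for a real publisher.
customer_by_project = {"proj-123": "customer-a"}
incidents = [
    {"id": "evt-1", "project_id": "proj-123", "relevance": "IMPACTED"},
    {"id": "evt-2", "project_id": "proj-999", "relevance": "RELATED"},
]
sent = []
published = publish_enriched(incidents, customer_by_project, sent.append)
```

Injecting the publisher keeps the enrichment logic testable without cloud credentials, and the downstream consumer only needs to decode the JSON payload.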
Palo Alto organizes Google Cloud assets into folders within the Google Cloud console. Each project represents a Palo Alto PRISMA Access customer. To receive incident signals that are likewise specific to end customers, Palo Alto creates a log sink that’s specific to each folder, aggregating service health logs at the folder level. Palo Alto then receives incident signals specific to each customer so it can take further action.
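As an illustration of the folder-level sink idea, the sketch below builds the argument list for a `gcloud logging sinks create` command scoped to one folder, with child projects included. The sink name, folder ID, and topic path are placeholders, and the log filter shown is an assumption about how service health entries are identified; verify it against the current Personalized Service Health logging documentation before relying on it.

```python
def folder_sink_command(folder_id, sink_name, topic_path):
    """Build a gcloud argument list that creates a folder-level log sink
    routing service health log entries to a Pub/Sub topic."""
    # Assumed filter for service health entries; confirm against the docs.
    log_filter = 'resource.type="servicehealth.googleapis.com/Event"'
    return [
        "gcloud", "logging", "sinks", "create",
        sink_name,
        f"pubsub.googleapis.com/{topic_path}",
        f"--folder={folder_id}",
        "--include-children",  # also capture logs from projects under the folder
        f"--log-filter={log_filter}",
    ]


# Placeholder values for illustration only.
command = folder_sink_command(
    folder_id="123456789012",
    sink_name="psh-folder-sink",
    topic_path="projects/example-project/topics/psh-events",
)
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine with an authenticated gcloud CLI. Creating one sink per folder is what lets the signals arrive already grouped by the folder's projects.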

Palo Alto drives the following actions based on incident communications flowing from Google Cloud:

– Proactive detection of zonal, inter-regional, and external en masse failures
– Accurate identification of workloads affected by cloud provider incidents
– Correlation of product issues with cloud service degradation in Google Cloud Platform itself

Seeing Personalized Service Health’s value
Incidents caused by cloud providers often go unnoticed or are difficult to isolate without involving several of the cloud provider’s teams (support, engineering, SRE, account management). Together, the Personalized Service Health alerting framework and the AIOps correlation engine allow Palo Alto’s SRE teams to isolate issues caused by a cloud provider near-instantaneously.

Palo Alto’s incident management workflow is designed to address mass failures versus individual customer outages, ensuring the right teams are engaged until the incidents are resolved. This includes notifying relevant parties, such as the on-call engineer and the Google Cloud support team. With Personalized Service Health, Palo Alto can capture both event types: mass failures and individual customer outages.
Palo Alto gets value from Personalized Service Health in multiple ways, beginning with faster incident response and contingency actions that protect business continuity, especially for impacted customers of PRISMA Access. When an incident affects them, Prisma Access customers naturally seek and expect information from Palo Alto. By ensuring this information flows rapidly from Google Cloud to Palo Alto’s incident response systems, Palo Alto is able to provide more insightful answers to these end customers. It also plans to serve additional use cases based on both existing and future Personalized Service Health capabilities.
Take your incident management to the next level
Google Cloud is continually evolving Personalized Service Health to provide deeper value for all Google Cloud customers — from startups, to ISVs and SaaS providers, to the largest enterprises. Ready to get started? Learn more about Personalized Service Health, or reach out to your account team.

AI Summary and Description: Yes

Summary: The text discusses the integration of Google Cloud’s Personalized Service Health by Palo Alto Networks to enhance incident management during cloud service disruptions. This integration enables rapid communication and decision-making to maintain business continuity, making it particularly relevant for professionals in cloud computing and incident response.

Detailed Description:
The provided text outlines how Palo Alto Networks has effectively utilized Google Cloud’s Personalized Service Health to improve incident management and response during cloud service outages. This innovative integration showcases modern approaches to cybersecurity and cloud operational efficiency.

Key Points:
– **Personalized Service Health**: A tool that provides transparent, relevant, and actionable communications regarding Google Cloud service disruptions.
– **Integration with AIOps**: Palo Alto has integrated Personalized Service Health signals into their AIOps platform, enhancing incident response by programmatically ingesting service health logs.
– **Automation and Machine Learning**: The AIOps system detects, correlates, and responds to incidents instantly using machine learning and analytics, streamlining the incident lifecycle.
– **Incident Classification**: Signals are categorized based on relevance, enabling quick identification of issues affecting customers.
– **Proactive Responses**: By using these signals, Palo Alto can proactively detect failures, identify workloads affected, and correlate product issues with cloud service failures.
– **Incident Alerts and Communication**: The integration allows for continuous updates about incidents, generally at 30-minute intervals, which supports rapid decision-making during incidents.
– **Value for Customers**: Through faster incident resolutions and more informed communications, Palo Alto enhances business continuity for its customers using PRISMA Access.

The text highlights the intersection of cloud computing, incident management, and AIOps, emphasizing the growing importance of effective communication and incident response systems in maintaining cloud service reliability. For security and compliance professionals, such advancements illustrate the need for robust incident management frameworks that use technology effectively to protect business continuity.