Source URL: https://blog.cloudflare.com/cloudflare-incident-on-august-21-2025/
Source: The Cloudflare Blog
Title: Cloudflare incident on August 21, 2025
Feedly Summary: On August 21, 2025, an influx of traffic directed toward clients hosted in AWS us-east-1 caused severe congestion on links between Cloudflare and us-east-1. In this post, we explain the details.
AI Summary and Description: Yes
Summary: The incident detailed in the text highlights a network congestion issue between Cloudflare and AWS us-east-1 caused by a sudden surge of requests from a single customer. This incident underscores the importance of robust network management and architecture to prevent such disturbances in cloud infrastructure.
Detailed Description:
The text describes a significant congestion event on August 21, 2025, that affected customers connected to Cloudflare via AWS us-east-1. The major points of the incident are:
– **Incident Overview**:
  – A surge in traffic from one customer caused severe congestion between Cloudflare and AWS, impacting users with high latency, packet loss, and connection failures.
  – The congestion commenced at 16:27 UTC and was alleviated by 19:38 UTC.
  – The issue was not an attack but was due to excessive legitimate traffic.
– **Causes of Congestion**:
  – Excessive requests from a single customer saturated several peering links between Cloudflare and AWS, which did not have enough capacity to absorb the surge.
  – AWS's withdrawal of certain BGP advertisements exacerbated the issue: traffic was rerouted onto alternative paths, which in turn became congested (a toy model of this reroute effect appears after this list).
– **Response Actions**:
  – Cloudflare's incident team worked closely with AWS to manage the surge and restore normal service.
  – Rate limiting was employed to reduce the congestion, along with additional engineering actions to mitigate the impact (a simplified per-client rate-limiting sketch appears after this list).
– **Timeline of Events**:
  – The incident timeline provides granular detail on the sequence of events, including when traffic surged, when AWS began withdrawing BGP prefixes, and the subsequent response actions taken by Cloudflare and AWS.
– **Remediations and Future Actions**:
  – The incident prompted Cloudflare to develop strategies for better isolation of customer traffic, so that one customer's spikes do not affect others.
  – Planned actions include expanding network capacity through Data Center Interconnect upgrades, building mechanisms to deprioritize congestion-causing traffic, and a longer-term traffic-management strategy to ensure fair resource allocation (a fair-allocation sketch appears after this list).
– **Implications for Security and Compliance Professionals**:
  – The incident illustrates the critical need for robust network management practices in cloud environments to maintain service quality.
  – It highlights the risk posed by single points of congestion in distributed cloud infrastructures, which can create compliance issues when customers experience degraded service levels.
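To make the reroute effect described under "Causes of Congestion" concrete, here is a toy capacity model in Go. All names and figures (the 400 Gbps direct interconnect, the 300 Gbps of alternate capacity, the 600 Gbps surge) are hypothetical illustrations, not values from the incident report; the point is only that withdrawing prefixes from a saturated direct path moves the overload onto alternate paths rather than removing it.

```go
package main

import "fmt"

// path models a set of links between Cloudflare and AWS us-east-1.
// All numbers are illustrative, not actual interconnect capacities.
type path struct {
	name     string
	capacity float64 // Gbps
	load     float64 // Gbps currently routed over this path
}

func report(label string, paths []path) {
	fmt.Println(label)
	for _, p := range paths {
		fmt.Printf("  %-16s %5.0f / %5.0f Gbps (%.0f%% utilized)\n",
			p.name, p.load, p.capacity, 100*p.load/p.capacity)
	}
}

func main() {
	// Before withdrawal: a traffic surge saturates the direct peering links.
	direct := path{name: "direct PNI", capacity: 400, load: 600}
	indirect := path{name: "alternate paths", capacity: 300, load: 100}
	report("Before BGP withdrawal:", []path{direct, indirect})

	// After withdrawal: the prefixes are no longer reachable via the direct
	// interconnect, so its entire load shifts onto the alternate paths,
	// which now carry more traffic than they can absorb.
	indirect.load += direct.load
	direct.load = 0
	report("After BGP withdrawal:", []path{direct, indirect})
}
```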
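The "Response Actions" item mentions rate limiting. As a minimal sketch of the general technique, and not Cloudflare's actual implementation, the Go program below keeps a token bucket per client identifier and rejects (or would deprioritize) requests once a client exceeds its allotted rate; the rate and burst values are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a simple token bucket: tokens refill at `rate` per second up
// to `burst`, and each admitted request consumes one token.
type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter tracks one bucket per client. The per-client rate and burst
// values used below are illustrative defaults, not figures from the incident.
type Limiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // maximum bucket size
	clients map[string]*bucket
}

func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst, clients: make(map[string]*bucket)}
}

// Allow reports whether a request from the given client should be served
// now, or shed/deprioritized because the client is over its rate.
func (l *Limiter) Allow(client string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	b, ok := l.clients[client]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.clients[client] = b
	}
	// Refill tokens for the time elapsed since the last request.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now

	if b.tokens < 1 {
		return false // over the limit: reject or queue at lower priority
	}
	b.tokens--
	return true
}

func main() {
	l := NewLimiter(5, 10) // 5 requests/second, burst of 10, per client
	for i := 0; i < 15; i++ {
		fmt.Printf("request %2d from client-a allowed: %v\n", i, l.Allow("client-a"))
	}
}
```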
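The deprioritization and fair-allocation remediation can be illustrated with a max-min fair-share computation: customers below an equal share keep their full demand, while a surging customer is capped at whatever capacity remains. The function name, customer labels, and Gbps figures below are hypothetical; this is a sketch of the fairness idea, not a description of Cloudflare's planned mechanism.

```go
package main

import (
	"fmt"
	"sort"
)

// fairShare computes a max-min fair allocation of `capacity` across the
// given per-customer demands: customers asking for less than an equal
// share keep their full demand, and leftover capacity is split among the
// heavier customers. Values are illustrative (e.g. Gbps of egress).
func fairShare(demands map[string]float64, capacity float64) map[string]float64 {
	// Sort customers by ascending demand so small demands are satisfied first.
	names := make([]string, 0, len(demands))
	for name := range demands {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool { return demands[names[i]] < demands[names[j]] })

	alloc := make(map[string]float64, len(demands))
	remaining := capacity
	for i, name := range names {
		share := remaining / float64(len(names)-i) // equal split of what is left
		if demands[name] <= share {
			alloc[name] = demands[name] // fully satisfied; unused share returns to the pool
		} else {
			alloc[name] = share // capped: this customer's surge is deprioritized
		}
		remaining -= alloc[name]
	}
	return alloc
}

func main() {
	// One customer surging far beyond its usual demand on a 100 Gbps link.
	demands := map[string]float64{"surging-customer": 250, "customer-b": 20, "customer-c": 30}
	for name, gbps := range fairShare(demands, 100) {
		fmt.Printf("%-17s allocated %5.1f Gbps\n", name, gbps)
	}
}
```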
In conclusion, this case is a valuable lesson in the importance of architectural measures such as redundancy, capacity planning, and proactive traffic management in cloud computing environments, and it should inform security and compliance strategies for handling unforeseen customer behavior while keeping cloud services stable.