The Cloudflare Blog: Cloudflare 1.1.1.1 Incident on July 14, 2025

Source URL: https://blog.cloudflare.com/cloudflare-1-1-1-1-incident-on-july-14-2025/
Source: The Cloudflare Blog
Title: Cloudflare 1.1.1.1 Incident on July 14, 2025

Feedly Summary: On July 14th, 2025, Cloudflare made a change to our service topologies that caused an outage for 1.1.1.1 on the edge, causing downtime for 62 minutes for customers using the 1.1.1.1 public DNS Resolver.

AI Summary and Description: Yes

**Summary:**
The text discusses a global outage of Cloudflare’s 1.1.1.1 public DNS Resolver caused by a configuration error introduced during a change to service topologies. The incident highlights critical infrastructure vulnerabilities, underscores the importance of robust configuration management, and outlines remediation steps, serving as a cautionary tale for professionals in cloud computing, infrastructure security, and incident response.

**Detailed Description:**
The outage of Cloudflare’s 1.1.1.1 Resolver service illustrates the intricate relationship between configuration management and service continuity in cloud environments. Key points from the event detail the sequence of errors leading to widespread disruption and the subsequent actions taken to restore service:

- **Event Timeline:**
  - On June 6, a configuration error was introduced without immediate impact, as it involved a Data Localization Suite (DLS) service that was not yet in production.
  - On July 14, this dormant error was activated when a change was made to the pre-production DLS service, triggering a global withdrawal of the 1.1.1.1 Resolver prefixes (a prefix-visibility check is sketched after this list).
  - Notably, the misconfiguration was not the result of an external attack such as BGP hijacking, although an unrelated BGP hijack event was observed concurrently.
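
To make the idea of a "global withdrawal" concrete, the following is a minimal sketch of how an outside observer could check whether the resolver prefixes are still visible in the global routing table, using the public RIPEstat Data API. The endpoint and response field names are assumptions about that API, not anything taken from Cloudflare’s tooling or the post-mortem.

```python
# Hypothetical sketch: check how widely a prefix is seen in the global routing
# table via the public RIPEstat Data API. The endpoint and response field
# names are assumptions; inspect the raw payload before relying on them.
import json
import urllib.request

RIPESTAT_URL = "https://stat.ripe.net/data/routing-status/data.json?resource={prefix}"

def prefix_visibility(prefix: str) -> dict:
    """Return the visibility block RIPEstat reports for a prefix."""
    with urllib.request.urlopen(RIPESTAT_URL.format(prefix=prefix), timeout=10) as resp:
        payload = json.load(resp)
    # "visibility" is assumed to hold per-address-family counts of RIS peers
    # currently seeing the prefix announced.
    return payload.get("data", {}).get("visibility", {})

if __name__ == "__main__":
    for prefix in ("1.1.1.0/24", "1.0.0.0/24"):
        v4 = prefix_visibility(prefix).get("v4", {})
        seen = v4.get("ris_peers_seeing")
        total = v4.get("total_ris_peers")
        print(f"{prefix}: seen by {seen}/{total} RIS peers")
```

During a withdrawal of the kind described above, the number of peers seeing the prefix would drop sharply relative to the total.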

- **Impact:**
  - The outage significantly affected user access to internet services, as the 1.1.1.1 Resolver is widely used.
  - Traffic metrics showed a drastic drop in DNS queries over UDP and TCP, while DNS-over-HTTPS (DoH) traffic remained relatively stable because most DoH clients reach the service through the cloudflare-dns.com hostname rather than the affected IP addresses (the contrast is sketched after this list).
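
The split between direct-to-IP resolution and DoH can be illustrated with a small probe: a conventional UDP query sent straight to 1.1.1.1, and a DoH query made via the cloudflare-dns.com JSON endpoint. During the incident the first path would have failed while the second generally kept working. This is an illustrative sketch, not Cloudflare’s measurement methodology; it assumes the third-party dnspython package is installed.

```python
# Minimal probe contrasting the two access paths to the public resolver:
# a classic UDP query sent directly to 1.1.1.1, and a DoH query made via the
# cloudflare-dns.com hostname (JSON API). Requires the `dnspython` package.
import json
import urllib.request

import dns.resolver  # pip install dnspython

def query_via_udp(name: str = "example.com") -> list[str]:
    """Ask 1.1.1.1 directly over UDP; fails if the 1.1.1.1 prefix is withdrawn."""
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["1.1.1.1"]
    answer = resolver.resolve(name, "A", lifetime=3)
    return [rr.to_text() for rr in answer]

def query_via_doh(name: str = "example.com") -> list[str]:
    """Ask the resolver through DoH via the cloudflare-dns.com hostname."""
    url = f"https://cloudflare-dns.com/dns-query?name={name}&type=A"
    req = urllib.request.Request(url, headers={"Accept": "application/dns-json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        payload = json.load(resp)
    return [ans["data"] for ans in payload.get("Answer", [])]

if __name__ == "__main__":
    probes = (("UDP to 1.1.1.1", query_via_udp), ("DoH via cloudflare-dns.com", query_via_doh))
    for label, fn in probes:
        try:
            print(f"{label}: {fn()}")
        except Exception as exc:  # during the outage the UDP path would raise here
            print(f"{label}: FAILED ({exc})")
```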

- **Technical Analysis:**
  - Cloudflare’s infrastructure uses both legacy and modern systems to manage service topologies, which illustrates the difficulty of maintaining legacy configurations while deployment methods evolve.
  - Because the change was applied globally in a single step rather than rolled out progressively, the error was not detected before it reached every location, underscoring the need for better alerting and configuration validation (a staged-rollout sketch follows this list).
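
As an illustration of the progressive-deployment idea the analysis points to, below is a hypothetical staged-rollout loop: a change is first applied to a small canary slice, each expansion is gated on a health check, and any failure triggers a rollback before the change goes global. The function names and stage fractions are invented for illustration and do not represent Cloudflare’s internal deployment system.

```python
# Hypothetical staged-rollout loop: apply a topology change to progressively
# larger slices of the fleet, gating each expansion on a health check and
# rolling back on the first failure. All names here are illustrative.
from typing import Callable, Sequence

def staged_rollout(
    locations: Sequence[str],
    apply_change: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[Sequence[str]], None],
    stages: Sequence[float] = (0.01, 0.05, 0.25, 1.0),
) -> bool:
    """Roll a change out in stages; return True only if every stage stays healthy."""
    done: list[str] = []
    for fraction in stages:
        target = locations[: max(1, int(len(locations) * fraction))]
        for loc in target:
            if loc in done:
                continue
            apply_change(loc)
            done.append(loc)
            if not healthy(loc):
                # A failing canary stops the rollout before the change goes global.
                rollback(done)
                return False
    return True
```

A single-step global push is the degenerate case of this loop with `stages=(1.0,)`, which is effectively what allowed the dormant error to take effect everywhere at once.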

- **Restoration Efforts:**
  - Service was partially restored by reverting to the previous configuration; the reconfiguration of affected servers was then expedited to shorten the disruption.
  - The event underscored the need for robust risk management practices, including staged deployment of addressing changes and faster deprecation of legacy systems that hinder operational stability.

– **Remediation Steps:**
– Cloudflare is prioritizing the adoption of modern deployment approaches that facilitate gradual changes and better alert mechanisms.
– The emphasis on phasing out legacy systems aims to enhance reliability and compliance with current operational standards.
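
One concrete form a "better alert mechanism" could take is an external black-box probe that continuously queries the resolver and raises an alert when the failure rate over a sliding window crosses a threshold. The sketch below assumes the dnspython package and uses an invented notify() placeholder; it illustrates the idea rather than Cloudflare’s actual monitoring stack.

```python
# Hypothetical black-box monitor: repeatedly query 1.1.1.1 and raise an alert
# when the failure rate over a sliding window crosses a threshold.
# Requires `dnspython`; notify() is an invented placeholder hook.
import time
from collections import deque

import dns.resolver  # pip install dnspython

def notify(message: str) -> None:
    """Placeholder alert hook (e.g. pager or chat webhook)."""
    print(f"ALERT: {message}")

def monitor(window: int = 20, threshold: float = 0.5, interval: float = 5.0) -> None:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["1.1.1.1"]
    results: deque[bool] = deque(maxlen=window)
    while True:
        try:
            resolver.resolve("example.com", "A", lifetime=3)
            results.append(True)
        except Exception:
            results.append(False)
        failure_rate = results.count(False) / len(results)
        if len(results) == window and failure_rate >= threshold:
            notify(f"1.1.1.1 failure rate {failure_rate:.0%} over last {window} probes")
        time.sleep(interval)

if __name__ == "__main__":
    monitor()
```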

**Insights for Security and Compliance Professionals:**
This incident serves as a critical reminder for security and infrastructure professionals regarding the importance of maintaining rigorous configuration management protocols. By understanding the potential for human error in complex systems and the ramifications of outages, organizations can better prepare for unforeseen incidents, ensuring robust continuity and compliance with operational standards. The outlined remediation steps offer valuable practices that can be adopted across various domains to enhance resilience against similar future incidents.