Cloudflare experienced a significant global outage on November 18, 2025, caused by a self-inflicted configuration error. The outage affected a large portion of the Internet, returning HTTP 5xx errors to users accessing websites, APIs, and applications running on Cloudflare’s network. The root cause was traced to a database permissions update that triggered a cascading failure in the company’s Bot Management system and core proxy layer.
The misconfiguration in a ClickHouse database cluster caused the query that generates a Bot Management feature file to produce duplicate entries, pushing the file past a preallocated memory limit and triggering widespread failures in Cloudflare’s core proxy software. Because good and bad versions of the file were generated in turn while the change propagated, the network oscillated between recovery and failure before collapsing into a persistent failure mode, impacting core proxy infrastructure and dependent services such as Turnstile, Workers KV, and Access.
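To make that failure mode concrete, the Rust sketch below shows how a feature file whose rows are duplicated can blow past a preallocated limit, and how an unchecked error path turns a recoverable parse error into a process-wide panic. The type names, the 200-feature limit, and the `.unwrap()` call are illustrative assumptions, not Cloudflare's actual code.

```rust
// Illustrative sketch only: names, the limit value, and the error handling
// are assumptions, not Cloudflare's actual implementation.

const MAX_FEATURES: usize = 200; // hypothetical preallocated capacity

#[derive(Debug)]
struct Feature {
    name: String,
}

#[derive(Debug)]
enum ConfigError {
    TooManyFeatures { got: usize, max: usize },
}

/// Parse a feature file into the preallocated feature table,
/// returning an error if the file exceeds the memory budget.
fn load_features(lines: &[String]) -> Result<Vec<Feature>, ConfigError> {
    if lines.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures {
            got: lines.len(),
            max: MAX_FEATURES,
        });
    }
    Ok(lines.iter().map(|l| Feature { name: l.clone() }).collect())
}

fn main() {
    // A file with duplicated rows can double in size and exceed the limit.
    let duplicated: Vec<String> = (0..150)
        .map(|i| format!("feature_{i}"))
        .flat_map(|f| [f.clone(), f]) // each entry appears twice
        .collect();

    // An unchecked `.unwrap()` here converts the oversized file into a
    // panic that takes down the whole process -- the failure mode above.
    let features = load_features(&duplicated).unwrap();
    println!("loaded {} features", features.len());
}
```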
In response to the outage, Cloudflare announced systemic safeguards to prevent similar incidents in the future, including hardened validation of internally generated configuration files, global kill switches for features, more resilient error handling, and measures to keep error reporting from overwhelming system resources during high-failure events. The incident underscores the importance of resilience engineering and configuration hygiene in maintaining the reliability of global cloud networks.
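As a rough illustration of the "validate before apply" style of safeguard described above, the sketch below checks a candidate feature file against a size budget and duplicate entries, and falls back to the last known good configuration when the check fails. The names, thresholds, and structure are assumptions for illustration only, not Cloudflare's published design.

```rust
// Illustrative sketch of "validate before apply, keep last known good":
// names and thresholds are assumptions, not Cloudflare's actual design.

use std::collections::HashSet;

const MAX_FEATURES: usize = 200; // hypothetical size budget

#[derive(Clone, Debug)]
struct FeatureFile {
    features: Vec<String>,
}

#[derive(Debug)]
enum ValidationError {
    TooLarge(usize),
    DuplicateFeature(String),
}

/// Reject files that exceed the size budget or contain duplicate entries.
fn validate(file: &FeatureFile) -> Result<(), ValidationError> {
    if file.features.len() > MAX_FEATURES {
        return Err(ValidationError::TooLarge(file.features.len()));
    }
    let mut seen = HashSet::new();
    for f in &file.features {
        if !seen.insert(f.as_str()) {
            return Err(ValidationError::DuplicateFeature(f.clone()));
        }
    }
    Ok(())
}

/// Apply a new file only if it validates; otherwise keep serving the
/// last known good configuration instead of failing the proxy.
fn apply_update(current: &mut FeatureFile, candidate: FeatureFile) {
    match validate(&candidate) {
        Ok(()) => *current = candidate,
        Err(e) => eprintln!("rejecting feature file, keeping last good config: {e:?}"),
    }
}

fn main() {
    let mut active = FeatureFile { features: vec!["feature_0".into()] };
    let bad = FeatureFile {
        features: vec!["feature_1".into(), "feature_1".into()], // duplicate row
    };
    apply_update(&mut active, bad);
    println!("active config still has {} feature(s)", active.features.len());
}
```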