Understanding the Difference Between Data and Control Plane in AWS
In AWS incidents, the component that fails is often the control plane, which manages how traffic is directed, rather than the data plane. The data plane carries out those instructions by answering DNS queries and resolving them to their intended destinations.
According to Akshat Tyagi, associate practice leader at HFS Research, the DNS data plane typically remains operational in major AWS incidents while the control plane in US East may experience issues. That leaves customers unable to update DNS records quickly to reroute traffic, which makes the control plane a crucial failure point.
A new feature aims to close this gap by providing a resilient, multi-region control path so that key APIs such as ‘ChangeResourceRecordSets’ become accessible again within a targeted 60-minute recovery window. This enables enterprises to redirect users to backup regions, switch to standby endpoints, or carry out disaster recovery measures without waiting for the affected region itself to recover.
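To make the control-plane step concrete, the sketch below shows roughly what a failover change submitted through ‘ChangeResourceRecordSets’ could look like from Python. It is a minimal illustration, not AWS's or the article's recommended procedure: the helper names, the record name, the standby hostname, and the 60-second TTL are all assumptions chosen for the example, and the boto3 call is only made if you invoke the second function with real credentials and a real hosted zone ID.

```python
from __future__ import annotations

from typing import Any


def build_failover_change_batch(
    record_name: str, standby_target: str, ttl: int = 60
) -> dict[str, Any]:
    """Build a ChangeBatch that repoints a CNAME record at a standby endpoint.

    The record name, target, and TTL are illustrative values supplied by the caller.
    """
    return {
        "Comment": "Redirect traffic to the standby endpoint",
        "Changes": [
            {
                # UPSERT creates the record if it is missing, or replaces it if present.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_target}],
                },
            }
        ],
    }


def apply_change_batch(hosted_zone_id: str, change_batch: dict[str, Any]) -> str:
    """Submit the change through the Route 53 control plane and return its status.

    This is the step that depends on the control plane being reachable.
    """
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("route53")
    response = client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=change_batch,
    )
    return response["ChangeInfo"]["Status"]


# Hypothetical record and standby hostnames, for illustration only.
batch = build_failover_change_batch("app.example.com.", "app-standby.example.com.")
```

Building the change batch is purely local work. Only the call inside apply_change_batch touches the control plane, which is why an outage there blocks rerouting even though existing records continue to resolve.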
The Architectural Challenges of the US East Region for AWS
The US East (Northern Virginia) region has long been an architectural bottleneck for AWS.
According to Tyagi, many global AWS services have historically depended on the Northern Virginia region for their control plane. Any disruptions in this region can have widespread effects across AWS services, impacting users worldwide.
While the new feature addresses one critical gap, Tyagi cautions that it may not be sufficient to prevent future outages entirely. Even so, it is a step toward making AWS services more resilient and limiting the impact of disruptions in the US East region.