On October 19-20, 2025, a race condition in DynamoDB’s automated DNS management system caused a 15-hour cascading outage across AWS’s US-EAST-1 region — the largest and most widely used AWS region.
What Happened
AWS uses two automated systems to manage DNS records for DynamoDB’s load balancers:
- DNS Planner — Monitors load balancer health and creates DNS update plans
- DNS Enactor — Applies those plans to Route 53
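As a rough illustration of that split (names and structures here are hypothetical, not AWS's actual implementation), a planner turns health data into a versioned plan and an enactor writes it out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DnsPlan:
    version: int    # monotonically increasing plan number
    record: str     # address value for the service hostname

def make_plan(version, healthy_ips):
    # Planner: derive the desired DNS record from load-balancer health.
    return DnsPlan(version=version, record=",".join(sorted(healthy_ips)))

def enact(zone, hostname, plan):
    # Enactor: write the plan into the zone (a dict standing in for Route 53).
    zone[hostname] = plan.record

zone = {}
enact(zone, "dynamodb.us-east-1.amazonaws.com",
      make_plan(1, ["203.0.113.10", "203.0.113.11"]))
```

The key property of this design is that the planner and enactor run independently, which is exactly what opens the door to the race described next.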
The failure sequence:
- Enactor A picked up an outdated DNS plan but was delayed before applying it.
- Enactor B applied a newer, correct plan and started cleaning up stale records.
- While cleanup was running, Enactor A finally woke up and applied its outdated plan, writing an empty DNS record for dynamodb.us-east-1.amazonaws.com.
- The automation could not self-repair. Route 53 returned NXDOMAIN for DynamoDB.
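The sequence above is a classic last-writer-wins race. A minimal sketch (names and addresses hypothetical, not AWS's code) shows how a delayed enactor clobbers a newer plan, and how a version check, compare-and-set on the plan number, would have rejected the stale write:

```python
class DnsStore:
    """Stand-in for the authoritative DNS record of one service hostname."""
    def __init__(self):
        self.record = "203.0.113.10"   # example address (TEST-NET-3 range)
        self.version = 0               # version of the last applied plan

    def apply_unsafe(self, plan_version, value):
        # No guard: whoever writes last wins, even with an older plan.
        self.record = value
        self.version = plan_version

    def apply_safe(self, plan_version, value):
        # Guard: refuse any plan not strictly newer than the last applied.
        if plan_version <= self.version:
            return False
        self.record = value
        self.version = plan_version
        return True

# Unsafe ordering: Enactor B applies plan 2, then the delayed Enactor A
# applies stale plan 1, whose record is empty after cleanup.
unsafe = DnsStore()
unsafe.apply_unsafe(2, "203.0.113.20")
unsafe.apply_unsafe(1, "")             # stale write wins: empty record
assert unsafe.record == ""             # resolvers now see NXDOMAIN

# With the version guard, the same stale write is rejected.
safe = DnsStore()
assert safe.apply_safe(2, "203.0.113.20") is True
assert safe.apply_safe(1, "") is False
assert safe.record == "203.0.113.20"   # newer plan survives
```

The guard is deliberately simple: any monotonic token (plan version, timestamp with tie-breaking, or a conditional write in the backing store) is enough to make a delayed actor's write a no-op instead of an outage.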
The Cascade
Since virtually every AWS service depends on DynamoDB, the empty DNS record cascaded into outages across EC2, Lambda, ECS, load balancers, CloudWatch, and dozens of other services. AWS subsequently disabled the automated DNS Planner and Enactor worldwide.
Lessons for DNS Operators
- DNS is the single point of failure — even for the largest cloud platforms. Using a dedicated DNS hosting provider alongside your cloud provider adds a critical layer of redundancy.
- Automated DNS changes need safeguards — race conditions in automation can be catastrophic.
- Monitor your DNS records — monitoring from outside your provider's network can catch an empty or missing record within minutes of the change.
- Test DNS failover — assume your DNS will break, and plan for it.
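For the monitoring point above, a minimal external check can be sketched with only the standard library (a real monitor would query multiple resolvers, vantage points, and record types):

```python
import socket

def check_dns(hostname):
    """Resolve hostname from this vantage point.

    Returns a sorted list of addresses, or None if resolution fails,
    e.g. when the record is missing and the resolver returns NXDOMAIN.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return None

# "invalid" is a reserved TLD (RFC 2606), so it never resolves.
assert check_dns("name.invalid") is None
```

An alerting loop would call this periodically from hosts outside the affected provider and page when the result is None or empty, which is precisely the signal the empty DynamoDB record would have produced.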
Sources: ThousandEyes Analysis, InfoQ Postmortem, The Register