On October 19-20, 2025, a race condition in DynamoDB’s automated DNS management system caused a 15-hour cascading outage across AWS’s US-EAST-1 region — the largest and most widely used AWS region.
What Happened
AWS uses two automated systems to manage DNS records for DynamoDB’s load balancers:
- DNS Planner — Monitors load balancer health and creates DNS update plans
- DNS Enactor — Applies those plans to Route 53
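As a rough illustration of that split (names and structures here are hypothetical, not AWS's actual implementation), a planner turns health data into a versioned plan and an enactor writes it out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DnsPlan:
    version: int    # monotonically increasing plan number
    record: str     # address value for the service hostname

def make_plan(version, healthy_ips):
    # Planner: derive the desired DNS record from load-balancer health.
    return DnsPlan(version=version, record=",".join(sorted(healthy_ips)))

def enact(zone, hostname, plan):
    # Enactor: write the plan into the zone (a dict standing in for Route 53).
    zone[hostname] = plan.record

zone = {}
enact(zone, "dynamodb.us-east-1.amazonaws.com",
      make_plan(1, ["203.0.113.10", "203.0.113.11"]))
```

The key property of this design is that the planner and enactor run independently, which is exactly what opens the door to the race described next.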
The failure sequence:
- Enactor A picked up an outdated DNS plan but was delayed before applying it.
- Enactor B applied a newer, correct plan and started cleaning up stale records.
- While cleanup was running, Enactor A finally woke up and applied its outdated plan, writing an empty DNS record for dynamodb.us-east-1.amazonaws.com.
- The automation could not self-repair. Route 53 returned NXDOMAIN for DynamoDB.
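The sequence above is a classic last-writer-wins race. A minimal sketch (names and addresses hypothetical, not AWS's code) shows how a delayed enactor clobbers a newer plan, and how a version check, compare-and-set on the plan number, would have rejected the stale write:

```python
class DnsStore:
    """Stand-in for the authoritative DNS record of one service hostname."""
    def __init__(self):
        self.record = "203.0.113.10"   # example address (TEST-NET-3 range)
        self.version = 0               # version of the last applied plan

    def apply_unsafe(self, plan_version, value):
        # No guard: whoever writes last wins, even with an older plan.
        self.record = value
        self.version = plan_version

    def apply_safe(self, plan_version, value):
        # Guard: refuse any plan not strictly newer than the last applied.
        if plan_version <= self.version:
            return False
        self.record = value
        self.version = plan_version
        return True

# Unsafe ordering: Enactor B applies plan 2, then the delayed Enactor A
# applies stale plan 1, whose record is empty after cleanup.
unsafe = DnsStore()
unsafe.apply_unsafe(2, "203.0.113.20")
unsafe.apply_unsafe(1, "")             # stale write wins: empty record
assert unsafe.record == ""             # resolvers now see NXDOMAIN

# With the version guard, the same stale write is rejected.
safe = DnsStore()
assert safe.apply_safe(2, "203.0.113.20") is True
assert safe.apply_safe(1, "") is False
assert safe.record == "203.0.113.20"   # newer plan survives
```

The guard is deliberately simple: any monotonic token (plan version, timestamp with tie-breaking, or a conditional write in the backing store) is enough to make a delayed actor's write a no-op instead of an outage.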
The Cascade
Since virtually every AWS service depends on DynamoDB, the empty DNS record cascaded into outages across EC2, Lambda, ECS, load balancers, CloudWatch, and dozens of other services. AWS subsequently disabled the automated DNS Planner and Enactor worldwide.
Lessons for DNS Operators
- DNS is the single point of failure — even for the largest cloud platforms. Using a dedicated DNS hosting provider alongside your cloud provider adds a critical layer of redundancy.
- Automated DNS changes need safeguards — race conditions in automation can be catastrophic.
- Monitor your DNS records — monitoring from outside your provider's network can catch an empty or missing record within minutes of the change.
- Test DNS failover — assume your DNS will break, and plan for it.
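For the monitoring point above, a minimal external check can be sketched with only the standard library (a real monitor would query multiple resolvers, vantage points, and record types):

```python
import socket

def check_dns(hostname):
    """Resolve hostname from this vantage point.

    Returns a sorted list of addresses, or None if resolution fails,
    e.g. when the record is missing and the resolver returns NXDOMAIN.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return None

# "invalid" is a reserved TLD (RFC 2606), so it never resolves.
assert check_dns("name.invalid") is None
```

An alerting loop would call this periodically from hosts outside the affected provider and page when the result is None or empty, which is precisely the signal the empty DynamoDB record would have produced.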
Sources: ThousandEyes Analysis, InfoQ Postmortem, The Register