Amazon failed to secure enough servers to handle the traffic surge on Prime Day, causing it to launch a scaled-down backup front page and temporarily kill off all international traffic, according to internal Amazon documents obtained by CNBC.
All of that took place within 15 minutes of the start of Prime Day, one of Amazon’s biggest sales days every year.
The e-commerce giant also had to add servers manually to meet the traffic demand, indicating its auto-scaling feature may have failed to work properly leading up to the crash, according to external experts who reviewed the documents. “Currently out of capacity for scaling,” one of the updates said about the status of Amazon’s servers, roughly an hour after Prime Day’s launch. “Looking at scavenging hardware.”
A breakdown in an internal system called Sable, which Amazon uses to provide computation and storage services to its retail and digital businesses, caused a series of glitches across other services that depend on it, including Prime, authentication and video playback, the documents show.
Other teams, including Alexa, Prime Now and Twitch, also reported problems, while some warehouses said they weren’t even able to scan products or pack orders for a period of time.
The documents give a rare look into how Amazon responded to the higher-than-expected traffic surge on Prime Day, which caused glitches across the site for hours. They also illustrate the difficulty Amazon faced in dealing with the demand, despite its deep experience running a massive-scale website and one of the largest cloud computing platforms in the world.
“More people came in than Amazon could handle,” Matthew Caesar, a computer science professor at the University of Illinois and co-founder of cybersecurity firm Veriflow, said after CNBC shared the details of the documents. “And Amazon couldn’t use all the resources they had available because there was a bug or some other issue with their software.”
Although the outage lasted for hours on Prime Day, the impact on overall sales was minimal. Amazon said it was the “biggest shopping event” in company history, with over 100 million products purchased by Prime members during the 36-hour event. Half a dozen sellers who spoke to CNBC also said they were happy with this year’s Prime Day sales, even after dealing with the downtime.
Amazon hasn’t said much publicly about the outage. It issued a single statement two hours after the site crash, succinctly saying “some customers are having difficulty shopping” and that it was working to “resolve the issue quickly.”
In an internal email seen by CNBC, Jeff Wilke, Amazon’s CEO of worldwide retail, noted that his team was “disappointed” about the site issues and said the company is already working on ways to prevent a repeat. He then highlighted all the ways that Prime Day was a success.
“Tech teams are already working to improve our architecture, and I’m confident we’ll deliver an even better experience next year,” he wrote in the email.
Amazon declined to comment.