Amazon failed to secure enough servers to handle the traffic surge on Prime Day, causing it to launch a scaled-down backup front page and temporarily kill off all international traffic, according to internal Amazon documents obtained by CNBC.
And that took place within 15 minutes of the start of Prime Day — one of Amazon's biggest sales days every year.
The e-commerce giant also had to add servers manually to meet the traffic demand, indicating its auto-scaling feature may have failed to work properly leading up to the crash, according to external experts who reviewed the documents. “Currently out of capacity for scaling,” one of the updates said about the status of Amazon’s servers, roughly an hour after Prime Day’s launch. “Looking at scavenging hardware.”
A breakdown in an internal system called Sable, which Amazon uses to provide computation and storage services to its retail and digital businesses, caused a series of glitches across other services that depend on it, including Prime, authentication and video playback, the documents show.
Other teams, including Alexa, Prime Now and Twitch, also reported problems, while some warehouses said they weren’t even able to scan products or pack orders for a period of time.
The documents give a rare look into how Amazon responded to the higher-than-expected traffic surge on Prime Day, which caused glitches across the site for hours. They also illustrate the difficulty Amazon faced in dealing with the demand, despite its deep experience running a massive-scale website and one of the largest cloud computing platforms in the world.
“More people came in than Amazon could handle,” Matthew Caesar, a computer science professor at the University of Illinois and co-founder of cybersecurity firm Veriflow, said after CNBC shared the details of the documents. “And Amazon couldn’t use all the resources they had available because there was a bug or some other issue with their software.”
Although the outage lasted for hours on Prime Day, the impact on overall sales was minimal. Amazon said it was the “biggest shopping event” in company history, with over 100 million products purchased by Prime members during the 36-hour event. Half a dozen sellers who spoke to CNBC also said they were happy with this year’s Prime Day sales, even after dealing with the downtime.
Amazon hasn’t said much publicly about the outage. It issued a single statement two hours after the site crash, succinctly saying “some customers are having difficulty shopping” and that it was working to “resolve the issue quickly.”
In an internal email seen by CNBC, Jeff Wilke, Amazon’s CEO of worldwide retail, noted that his team was “disappointed” about the site issues and said the company is already working on ways to prevent a repeat. He then highlighted the ways in which Prime Day was a success.
“Tech teams are already working to improve our architecture, and I’m confident we’ll deliver an even better experience next year,” he wrote in the email.
Amazon declined to comment.
Amazon, based in Seattle, Washington, started seeing glitches across its site as soon as Prime Day launched at noon local time on Monday. In response, Amazon rushed to its backup plans and made quick changes during the first hour of the event.
Updates made at 12 p.m. say Amazon switched the front page to a simpler “fallback” page, as it saw a growing number of errors. Amazon’s front page on Prime Day looked oddly simple and rather poorly designed, noted Caesar, saying a simplified web page was likely put up to reduce load on the company’s servers.
By 12:15 p.m., Amazon decided to temporarily cut off all international traffic to “reduce pressure” on its Sable system, and by 12:37 p.m., it reopened the default front page to only 25 percent of traffic. At 12:40 p.m., Amazon made certain changes that improved the performance of Sable, but just two minutes later, it went back to “consider” blocking approximately 5 percent of “unrecognized traffic to U.S.,” according to one of the documents.
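The documents do not say how Amazon chose which visitors saw the default front page. One common way to send a fixed share of traffic to a page is to hash a request identifier into a bucket, as in the minimal sketch below. The function names, the `country` check and the use of SHA-256 are assumptions made for illustration, not details from the documents.

```python
import hashlib


def bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_page(request_id: str, country: str,
                default_percent: int = 25,
                block_international: bool = True) -> str:
    """Decide what a request receives under a staged recovery plan.

    Hypothetical policy: non-U.S. traffic is turned away while the block
    is on, `default_percent` of the remaining requests get the full
    front page, and everyone else gets the lightweight fallback page.
    """
    if block_international and country != "US":
        return "blocked"
    if bucket(request_id) < default_percent:
        return "default"
    return "fallback"


if __name__ == "__main__":
    sample = [choose_page(f"req-{i}", "US") for i in range(10_000)]
    share = sample.count("default") / len(sample)
    print(f"share of U.S. requests on the default page: {share:.1%}")
```

Because the bucket depends only on the request ID, the same visitor keeps seeing the same page as the percentage is raised, which lets operators widen access in steps.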
Even after making these changes, Amazon’s site “error rate” continued to worsen until about 1:05 p.m., before drastically improving at 1:10 p.m., an internal site performance chart shows. Some parts of Amazon saw order rates that were “significantly higher than expected” by a factor of two, one of the updates said. One person familiar with the matter described the office scene as “chaotic” and said at one point more than 300 people tuned in to an emergency conference call.
“They are obviously scrambling, on short notice, to restore services,” said Henning Schulzrinne, a computer science professor at Columbia University and the former CTO of the Federal Communications Commission, after CNBC shared details of the documents. “These problems tend to feed on themselves — people retry loading, making the problem worse, or services complete partially. So shutting off services is often the better, but obviously bad, option.”
Amazon chose not to shut off its site. Instead, it manually added servers so it could improve the site performance gradually, according to the documents. One person wrote in a status update that he was adding 50 to 150 “hosts,” or virtual servers, because of the extra traffic.
Caesar says the root cause may be a failure in Amazon’s auto-scaling feature, which automatically detects traffic fluctuations and adjusts server capacity accordingly. The fact that Amazon cut off international traffic first, rather than immediately increasing the number of servers, and added server power manually instead of automatically, indicates a breakdown in auto-scaling, a critical component when dealing with unexpected traffic spikes, he said.
“If their auto-scaling was working, things would have scaled automatically and they wouldn't have had this level of outage,” Caesar said. “There was probably an implementation or configuration error in their automatic scaling systems.”
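To make the idea concrete, the sketch below shows the kind of calculation an auto-scaler performs: work out how many hosts the current traffic needs, then cap that at what the hardware pool can supply. It is a generic illustration, not Amazon’s system; the function name, parameters and numbers are all hypothetical.

```python
def plan_capacity(total_rps: int, rps_per_host: int, max_hosts: int):
    """Return (hosts_granted, shortfall) for a given traffic level.

    total_rps     -- incoming requests per second (hypothetical input)
    rps_per_host  -- requests per second one host can serve comfortably
    max_hosts     -- hosts actually available in the pool
    """
    # Ceiling division: a partly used host still counts as a whole host.
    wanted = -(-total_rps // rps_per_host)
    granted = min(wanted, max_hosts)
    # A positive shortfall means demand exceeds what the pool can supply.
    shortfall = wanted - granted
    return granted, shortfall


if __name__ == "__main__":
    # Normal day: demand fits inside the pool.
    print(plan_capacity(1_000, 100, 50))   # (10, 0)
    # Surge: 21 hosts are needed but only 15 exist.
    print(plan_capacity(2_050, 100, 15))   # (15, 6)
```

In the second case the scaler has nothing left to hand out, which resembles the “out of capacity for scaling” status quoted in the documents. Whether Amazon’s failure was of this kind or a configuration error, as Caesar suggests, is not something the documents settle.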
Due to the lack of server power, Amazon saw extra pressure on Sable, which is an internal storage and computational system that plays a critical role running multiple services across the site, according to documents seen by CNBC. Sable is used by 400 teams across Amazon and handled a total of 5.623 trillion service requests, or 63.5 million requests per second, during last year’s Prime Day, according to an internal document.
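The two Sable figures can be checked against each other with simple arithmetic. Dividing the total by the per-second rate gives the time window the rate implies, as the short calculation below shows. The document does not say whether 63.5 million is an average or a peak, so treating it as an average is an assumption.

```python
TOTAL_REQUESTS = 5.623e12        # 5.623 trillion, from the internal document
REQUESTS_PER_SECOND = 63.5e6     # 63.5 million, from the internal document

# If the per-second figure is an average, this is the window it covers.
implied_seconds = TOTAL_REQUESTS / REQUESTS_PER_SECOND
implied_hours = implied_seconds / 3600

print(f"implied window: {implied_hours:.1f} hours")  # implied window: 24.6 hours
```

Under that assumption, the per-second rate corresponds to roughly 24.6 hours of traffic.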
Sable was given a “red” emergency alert in one of the status updates, made a little past 1 p.m., which said it’s “running hot” and “cannot scale.” It also said other services, such as Prime, authentication and video playback, were being “impacted by Sable.”
“We are experiencing failures mostly related to Sable,” one of the updates said.
Carl Kesselman, a computer science professor at USC, said Amazon’s response to the outage was rather impressive because in many cases the site would have crashed entirely under those circumstances.
“Amazon is operating at a scale we haven’t operated before,” he said. “It’s not clear there’s a bad guy or an obvious screw-up. It’s just we’re in uncharted territory, and it’s amazing it didn’t just fall over.”
This year’s Prime Day was the first one run by Neil Lindsay, Amazon’s VP of worldwide marketing and Prime. Lindsay took over the Prime team after the former lead, Greg Greeley, left the company for Airbnb earlier this year.