Amazon server failures darken 'cloud' outlook
Amazon.com's Internet-based computing services, used by thousands of business customers to run their Web pages and store data, experienced technical problems Thursday.
The Associated Press
NEW YORK — Major websites including Foursquare and Reddit crashed or suffered slowdowns Thursday after technical problems rattled Amazon.com's widely used Web servers, frustrating millions of people who couldn't access their favorite sites.
Though better known for selling books, DVDs and other consumer goods, Amazon also rents out space on huge computer servers that run many websites and other online services.
The problems began at an Amazon data center near Dulles Airport outside Washington, D.C., and persisted into the afternoon. The failures were widespread, but they varied in severity.
HootSuite, which lets users monitor Twitter and other social networks more easily, was down completely, as was questions-and-answers site Quora.
The location-sharing social network Foursquare experienced glitches, while the news-sharing site Reddit was in "emergency read-only mode."
Many other companies that use Amazon Web Services, like Netflix and Zynga, which runs Facebook games, appeared to be unscathed. Amazon has at least one other major data center, in California, which stayed up.
It's not uncommon for Internet services to become inaccessible due to technical problems, sometimes for hours or even days. But Thursday's outages were notable because Amazon's servers are so commonly used, meaning many sites went down at once.
Amazon did not respond to requests for comment. It has not revealed how many companies use its Web services or how many were affected by the outage.
No one knew for sure how many people were inconvenienced, but the services affected are used by millions.
Amazon Web Services (AWS) provide "cloud" or utility-style computing in which customers pay only for the computing power and storage they need, on remote computers.
Seattle-based Amazon has big plans for AWS. Although it now makes up just a small percentage of the company's revenue, CEO Jeff Bezos said last year that it could eventually be as large as Amazon's retail business. Competitors include Rackspace Hosting and Microsoft's Azure platform.
Some people consider cloud computing more reliable than conventional hosting services in which a small company might rent a handful of computers in a data center.
If one of those computers malfunctions, the failure can take down a website. But "clouds" like AWS use vast banks of computers. If one fails, the tasks that it performs, such as running a website or a game, can immediately be taken over by others.
When a company needs more capacity, perhaps because of a surge in visitors to its website, it takes only minutes to rent more computers from Amazon.
But cloud computing isn't immune to failure, either.
Lydia Leong, an analyst for the tech research firm Gartner, said that judging by details posted on Amazon's AWS status page, a network connection failed Thursday morning, triggering an automatic recovery mechanism that then also failed.
Amazon's computers are divided into groups that are supposed to be independent of each other. If one group fails, others should stay up. And customers are encouraged to spread the computers they rent over several groups to ensure reliable service. But Thursday's problem took out many groups simultaneously.
In general, Amazon Web Services have been more reliable and, above all, cheaper than many other hosting systems, said Josh Cochrane, vice president of product development at Palo Alto Software in Eugene, Ore.
But the firm's websites and Web-based applications that create business plans were all brought down by Thursday's crash.
"It's a pretty vulnerable feeling," he said. "This is a really big message to us that we need to revisit our strategy."
That might include spreading the applications more widely over Amazon's network, so that problems at one data center won't bring down everything, he said.
The outage is evidence that companies can't wholly rely on cloud services to handle important functions, Vanessa Alvarez, an analyst at Forrester Research, told Bloomberg News.
"Customers need to start asking tough questions and not assume everything will be taken care of in the cloud, because it will not," Alvarez said. "They shouldn't be counting on a cloud service provider like Amazon to provide disaster recovery."
Amazon engineers struggled throughout the day to rectify the problem. Leong said the problems are of a type that's not covered by Amazon's money-back guarantees.