EC2 Outage Shows How Much the Net Relies On Amazon

Posted by Soulskill on Saturday April 23, 2011 @06:33AM from the too-big-to-fail dept.

An anonymous reader writes "Much has been written about the recent EC2/EBS outage, but Keir Thomas at PC World has a different take: it's shown how much cutting-edge Internet infrastructure relies on Amazon, and we should be grateful. Quoting: 'Amazon is a personification of the spirit of the Internet, which is one of true democracy, access to the means of distribution, and rapid evolution.'" An article at O'Reilly comes to a similarly positive conclusion from a different angle.

Clouds: Up in the air and foggy: by Hartree · 2011-04-23 06:40 · Score: 5, Insightful

This article seems to be an apology for Amazon.
Basically, it says "We went down, and took down lots of important stuff. That shows just how important we are and that lots of people use us. Thus, our cloud is a good thing."
The logic of that doesn't quite work.
I agree that it's a useful tool, but there are a lot of things that don't make sense to put in the cloud.
Except they didn't work. by pavon · 2011-04-23 06:46 · Score: 4, Informative

A large number of the people experiencing this outage did pay for multiple availability zones, and it didn't help them.
1. Re:Except they didn't work. by el_tedward · 2011-04-23 07:06 · Score: 5, Informative
  
  I guess what we should learn from this is to put your failover in separate regions, not separate availability zones?
2. Re:Except they didn't work. by WrongSizeGlass · 2011-04-23 07:09 · Score: 5, Informative
  
  From the NYT article:
  
  Big companies, that have decided to put crucial operations on Amazon computers are apt to pay up for the equivalent of computing insurance, analysts say. Netflix, the movie rental site, has become a large customer of the Amazon cloud. Most of its Web technology — customer movie queues, search tools and the like — runs in Amazon data centers.
  
  Netflix said it had sailed through the last couple of days unscathed. “That’s because Netflix has taken full advantage of Amazon Web Services’ redundant cloud architecture,” which insures against technical malfunctions in any one location, said Steve Swasey, a Netflix spokesman.
  Sounds like it worked for some.
3. Re:Except they didn't work. by Guspaz · 2011-04-23 07:15 · Score: 4, Insightful
  
  Paying for multiple availability zones is not the same as paying for multiple locations. There are multiple availability zones in a single datacenter. Netflix got it right: they spread their infrastructure over multiple physical locations and didn't suffer any downtime despite losing a significant chunk of their infrastructure; it was business as usual.
  Like anything else, cloud computing still requires you to decide how much redundancy you're willing to pay for. If uptime is that important to you, spreading your infrastructure out over multiple datacenters is a no-brainer.
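The redundancy decision described above can be put into rough numbers. The following is a minimal sketch, not anything from Amazon or the commenters: the function names and the 99% per-site figure are hypothetical, and the formula assumes sites fail independently of each other. That assumption is exactly what breaks when several zones share one failure, as reported in this thread, so treat the result as a best case.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is up.

    Assumes every site has the same availability and that failures are
    independent; correlated failures make the real figure worse.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # All sites must be down at once for the service to be down.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_downtime_hours(site_availability: float, sites: int) -> float:
    """Expected hours per year with every site down, under the same assumptions."""
    return (1.0 - combined_availability(site_availability, sites)) * HOURS_PER_YEAR
```

With the hypothetical 99% figure, one site works out to about 87.6 hours of downtime a year and two independent sites to about 0.876 hours. The second site roughly doubles the hosting bill, which is the tradeoff the comment is pointing at.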
Re:Why The Cloud? by mini+me · 2011-04-23 07:06 · Score: 4, Insightful

The cloud represents a black box that abstracts the underlying network topology.
You might send your data to a server in Germany and retrieve it from a server in the USA. When you put something in the cloud you do not have to worry about problems like this because the cloud provider already has a hot backup ready to take up the slack in another part of the world. You don't need to know or care how it happens; it just works. S3 is an Amazon example of a cloud service. You send your file to S3 and Amazon takes the responsibility of ensuring that it is available even if a datacenter is blown to smithereens.
EC2 and EBS are not the cloud. There is no abstraction of the datacenter. Amazon leaves it up to you to choose which datacenter you wish to work in. This can allow you to easily build a cloud application on top of their physical infrastructure, but it is up to you to make it "the cloud". We witnessed so many failures because the applications were not cloud applications, just standard hosted services.
Where have I heard this before... by girlintraining · 2011-04-23 07:15 · Score: 4, Insightful

Microsoft: We're sorry our product broke and a lot of people weren't able to get online.
Slashdot: BURN THE HERETIC!
Amazon: We're sorry our product broke and a lot of people weren't able to get online.
Slashdot: It's okay. Here, have a cookie.

--
#fuckbeta #iamslashdot #dicemustdie
Re:Why The Cloud? by tragedy · 2011-04-23 07:37 · Score: 4, Informative

Hmm, considering how long "the cloud" has been a buzzword, doesn't it seem like an awful lot of unscheduled downtime if there have already been enough events for people to claim that they aren't given a fair shake by the media when they go down? After all, if the media have reported on it several times, it's happened several times. That's more unscheduled downtime than your typical web server gets in a few years.
Perhaps if they hadn't gone with a word that means fuzzy, insubstantial and ephemeral to describe their services people wouldn't have the same reservations about it. Maybe it's also because IT people don't like their managers to say "I just heard about this neat new thing, let's abandon the system we have now to pursue this" against their advice, then have to deal with being screamed at by their managers later when everything is down and there's absolutely nothing they can do about it because they've effectively ceded all control to a third party service provider who has not managed, thus far, to establish themselves as particularly safe or reliable.
The apologists whose articles are linked in this Slashdot story seem to think it's great that we're putting all of our eggs into the baskets of known basket droppers. Thus far I'm not impressed enough by these providers. Obviously, in order to do anything on the Internet, you have to rely on some sort of service provider, and even they have to rely on their peers. So obviously there's no way you can have total control. Nevertheless, you should still try to retain all the control you can over your own stuff.
Made it Through Pretty Much Unscathed by ShipIt · 2011-04-23 07:42 · Score: 5, Informative

Totally concur with others pointing out Amazon offers redundancy if you choose to use it.

We had webservers, database (master/slave), and other services split across usa-east and usa-west.

When usa-east started showing problems, we:
*) Took the usa-east webservers out of round robin DNS (ttl 1hr)
*) Verified the slave (in usa-west) was up to date, shut down the master (usa-east), and converted the slave to master.
*) Updated all webservers to point to the new master.
*) Cranked up new usa-west webservers / updated round robin DNS
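
The four steps above can be sketched as a small in-memory model. This is an illustration only: it calls no AWS or DNS API, and the `Deployment` class, the `fail_over` function and the server names are all hypothetical. What it shows is the ordering, in particular that the slave is checked for replication lag before it is promoted.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Toy model of the setup described above; nothing here talks to AWS."""
    dns_pool: dict            # webserver name -> region, served round robin
    db_master: str            # region holding the writable database
    db_slave: str             # region holding the replica ("" if none)
    slave_lag_seconds: float  # how far the replica is behind the master
    db_target: str            # region the webservers send queries to


def fail_over(dep, bad_region, good_region, new_webservers, max_lag_seconds=0.0):
    """Apply the four manual steps in order and return a log of what was done."""
    log = []

    # Step 1: take the failing region's webservers out of round robin DNS.
    for name in [n for n, region in dep.dns_pool.items() if region == bad_region]:
        del dep.dns_pool[name]
        log.append(f"removed {name} from DNS")

    # Step 2: promote the slave only once it has caught up with the master.
    if dep.db_master == bad_region:
        if dep.db_slave != good_region:
            raise RuntimeError("no slave in the healthy region to promote")
        if dep.slave_lag_seconds > max_lag_seconds:
            raise RuntimeError("slave is behind the master; promoting would lose writes")
        dep.db_master, dep.db_slave = good_region, ""
        log.append(f"promoted slave in {good_region} to master")

    # Step 3: point every webserver at the new master.
    dep.db_target = dep.db_master
    log.append(f"webservers now use the database in {dep.db_master}")

    # Step 4: bring up replacement webservers and add them to DNS.
    for name in new_webservers:
        dep.dns_pool[name] = good_region
        log.append(f"added {name} to DNS")

    return log
```

One thing the model leaves out is the one hour TTL mentioned in the first step: resolvers that have already cached the old records can keep sending visitors to the removed servers until those records expire.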

I believe Amazon offers mechanisms to do this automatically, or we could just write our own failover scripts, but this is the tradeoff we made. We were willing to accept some service degradation by switching over manually in exchange for avoiding the pitfalls of false-positive detection. Very much an application-specific tradeoff, not for everyone, but it worked for what we are doing.

The key was to avoid putting all our eggs in the usa-east basket by splitting across usa-west as well, even though we incur additional bandwidth fees, i.e. master/slave replication transfer is charged at full fee between regions.

We were never concerned about cascading failures affecting multiple availability zones in a given region, nor did it matter for us - our redundancy requirement was geographical diversity, not partitions within a datacenter. We were thinking natural disaster, but the architecture covered us in this case as well.

The coolest thing to me is just how quickly we were able to shuffle around these resources to avoid a problem area - a couple of hours. There's no way we could have done it so quickly with what we had before - a combination of our own colocated servers and VPS.
Re:Why The Cloud? by emt377 · 2011-04-23 09:04 · Score: 4, Informative

Why is so much in the cloud? I've heard it touted in lots of marketing speak, but I've never worked with it.
As someone who has never worked with the cloud (shocking, I know), what are the advantages and disadvantages?
Is it basically just distributed scalable redundant web hosting run by a big company? So you're basically renting to avoid the start-up capital costs of those services and to put them in the hands of specialists, while you focus on your web apps?
Or is it more?
There's a big mix-up of lots of different concepts and ideas here, to the point that the questions you ask are impossible to answer.
- EC2 is a VPS-like virtual server provisioning service. You rent a virtual server instance by the hour. APIs exist for you to dynamically add and remove instances as needed. You create an image, then can fire up additional instances as you see fit. Someone like Netflix, for instance, can fire up streaming servers during peak hours then shut them down at off hours.
- You can of course set up your own co-lo systems, but it will be provisioned 24/7 and will cost you more since it will be sized for peak capacity, and even during peak most of the servers will be idle much of the time due to random load variance. You can improve peak utilization by setting up your own virtual provisioning. But then you have ops costs, so unless you have a massive operational scale you'll find it cheaper to buy from AWS (or Linode, Rackspace, etc).
- EBS is a logical volume service. You create a volume and mount it on an EC2 instance. Like with server instances, there are API calls to dynamically create EBS volumes. You can unmount it and move it to a different server in the same datacenter, so you could use them for instance to take backup snapshots or log analysis, or similar, in addition to simply being server storage. Of course you get to build or buy the software to do all these things yourself.
- Server instances belong to groups, and have access controls set up among them. This allows you to create private 'backplane' interconnects, where some things like sql servers are only accessible to instances part of a group.
- EIPs are elastic IPs, which are IPs you lease and can then assign to any of your server instances (usually ingress and point-of-contact servers). You can move them between virtual servers as you like, and obviously would typically map DNS to them. Servers will otherwise get anonymous IP addresses, meaning they get something arbitrarily assigned. They're reachable (if you wish) from the net at large, but aren't well-known points for your service.
- AWS also provides a load distribution service. I've never used this actually; it never seemed to fit right.
- S3 is a cloud service, meaning it has no deterministic ingress and egress. It's used for content distribution: writing is expensive, reading is dirt cheap. Content stored is automatically replicated and de-replicated as needed. You have no idea where it lives, in how many copies, and how it's backed up. SLAs make promises about availability.
- Content distribution is a poster child cloud service example. Not all services will easily fit a cloud model. Many other services that have fit the model (mainly using MapReduce or the like) are batch processing based and more about massaging massive amounts of data than interactive end-user services.
- Somewhat simplified, if your service can fit around a key-value store (even a sophisticated one like MongoDB), then it's a candidate for a cloud architecture.
- There are plenty of providers of bits and pieces to do things like server monitoring, cost analysis, and automated/manual server provisioning. In fact, I'm getting into this business myself...
A 'cloud' service is not a hosting service - it's a way to build things, a black-box mindset. There may be a well-defined point of contact (perhaps found via DNS), but beyond that everything is dynamic. The initial contact can redirect, either explicitly or implicitly. It's not like a 'hosting' service where you click a button and get a Joomla host. But it might be a viable way to implement such a hosting service.
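
The point in the list above about hourly rental versus co-lo hardware sized for peak can be illustrated with a little arithmetic. This is a rough sketch with made-up numbers: the function names, the load figures and the per-instance capacity are assumptions, and it counts instance-hours only, ignoring price differences and the ops costs the comment mentions.

```python
import math


def instances_needed(load, capacity_per_instance):
    """Smallest number of instances that can serve `load` (never fewer than one)."""
    return max(1, math.ceil(load / capacity_per_instance))


def instance_hours(hourly_load, capacity_per_instance):
    """Return (on_demand_hours, peak_provisioned_hours) for a list of hourly loads.

    on_demand_hours: instances are added and removed every hour to match load.
    peak_provisioned_hours: enough instances for the busiest hour run all the time.
    """
    per_hour = [instances_needed(load, capacity_per_instance) for load in hourly_load]
    on_demand = sum(per_hour)
    peak_provisioned = max(per_hour) * len(per_hour)
    return on_demand, peak_provisioned
```

For a hypothetical day with 18 quiet hours at 100 requests per second and 6 peak hours at 1000, and instances that each handle 100, `instance_hours([100] * 18 + [1000] * 6, 100)` returns `(78, 240)`: scaling by the hour uses about a third of the instance-hours that a fixed, peak-sized fleet would.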
Re:It also shows... by nothings · 2011-04-23 09:49 · Score: 5, Insightful

Don't forget the one-click patent. True democracy/spirit of the Internet my ass.