Slashdot Mirror


EC2 Outage Shows How Much the Net Relies On Amazon

An anonymous reader writes "Much has been written about the recent EC2/EBS outage, but Keir Thomas at PC World has a different take: it's shown how much cutting-edge Internet infrastructure relies on Amazon, and we should be grateful. Quoting: 'Amazon is a personification of the spirit of the Internet, which is one of true democracy, access to the means of distribution, and rapid evolution.'" An article at O'Reilly comes to a similarly positive conclusion from a different angle.

16 of 147 comments

  1. Clouds: Up in the air and foggy: by Hartree · · Score: 5, Insightful

    This article seems to be an apology for Amazon.

    Basically it says "We went down, and took down lots of important stuff. That shows just how important we are and that lots of people use us. Thus, our cloud is a good thing."

    The logic of that doesn't quite work.

    I agree that it's a useful tool, but there are a lot of things that don't make sense to put in the cloud.

    1. Re:Clouds: Up in the air and foggy: by WrongSizeGlass · · Score: 3, Informative

      I agree that it's a useful tool, but there are a lot of things that don't make sense to put in the cloud.

      I always feel better when anything that is mission critical is in-house. Cloud based (and regular internet based) services can become inaccessible for your business if you simply lose your internet connection - it doesn't require all of Amazon to bite the dust.

  2. Except they didn't work. by pavon · · Score: 4, Informative

    A large number of the people experiencing this outage did pay for multiple availability zones, and it didn't help them.

    1. Re:Except they didn't work. by el_tedward · · Score: 5, Informative

      I guess what we should learn from this is to put your failover in separate regions, not separate availability zones?

    2. Re:Except they didn't work. by WrongSizeGlass · · Score: 5, Informative

      From the NYT article:

      Big companies, that have decided to put crucial operations on Amazon computers are apt to pay up for the equivalent of computing insurance, analysts say. Netflix, the movie rental site, has become a large customer of the Amazon cloud. Most of its Web technology — customer movie queues, search tools and the like — runs in Amazon data centers.

      Netflix said it had sailed through the last couple of days unscathed. “That’s because Netflix has taken full advantage of Amazon Web Services’ redundant cloud architecture,” which insures against technical malfunctions in any one location, said Steve Swasey, a Netflix spokesman.

      Sounds like it worked for some.

    3. Re:Except they didn't work. by Guspaz · · Score: 4, Insightful

      Paying for multiple availability zones is not the same as paying for multiple locations. There are multiple availability zones in a single datacenter. Netflix got it right, they spread their infrastructure over multiple physical locations, and didn't suffer any downtime despite losing a significant chunk of their infrastructure; it was business as usual.

      Like anything else, cloud computing still requires you to decide how much redundancy you're willing to pay for. If uptime is that important to you, spreading your infrastructure out over multiple datacenters is a no-brainer.
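
      To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes every location has the same availability and that locations fail independently of each other. Both are assumptions for illustration only, and the independence one is exactly what did not hold for availability zones inside a single region during this outage. The function name and the 99.5% figure are made up, not Amazon numbers.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` independent locations is up.

    Assumes independent failures and identical per-site availability.
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # All sites must be down at once for the service to be down.
    return 1.0 - (1.0 - per_site) ** sites


if __name__ == "__main__":
    for n in (1, 2, 3):
        availability = combined_availability(0.995, n)  # 0.995 is a made-up figure
        downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
        print(f"{n} site(s): availability {availability:.6f}, "
              f"about {downtime_hours:.2f} hours down per year")
```

      With that made-up figure, one site works out to roughly 44 hours of downtime a year and two truly independent sites to roughly 13 minutes. The second number is only as good as the independence assumption, which is the argument for separate physical locations.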

  3. Re:Why The Cloud? by mini+me · · Score: 4, Insightful

    The cloud represents a black box that abstracts the underlying network topology.

    You might send your data to a server in Germany and retrieve it from a server in the USA. When you put something in the cloud you do not have to worry about problems like this because the cloud provider already has a hot backup ready to take the slack in another part of the world. You don't need to know or care how it happens, it just works. S3 is an Amazon example of a cloud service. You send your file to S3 and Amazon takes the responsibility of ensuring that it is available even if a datacenter is blown to smithereens.

    EC2 and EBS are not the cloud. There is no abstraction of the datacenter. Amazon leaves it up to you to choose which datacenter you wish to work in. This can allow you to easily build a cloud application on top of their physical infrastructure, but it is up to you to make it "the cloud". We witnessed so many failures because the applications were not cloud applications, just standard hosted services.
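
    As a rough illustration of what "it is up to you to make it the cloud" can mean, here is a minimal in-memory sketch. `RegionStore` and `ReplicatedStore` are hypothetical names invented for this example. Nothing here calls a real Amazon API, and it is not a description of how S3 works internally. It only shows the idea of hiding the datacenter choice behind one interface.

```python
class RegionStore:
    """Hypothetical stand-in for storage living in one datacenter."""

    def __init__(self, name):
        self.name = name
        self.up = True      # flip to False to simulate an outage
        self._data = {}     # key -> stored bytes

    def put(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self._data[key] = value

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._data[key]


class ReplicatedStore:
    """Writes to every reachable region, reads from the first that answers."""

    def __init__(self, regions):
        self.regions = list(regions)

    def put(self, key, value):
        written = 0
        for region in self.regions:
            try:
                region.put(key, value)
                written += 1
            except ConnectionError:
                continue  # skip regions that are down
        if written == 0:
            raise ConnectionError("no region accepted the write")
        return written  # number of copies stored

    def get(self, key):
        for region in self.regions:
            try:
                return region.get(key)
            except (ConnectionError, KeyError):
                continue  # region is down or never received this key
        raise KeyError(key)


if __name__ == "__main__":
    east, west = RegionStore("east"), RegionStore("west")
    store = ReplicatedStore([east, west])
    store.put("report.pdf", b"contents")
    east.up = False                  # simulate losing one datacenter
    print(store.get("report.pdf"))   # still served from the other region
```

    The caller never names a datacenter, which is the abstraction being described. The sketch skips the hard parts, such as re-syncing a region that missed writes while it was down.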

  4. Where have I heard this before... by girlintraining · · Score: 4, Insightful

    Microsoft: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: BURN THE HERETIC! Amazon: We're sorry our product broke and a lot of people weren't able to get online. Slashdot: It's okay. Here, have a cookie.

    --
    #fuckbeta #iamslashdot #dicemustdie

  5. Re:Why The Cloud? by tragedy · · Score: 4, Informative

    Hmm, considering how long "the cloud" has been a buzzword, doesn't it seem like an awful lot of unscheduled downtime if there have already been enough events for people to claim that they aren't given a fair shake by the media when they go down? After all, if the media have reported on it several times, it's happened several times. That's more unscheduled downtime than your typical web server gets in a few years.

    Perhaps if they hadn't gone with a word that means fuzzy, insubstantial and ephemeral to describe their services people wouldn't have the same reservations about it. Maybe it's also because IT people don't like their managers to say "I just heard about this neat new thing, let's abandon the system we have now to pursue this" against their advice, then have to deal with being screamed at by their managers later when everything is down and there's absolutely nothing they can do about it because they've effectively ceded all control to a third party service provider who has not managed, thus far, to establish themselves as particularly safe or reliable.

    The apologists whose articles are linked in this Slashdot story seem to think it's great that we're putting all of our eggs into the baskets of known basket droppers. Thus far I'm not impressed enough by these providers. Obviously, in order to do anything on the Internet, you have to rely on some sort of service provider, and even they have to rely on their peers. So obviously there's no way you can have total control. Nevertheless, you should still try to retain all the control you can over your own stuff.

  6. Made it Through Pretty Much Unscathed by ShipIt · · Score: 5, Informative

    Totally concur with others pointing out Amazon offers redundancy if you choose to use it.

    We had webservers, database (master/slave), and other services split across usa-east and usa-west.

    When usa-east started showing problems, we:
    *) Took the usa-east webservers out of round robin DNS (ttl 1hr)
    *) Verified the slave (in usa-west) was up to date, shut down the master (usa-east), and converted the slave to master.
    *) Updated all webservers to point to the new master.
    *) Cranked up new usa-west webservers / updated round robin DNS

    I believe Amazon offers mechanisms to do this automatically, or we could always write our own failover scripts, but this is the tradeoff we made. We were willing to accept some service degradation by switching over manually in exchange for avoiding the pitfalls of false-positive detection. Very much an application-specific tradeoff, not for everyone, but it worked for what we are doing.
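
    For anyone who wants to see the four steps above as code, here is a minimal sketch that models them on in-memory objects. Every class, field, and server name is hypothetical. There are no real AWS, DNS, or database calls, and the integer position comparison just stands in for whatever "is the slave caught up" check your database provides. It is an illustration of the ordering, not the scripts we actually ran.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    region: str
    role: str               # "master", "slave" or "stopped"
    applied_position: int   # hypothetical replication position counter


@dataclass
class WebServer:
    name: str
    region: str
    db_region: str          # region of the database this server talks to


@dataclass
class Site:
    webservers: list = field(default_factory=list)
    dns_pool: set = field(default_factory=set)  # names in round robin DNS


def fail_over(site, master, slave, new_webservers):
    """Apply the four manual steps in order; refuse to promote a lagging slave."""
    # 1. Take webservers in the failing region out of round robin DNS.
    failing = {w.name for w in site.webservers if w.region == master.region}
    site.dns_pool -= failing

    # 2. Verify the slave is up to date, stop the master, promote the slave.
    if slave.applied_position < master.applied_position:
        raise RuntimeError("slave is behind the master; not promoting")
    master.role = "stopped"
    slave.role = "master"

    # 3. Point every webserver at the new master.
    for server in site.webservers:
        server.db_region = slave.region

    # 4. Bring up new webservers in the healthy region and add them to DNS.
    for server in new_webservers:
        server.db_region = slave.region
        site.webservers.append(server)
        site.dns_pool.add(server.name)


if __name__ == "__main__":
    east_db = Database("usa-east", "master", applied_position=120)
    west_db = Database("usa-west", "slave", applied_position=120)
    site = Site(
        webservers=[WebServer("east-web1", "usa-east", "usa-east"),
                    WebServer("west-web1", "usa-west", "usa-east")],
        dns_pool={"east-web1", "west-web1"},
    )
    fail_over(site, east_db, west_db,
              [WebServer("west-web2", "usa-west", "usa-east")])
    print(sorted(site.dns_pool), west_db.role)
```

    Note that with a 1hr TTL, resolvers may keep handing out the old addresses for up to an hour after step 1, which is part of the service degradation a manual switch accepts.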

    The key was to avoid putting all eggs in the usa-east basket and to split across usa-west as well, even though we incur additional bandwidth fees, i.e. master/slave replication transfer is full fee between regions.

    We were never concerned about cascading failures affecting multiple availability zones in a given region, nor did it matter for us - our redundancy requirement was geographical diversity, not partitions within a datacenter. We were thinking natural disaster, but the architecture covered us in this case as well.

    The coolest thing to me is just how quickly we were able to shuffle around these resources to avoid a problem area - a couple of hours. There's no way we could have done it so quickly with what we had before - a combination of our own colocated servers and VPS.

  7. Re:It also shows... by camperslo · · Score: 3, Insightful

    Amazon is a personification of the spirit of the Internet, which is one of true democracy, access to the means of distribution, and rapid evolution

    Spirit of the internet? Some, on seeing Amazon's passing judgement on Wikileaks, might think it more aligned with a certain corporate spirit than a spirit of the internet. If they really support democracy, which can't function properly with a poorly informed public, maybe they shouldn't be the ones to decide whether or not someone is a journalist.

    Hardware doesn't make spirit. What people are doing, and the thoughts that drive the choices made probably do.

    They are still content to profit from the sale of books about WikiLeaks.

    http://www.amazon.com/Inside-WikiLeaks-Assange-Dangerous-Website/dp/030795191X

    http://www.guardian.co.uk/technology/2010/dec/11/wikileaks-amazon-denial-democracy-lieberman

  8. Re:Outages by pla · · Score: 3, Insightful

    Many .com websites were unnecessarily down for hours since nobody had thought to plan for an outage. I am sure quite a few architecture meetings were held the following day addressing disaster recovery.

    Y'know, call me crazy, but I didn't even notice the outage.

    I mean, yeah, I read about it on a number of sites (all still up and running just fine), but honestly can't say I tried to visit even a single site actually unavailable because of the downtime.

    I dunno, perhaps this mostly affected ad hosts and I didn't notice because I already block them?

  9. Re:My cloud is fine by thetoadwarrior · · Score: 3, Funny

    Of course. The botnet authors have a vested interest in keeping your system up.

  10. Re:Why The Cloud? by emt377 · · Score: 4, Informative

    Why is so much in the cloud? I've heard it touted in lots of marketing speak, but I've never worked with it.

    As someone who has never worked with the cloud (shocking, I know), what are the advantages and disadvantages?

    Is it basically just distributed scalable redundant web hosting run by a big company? So you're basically renting to avoid the start-up capital costs of those services and to put them in the hands of specialists, while you focus on your web apps?

    Or is it more?

    There's a big mix-up of lots of different concepts and ideas here, to the point that the questions you ask are impossible to answer.

    - EC2 is a VPS-like virtual server provisioning service. You rent a virtual server instance by the hour. APIs exist for you to dynamically add and remove instances as needed. You create an image, then can fire up additional instances as you see fit. Someone like Netflix, for instance, can fire up streaming servers during peak hours, then shut them down during off hours.
    - You can of course set up your own co-lo systems, but they will be provisioned 24/7 and will cost you more, since they will be sized for peak capacity, and even during peak most of the servers will be idle much of the time due to random load variance. You can improve peak utilization by setting up your own virtual provisioning. But then you have ops costs, so unless you have a massive operational scale you'll find it cheaper to buy from AWS (or Linode, Rackspace, etc.).
    - EBS is a logical volume service. You create a volume and mount it on an EC2 instance. Like with server instances, there are API calls to dynamically create EBS volumes. You can unmount it and move it to a different server in the same datacenter, so you could use volumes, for instance, to take backup snapshots or run log analysis, in addition to simply being server storage. Of course, you get to build or buy the software to do all these things yourself.
    - Server instances belong to groups, and have access controls set up among them. This allows you to create private 'backplane' interconnects, where some things like SQL servers are only accessible to instances that are part of a group.
    - EIPs are elastic IPs, which are IPs you lease and can then assign to any of your server instances (usually ingress and point-of-contact servers). You can move them between virtual servers as you like, and obviously would typically map DNS to them. Servers will otherwise get anonymous IP addresses, meaning they get something arbitrarily assigned. They're reachable (if you wish) from the net at large, but aren't well-known points for your service.
    - AWS also provides a load distribution service. I've never used this actually; it never seemed to fit right.
    - S3 is a cloud service, meaning it has no deterministic ingress and egress. It's used for content distribution: writing is expensive, reading is dirt cheap. Content stored is automatically replicated and de-replicated as needed. You have no idea where it lives, in how many copies, and how it's backed up. SLAs make promises about availability.
    - Content distribution is a poster child cloud service example. Not all services will easily fit a cloud model. Many other services that have fit the model (mainly using MapReduce or the like) are batch processing based and more about massaging massive amounts of data than interactive end-user services.
    - Somewhat simplified, if your service can fit around a key-value store (even a sophisticated one like MongoDB), then it's a candidate for a cloud architecture.
    - There are plenty of providers of bits and pieces to do things like server monitoring, cost analysis, and automated/manual server provisioning. In fact, I'm getting into this business myself...
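
    To make the by-the-hour versus sized-for-peak comparison from the first two points concrete, here is a small arithmetic sketch. The load figures and the per-instance capacity are invented for illustration, and the function names are hypothetical. It only counts instance-hours. It ignores price differences, instance start-up time, and the ops costs mentioned above.

```python
import math


def instances_needed(load, capacity_per_instance):
    """Smallest number of instances that can carry `load`, never below one."""
    return max(1, math.ceil(load / capacity_per_instance))


def instance_hours(hourly_load, capacity_per_instance):
    """Return (elastic, fixed) instance-hours for a list of hourly loads.

    elastic: resize every hour to match that hour's load.
    fixed:   provision for the peak hour and keep it running throughout.
    """
    elastic = sum(instances_needed(load, capacity_per_instance)
                  for load in hourly_load)
    peak = instances_needed(max(hourly_load), capacity_per_instance)
    fixed = peak * len(hourly_load)
    return elastic, fixed


if __name__ == "__main__":
    # Made-up day: 18 quiet hours at 200 requests/s, 6 peak hours at 1000.
    day = [200] * 18 + [1000] * 6
    elastic, fixed = instance_hours(day, capacity_per_instance=100)
    print(f"elastic: {elastic} instance-hours, sized for peak: {fixed}")
```

    For this made-up day the elastic approach needs 96 instance-hours against 240 when sized for peak. Whether that translates into lower cost depends on the actual prices on each side.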

    A 'cloud' service is not a hosting service - it's a way to build things, a black-box mindset. There may be a well-defined point of contact (perhaps found via DNS), but beyond that everything is dynamic. The initial contact can redirect, either explicitly or implicitly. It's not like a 'hosting' service where you click a button and get a Joomla host. But it might be a viable way to implement such a hosting service.

  11. Re:My cloud is fine by RobertM1968 · · Score: 3, Insightful

    All my websites are fine, which is what my high profile clients expect.

    That's because we use Microsoft Windows Servers and Sql Databases.

    Really? I've found both such products to be unsuitable for the demand we put on such infrastructures - unless I throw a lot more hardware at them. With 1/20th of the traffic and 6% of the userbase, our forums crawled on Windows Server and MSSQL Server. We switched to Apache and MySQL, and even running Simple Machines Forum, which is far more database intensive than the Windows solution we were provided, we need a lot less hardware than we did back when we had so much less traffic.

  12. Re:It also shows... by nothings · · Score: 5, Insightful

    Don't forget the one-click patent. True democracy/spirit of the Internet my ass.