Slashdot Mirror


Dark Day In the AWS Cloud: Big Name Sites Go Down

An outage of one company's servers might only affect that company's customers — but when a major data center for Amazon hits kinks, sites that rely on the AWS cloud services all suffer from the downtime. That's what happened today, when several major sites or online services (like Instagram and AirBnB) were knocked temporarily offline, evidently because of problems at an Amazon data center in Northern Virginia. From TechCrunch's coverage of the outage: "The deluge of tweets that accompanied the services’ initial hiccups first started at around 4 p.m. Eastern time, and only increased in intensity as users found they couldn’t share pictures of their food or their meticulously crafted video snippets. Some further poking around on Twitter and beyond revealed that some other services known to rely on AWS — Netflix, IFTTT, Heroku and Airbnb to name a few — have been experiencing similar issues today."

182 comments

  1. Say what you will by Anonymous Coward · · Score: 0, Troll

    but I'd rather have a few strategically placed servers in datacenters spread around the country (world?) than something hosted on AWS.

    1. Re:Say what you will by Anonymous Coward · · Score: 5, Funny

      In Soviet Russia, company's customers go down on YOU!

    2. Re:Say what you will by rudy_wayne · · Score: 4, Interesting

      One of the features of AWS was supposed to be the ability to reroute everything to a different datacenter if one goes down. I know I read that somewhere back when AWS was first starting up. You don't think they lied, do you?

    3. Re:Say what you will by teknopurge · · Score: 3, Funny

      In Soviet Russia, company's customers go down on YOU!

      so dirty...

    4. Re:Say what you will by teknopurge · · Score: 5, Insightful

      That's expensive. "Cloud" hosting services cost about 1.5x traditional hosting. When you want multiple locations("regions" in aws) you need to pay for resources in each additional region, then pay another cost to provide that failover. Cloud hosting is great, but nothing it does is new or cheaper than hosting 10 years ago.

    5. Re:Say what you will by Anonymous Coward · · Score: 1

      That kind of redundancy is great, but not if you have a connectivity issue and your load balancers are impacted which is what happened here. Also, moving all traffic from one DC to another is a major shift; so depending on the problem and how long it may take to fix, it might not be worth it. Shifting everything over and back is a great feature to have, but it does come at a cost.

    6. Re:Say what you will by ModernGeek · · Score: 1

      You just have multiple DNS records for each service, and the client should move on to the next if one is down.

      --
      Sig: I stole this sig.
    7. Re:Say what you will by whoever57 · · Score: 2

      You just have multiple DNS records for each service, and the client should move on to the next if one is down.

      Unfortunately, "should" is rarely "does". If a browser receives multiple IP addresses for a name, it doesn't try them in turn; it just tries one.

      --
      The real "Libtards" are the Libertarians!
    8. Re:Say what you will by Anonymous Coward · · Score: 5, Insightful

      No they didn't lie. You can set things up that way-simply set up your servers in multiple data centers(AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think magically I won't ever have to worry about downtime ever again. It's foolish-but a lot of companies act this way.

      Somehow cloud hosting is taken as the silver bullet to prevent outages-it isn't. You still have to architect things the way you would normally if you're looking for things like disaster recovery, high availability, etc...etc..

    9. Re:Say what you will by chrisgeleven · · Score: 5, Informative

      Assuming you mean traditional round-robin A records, the timeout(s) you still have to suffer through would kill your latency.

      If you're talking about DNS providers (disclaimer, I work for Dyn) with advanced features that detect a failover event occurring and will only serve healthy A records, then that is a different story.

    10. Re:Say what you will by rudy_wayne · · Score: 3, Insightful

      No they didn't lie. You can set things up that way-simply set up your servers in multiple data centers(AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think magically I won't ever have to worry about downtime ever again. It's foolish-but a lot of companies act this way.

      But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff. They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service. That's the whole point of putting stuff in the "cloud".

      If * I * have to worry about that stuff then I might as well just do it myself and not give my money to Amazon.

    11. Re:Say what you will by Cyberax · · Score: 1

      How do you do it automatically? It's simply not possible to transparently replicate arbitrary VMs across geographically distant datacenters (lightspeed and all that...).

      However, AWS provides tools for developers to do it.

    12. Re:Say what you will by AHuxley · · Score: 1, Interesting

      Re you need to pay for resources in each additional region.
      Why the lack of power and real optical links that were regional, power distinct?
      Is this like the idea of linking to a site/city/state/regional 'ring' many times? Very safe from any local cut/drop, cheap, but still very dependent on one geographic provider?
      You also have a submarine communications cable (France to the USA) on the way for that State?? ...the regional services should be good?

      --
      Domestic spying is now "Benign Information Gathering"
    13. Re:Say what you will by Anonymous Coward · · Score: 1

      That functionality is there -- I use it in my own deployments. The thing is, it's not automagic. You have to actually architect your application to take advantage of AWS features.

      TANSTAAFL

    14. Re:Say what you will by Anonymous Coward · · Score: 1

      That has never been the "contract" with cloud. Amazon does not, and cannot, understand the architecture of every one of the multitude of applications running in their cloud. You can pay to have that kind of support from various companies (maybe even including Amazon) but it's not what "the cloud" is.

      The functionality is there for you to make an extremely robust application in AWS -- if you actually take advantage of it and if it's necessary for your business needs to go to that much effort/expense.

    15. Re:Say what you will by alen · · Score: 3

      yeah, but cloud is sold as this super cheap way to compute and have five nines reliability

    16. Re:Say what you will by Glendale2x · · Score: 1

      Outages with AWS and cloudy friends are becoming so common it's almost a non-story at this point.

      --
      this is my sig
    17. Re:Say what you will by silas_moeckel · · Score: 2

      Oh, it's possible to do, just rather expensive to do well. Disk based bits don't work, as sync writes past a region take far too long. Higher up the stack you can deal with 35-70ms of network latency. But now it's not mysql with any old crappy php code.

      AWS is a PHB buzzword, like IBM a decade and a half ago: it makes the VC guys happy that you fixed your scaling issue. In reality, everybody else's scaling issues now impact you.

      --
      No sir I dont like it.
    18. Re:Say what you will by sribe · · Score: 2

      ...nothing it does is new...

      Ahem, it siphons additional funds from customers ;-)

    19. Re:Say what you will by Cyberax · · Score: 5, Interesting

      Well, right now I have 500 machines running some heavy calculations in multiple AZs. Works perfectly fine, we have noticed the recent problems but simply stopped using the affected region (us-east-1) for the time being, shifting our calculations to other regions.

      AWS is really great at scaling. It's better than anything else on the market, but it does require a lot of work.

    20. Re:Say what you will by petermgreen · · Score: 2

      Most clients either won't move on to a second IP at all or will only move on once the OS times out the first TCP connection. And OS TCP connection timeouts are long enough that most users won't put up with them for interactive services.

      A better strategy is to put a DNS server in each datacenter, make the TTLs short and set things up to automatically remove records if a server goes offline. This works much better because DNS fallback timeouts are much shorter than TCP connection timeouts.
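
      A minimal sketch of that idea, assuming a plain TCP connect is an acceptable health check. The names `is_healthy`, `answer_for`, and the 30-second TTL are hypothetical, chosen for illustration, and are not any DNS server's actual API. It only shows the record-filtering step, not a working nameserver.

```python
import socket

DEFAULT_TTL = 30  # seconds; an assumed "short" TTL, tune for your setup


def is_healthy(ip, port=80, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def answer_for(name, records, check=is_healthy, ttl=DEFAULT_TTL):
    """Build the A-record answer for name, dropping servers that fail the check.

    records maps a hostname to its candidate IPs. If every server fails the
    check, all candidates are returned rather than an empty answer.
    """
    candidates = records.get(name, [])
    healthy = [ip for ip in candidates if check(ip)]
    return {"name": name, "ttl": ttl, "a": healthy or candidates}
```

      With the short TTL, a resolver that cached a dead address should ask again within about `ttl` seconds and get only the surviving servers, provided it honors the TTL.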

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    21. Re:Say what you will by Glendale2x · · Score: 4, Interesting

      No, you have to manage your own redundancy and failover on AWS. Look at all the effort Netflix has put into programming failover and stress testing and yet they still have frequent outages with AWS.

      --
      this is my sig
    22. Re:Say what you will by Anonymous Coward · · Score: 5, Informative

      either you don't speak English or you need to take your meds. no offense. so i'll try muddling a reply together for you.

      There are many ways to set up remote failover systems. Most of them rely on some type of heartbeat system where there's a "heartbeat message" which they all send each other periodically, and if the current Active goes out of response for too long the others choose one to take over. So it doesn't matter if they're all in one room connected with a single switch, or spread all over the planet.

      The real rub for any mechanism is DNS... if the primary server your FQDN points at drops then you might have redundancy but most people won't be able to take advantage of it. With more manual mechanisms (such as telling users "If our primary site goes down, try here instead!") that's not as much of a concern, just a PITA to keep track of.
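
      The heartbeat mechanism described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the `FailoverGroup` class, the fixed priority order, and the 10-second timeout are all hypothetical, and a real system would also need quorum or fencing to avoid two nodes both believing they are active.

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds of silence before a node is presumed dead (assumed value)


class FailoverGroup:
    """Tracks heartbeats and picks the highest-priority node that is still alive."""

    def __init__(self, nodes, timeout=HEARTBEAT_TIMEOUT):
        self.nodes = list(nodes)  # priority order: the first live node is active
        self.timeout = timeout
        self.last_seen = {}       # node name -> time of its last heartbeat

    def heartbeat(self, node, now=None):
        """Record that a heartbeat message arrived from node."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def active(self, now=None):
        """Return the first node whose last heartbeat is recent enough, else None."""
        now = time.monotonic() if now is None else now
        for node in self.nodes:
            seen = self.last_seen.get(node)
            if seen is not None and now - seen <= self.timeout:
                return node
        return None
```

      Nothing here depends on where the nodes sit; only the timeout needs to be generous enough for the network latency between them.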

    23. Re: Say what you will by Anonymous Coward · · Score: 2, Informative

      Most modern browsers do, indeed, try the next address. It's a browser feature, though, not an official standard.

    24. Re:Say what you will by Anonymous Coward · · Score: 0

      Assuming you mean traditional round-robin A records, the timeout(s) you still have to suffer through would kill your latency.

      If you're talking about DNS providers (disclaimer, I work for Dyn) with advanced features that detect a failover event occurring and will only serve healthy A records, then that is a different story.

      Well in terms of a massive failure, most people would be a lot happier with a forced re-connect after a few minutes than simply being down.
      As for DNS, there are all kinds of things you can do, but your solution won't help if the provider's server goes out. You can implement a similar solution if you run your own nameservers, one at each datacenter. Then if your Active node goes dark, and assuming the TTL is low enough on your DNS records, since it's hosting the primary nameserver, other DNS servers will go to the secondary to get a record update. And it'll start serving out the datacenter it's located in since the primary is out of comms. You still run into slow 'failover' problems of course, especially with 3rd party DNS which enforces its own TTL, clients which cache the lookup, etc.

    25. Re:Say what you will by Anonymous Coward · · Score: 4, Informative

      AWS Status Dashboard?

      I know this is /., and people here don't like to read, but did anyone actually read the status dashboard posts?

      This issue was limited to a single AZ, affected only a small number of machines, and was specifically an issue with added latency in EBS volumes. And Amazon completely resolved the issue in 4 hours.

      So, call me crazy, but didn't they do exactly what they are supposed to do? Also, AWS quite clearly states that any given AZ *might* fail. Hence, if you want any sort of high-availability, you replicate across different AZs.

      Plus, I have 10+ EC2 instances, and a number of other resources with AWS, and none of them were affected by this outage.

    26. Re:Say what you will by Anonymous Coward · · Score: 0

      Not possible you say?

      http://technet.microsoft.com/en-us/library/jj134172.aspx

    27. Re:Say what you will by sshir · · Score: 1

      If you care that much about availability, you should do it yourself anyway.

      The first rule of diversification: don't put your eggs into correlating baskets.

      In this context it means that if your primary is on AWS, then your secondary must be on Rackspace or whatever - NOT other AWS.

    28. Re:Say what you will by tnk1 · · Score: 3, Interesting

      Supposedly the load balancer problem did not affect LBs that have backing hosts in two availability zones according to the article. The major question is... who runs everything in one availability zone? You're not supposed to do that for high availability sites.

    29. Re:Say what you will by mysidia · · Score: 2

      When you want multiple locations("regions" in aws) you need to pay for resources in each additional region, then pay another cost to provide that failover.

      Or you can have storage in those regions prepped to fail over, with no other resources provisioned. When failover needs to occur, you start spinning up the instances in the other region.

      It does require planning. You can reroute, but you don't get that automatically; it requires work and preparation.

    30. Re:Say what you will by JDG1980 · · Score: 1, Insightful

      No they didn't lie. You can set things up that way-simply set up your servers in multiple data centers(AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think magically I won't ever have to worry about downtime ever again.

      But that was one of the big promises of "the cloud": that you'd never have to worry about the nitty-gritty of network administration again, your provider would handle all that for you. If that isn't the case, then you gain nothing and might as well host the data yourself.

    31. Re:Say what you will by mysidia · · Score: 3, Insightful

      But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff. They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service. That's the whole point of putting stuff in the "cloud".

      Boy have you been fed a line. Read the SLA. If it's not in there; then you don't get it.

      If you think the cloud provider is clustering your instance and giving you HA; then AWS is not for you.

      Amazon provides availability zones you can provision separate instances storage and networks in. If your application cannot survive the failure of an instance and the failure of an entire availability zone, then you don't have HA, and Amazon won't give it to you -- your app may be inappropriate for AWS, if HA is required.

    32. Re:Say what you will by Zemran · · Score: 5, Funny

      "In Soviet Russia, company's customers go down on YOU!"

      Now we know the truth about why Snowden went there...

      --
      I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.
    33. Re:Say what you will by Zemran · · Score: 4, Insightful

      "nothing it does is new or cheaper than hosting 10 years ago."

      Welcome to the wonderful world of marketing. Sell people what they already have for 50% more.

      --
      I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.
    34. Re:Say what you will by Skapare · · Score: 1

      Configure the warm resources in the other region to constantly monitor the primary. If the primary goes down, they automatically activate the secondary.

      --
      now we need to go OSS in diesel cars
    35. Re:Say what you will by hawguy · · Score: 3, Interesting

      No they didn't lie. You can set things up that way-simply set up your servers in multiple data centers(AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think magically I won't ever have to worry about downtime ever again.

      But that was one of the big promises of "the cloud": that you'd never have to worry about the nitty-gritty of network administration again, your provider would handle all that for you.

      There are many different flavors of "cloud" computing - if you throw your app at a cloud provider and blindly expect them to make it highly available, then you'll get what you deserve. There is no end of cloud solution providers that will be happy to help you architect your app for whatever level of redundancy you want. But it's not going to be free.

      Amazon does let you get rid of your network admin and concentrate on managing the servers. No need to worry about BGP, buying bandwidth from multiple redundant providers, buying and administering your own firewalls, network switches, routers, etc.

      But you still have to manage your servers. Amazon will help you with multi-AZ redundancy for things like MySQL.

      If that isn't the case, then you gain nothing and might as well host the data yourself.

      That depends heavily on your use case. If you have a relatively small number of servers, or have large demand spikes, Amazon can be much more cost effective than hosting your own servers. If you have hundreds of servers and keep them busy all the time, you can probably save money by doing it yourself.

      But if you have dozens of servers, then it's likely that you'll save money with Amazon over buying your own servers, network gear, a SAN, backup solution, hardware service contracts, etc.

      But you have to architect your application properly. We have our core servers split across multiple AZ's with the database replicated across those AZ's. We don't trust our failover/failback scripts enough to make it automatic, so we have a simple web interface to let anyone on the tech team do the failover. The only impact we saw in this outage was higher latency and timeouts to some of our app servers, but our database was not in the affected zone, and Amazon's load balancer correctly routed traffic to the servers in the good AZ.

      Additionally, we have a warm spare running in a different region - the servers are kept up to date with data, but they are running in smaller instance types than we need to run our app. To do a regional failover, we'd have to reboot them into larger instance types (our app startup scripts already tune memory parameters to take advantage of the greater amounts of RAM in the larger instances), then repoint DNS.

    36. Re:Say what you will by Anonymous Coward · · Score: 0

      *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff.

      Did they write your applications?

      If not, how can you expect them to ensure that YOUR application stays up uninterrupted on their services, when you've architected it in a poorly thought out manner?

      You seem to think that "cloud" = "no thought required." That's not the case. The value Amazon gives you is the ability to rapidly expand and contract your capacity as application loads change, the ability to achieve multiple-site redundancy with less hardware investment on your side, and the ability to manage all of your datacenter assets programmatically.

      But YOU have to build that capability into your application, and YOU have to figure out how to take advantage of those services. YOU will not do it as cheaply or effectively as Amazon will, at least not at feature parity. And that's why you give money to Amazon.

    37. Re:Say what you will by Anonymous Coward · · Score: 0

      Probably because properly designed applications that require "high availability" weren't impacted by this outage. Because if they require "high availability," they weren't using resources from a single availability zone. Which means the resources were available in other AZ's - there was perhaps some temporary loss of capacity due to the outage, but a well-built application (and a properly-planned ops protocol) would detect that and spin up additional capacity in one of the other (functional) AZ's until the problem was resolved.

      If you're not testing this stuff, you deserve the outages you get. If you're just dropping your application on an EC2 instance and assuming "it'll work just fine," then you deserve the outages you get.

    38. Re:Say what you will by Anonymous Coward · · Score: 0

      That's expensive. "Cloud" hosting services cost about 1.5x traditional hosting. When you want multiple locations("regions" in aws) you need to pay for resources in each additional region, then pay another cost to provide that failover. Cloud hosting is great, but nothing it does is new or cheaper than hosting 10 years ago.

      Stability is always expensive :)

    39. Re:Say what you will by You're All Wrong · · Score: 1

      > Amazon provides availability zones

      Not always. Sometimes they provide *unavailability* zones. That's the problem. And that's what makes people think they're deceptively named.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    40. Re:Say what you will by mysidia · · Score: 1

      Not always. Sometimes they provide *unavailability* zones.

      That's just a play on words: they are availability zones. Just because you don't like how frequently they have a zone-wide outage doesn't change the name.

      Another word for their availability zones is fault domain or failure domain.

      If you don't want two things to fail together, you run them in a different region, using only resources in separate availability zones in separate regions.

    41. Re:Say what you will by TubeSteak · · Score: 1

      So, call me crazy, but didn't they do exactly what they are supposed to do? Also, AWS quite clearly states that any given AZ *might* fail. Hence, if you want any sort of high-availability, you replicate across different AZs.

      For whatever reason, many of AWS's biggest flameouts have happened at the Virginia datacenter.
      Between bad weather, rickety power infrastructure, bad hardware components, poorly configured software/hardware, etc etc etc
      It's like setting up your data center in the Bermuda Triangle.

      --
      [Fuck Beta]
      o0t!
    42. Re:Say what you will by Anonymous Coward · · Score: 1

      Except Amazon is acting as a datacenter.

      It wouldn't surprise me that these outages prompt one of three decisions from their customers: 1. Switch providers, 2. Set up internal servers (internal to the company), or 3. Ghost the entire setup from one region to another.

      That 3rd option is what Amazon hopes their customers do in every single one of these outages, even though they tout 99.95% uptime (about 4.38 hours of downtime per year, according to Wiki), which they've failed to meet at least once in the last 3 years. That is for EC2; I won't assume they use the same SLA for all of their services.
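
      For reference, the downtime a given uptime percentage allows can be worked out directly. A small sketch, assuming an average year of 365.25 days; the function name `allowed_downtime_minutes` is made up for this example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60      # average year, in minutes
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in a period at the given uptime percentage."""
    return period_minutes * (1 - uptime_percent / 100)


# 99.95% works out to roughly 4.38 hours per year, or about 22 minutes per month.
yearly_hours = allowed_downtime_minutes(99.95) / 60
monthly_minutes = allowed_downtime_minutes(99.95, MINUTES_PER_MONTH)
```

      Whether an SLA measures per month or per year matters a lot here: the same percentage allows a single multi-hour outage under a yearly window but not under a monthly one.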

    43. Re:Say what you will by TTL0 · · Score: 1

      I believe that claim only applied within one data center, so if zone a went down zone b should pick it up. Not AWS-West. Even their cloud load balancers can't balance between 2 DCs.

      --
      Sanity is the trademark of a weak mind. -- Mark Harrold
    44. Re:Say what you will by Anonymous Coward · · Score: 0

      That's called a DR site. Doesn't always work. Even if you are paying for it. I used to work with one of the big hosting providers. It was found out during one of the outages that most of the DR sites weren't actually getting replicated at all. You are safe to assume that it cost our company a few arms and knees but that's how it is.

    45. Re:Say what you will by AK Marc · · Score: 1

      Please please don't do this. A $50 router/Linux box should be able to serve millions of people, if DNS is reasonably TTL'd. But when everyone sets TTL to 5 min, you have to buy expensive hardware or run dedicated boxes because the number of queries is so high.

    46. Re:Say what you will by AK Marc · · Score: 1

      Don't set your TTLs short. The recommended values are good enough. Short TTLs are good only for changes (i.e. TTL at 24 hours, then set it to 12 hours 24 hours before an IP migration, then 4 hours 12 hours before a migration, then 1 hour 4 hours before a migration, then 5 minutes 1 hour before a migration, then to 24 hours after the migration is done), obviously set in seconds, 86,400 for a day, 300 for 5 minutes (don't use).
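
      The ramp-down described above follows one rule: lower the TTL no later than one old-TTL before the migration, so every cache holding the old value has expired by the next step. A short sketch that derives the schedule from the list of TTLs; the helper name `ttl_ramp` is hypothetical.

```python
def ttl_ramp(ttls_seconds):
    """Turn a descending list of TTLs into (seconds_before_migration, new_ttl) steps.

    Each new TTL is applied one *previous* TTL before the migration, so records
    cached under the previous TTL have expired by the time the next step is due.
    """
    return list(zip(ttls_seconds, ttls_seconds[1:]))


# 24 hours normally, then 12 hours, 4 hours, 1 hour, and 5 minutes.
schedule = ttl_ramp([86400, 43200, 14400, 3600, 300])
for seconds_before, new_ttl in schedule:
    print(f"{seconds_before / 3600:g} h before migration: set TTL to {new_ttl} s")
```

      After the migration the TTL goes back to the normal 86,400 seconds, as described above.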

      For what you are stating, DNS servers at both using anycast, so the one that's up says it's the one up is a much better way to do it. The down one won't ever announce itself, and the up one will respond. That's how the Big Boys do it. No 5 minute wait, and no extra load on DNS caching servers.

    47. Re:Say what you will by jimicus · · Score: 3, Informative

      But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff. They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service. That's the whole point of putting stuff in the "cloud".

      Then either you're incredibly naive or you've never looked at what you get with most cloud providers.

      Those £15/month virtual servers? You don't get any redundancy on those. If you're lucky, the provider will move it to a new physical host if the one it's living on breaks down, but they won't make any guarantees regarding how quickly that will happen or how automated and transparent that process is.

      IME, the pile-it-high, sell-it-cheap brigade are punting exactly this. It's a whole bunch of physical boxes running something like Xen with a web-based front end but none of the work necessary to make it truly highly available has been carried out.

      You want true high availability in the cloud - where even an entire datacentre going dark won't affect you? Well, then you have two choices:

        - Architect your own. This means you will need several cheap virtual servers and you'll have to write your own software that accounts for all the various failure modes. Yes, this is difficult. Yes, this means you can't just fire up an Ubuntu image with Apache preinstalled on AWS and forget about it. Yes, this means it's a hell of a lot more expensive because suddenly you need to pay for lots of virtual servers rather than just one or two and you need to put a hell of a lot more work into the development process. But that was a choice you made when you went for the cheap option. Oh, you thought that because they used the word "cloud" in their marketing, that meant they'd already done all that for you? Ah.... no. Sorry.

        - Contract it out to a company that has already built all this at the virtualisation level so you don't need to worry about it at the OS level. They operate a highly-available infrastructure with redundant everything and guarantees that even if something does fail, the redundancy will kick in automatically and you'll see no downtime. There are companies that offer this, but you might want to sit down with a strong drink before you look at their pricing structure. Clue: It's a hell of a lot more than £15/month for a basic virtual server.

    48. Re:Say what you will by petermgreen · · Score: 2

      Short TTLs are good only for changes (i.e. TTL at 24 hours, then set it to 12 hours 24 hours before an IP migration, then 4 hours 12 hours before a migration, then 1 hour 4 hours before a migration, then 5 minutes 1 hour before a migration, then to 24 hours after the migration is done), obviously set in seconds

      Which works fine for planned migrations, useless for unplanned ones.

      The down one won't ever announce itself

      Sure, once a DNS server goes down it will stop sending responses, regardless of whether you used anycast or the traditional system of just listing multiple DNS server IPs, but responses it has already sent will continue to be used by clients until their TTL expires.

      To make anycast useful without low DNS TTLs you would need to anycast not just the DNS servers but the servers they point to as well. However anycasting TCP based services causes broken connections if routing changes.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    49. Re:Say what you will by Rich0 · · Score: 2

      One of the features of AWS was supposed to be the ability to reroute everything to a different datacenter if one goes down. I know I read that somewhere back when AWS was first starting up. You don't think they lied, do you?

      That all works just fine - if you build your application to use it.

      However, nobody does this, because when your coworker is working on a new feature he can show the boss when it comes time for bonuses, do you want to spend your time working on something that you can only show off when Netflix goes down? Oh, and the way that you know that your work did anything is because your coworker's new feature still works.

      Reliability is a hard personal sell in IT, and that is why there is so little of it. Anybody who set up real AWS redundancy didn't bat an eye at this outage. Their load balancers were not exclusively in one data center, and when they spotted the outage they sent all the work to other data centers where they spun up a ton of capacity.

      If you just deploy all of your stuff and point your DNS at one Amazon datacenter, and that datacenter goes down, well so does everything you could have used to fix the problem. Even so, if you had control over your DNS outside of Amazon you could spin up another instance elsewhere manually if your tools are designed to handle that. If you hard-coded us-east-1 or whatever into all your scripts or didn't even have scripts to completely rebuild your instance from nothing, not so much.

    50. Re:Say what you will by Anonymous Coward · · Score: 0

      DNS is a shitty load balancer, there are a lot of crappy providers out there with busted DNS servers that ignore TTLs etc. Even using a dynamic dns provider or an F5 bigip box or something, you still have to deal with that stupidity of hoping the clients do the right thing.

    51. Re:Say what you will by Anonymous Coward · · Score: 0

      A $50 router/Linux box should be able to serve millions of people

      Until the box gets DoS'd.

    52. Re:Say what you will by module0000 · · Score: 1

      That's not a feature, and I don't recall it ever being advertised as one. You get to decide which zone and which datacenter(datacenters contain multiple zones) your instances/data lives in. If you want to replicate those to other zones and datacenters, it would be a very good idea. Amazon can't automagically do that for you though, and does not advertise that it can.

      --
      Trackball users will be first against the wall.
    53. Re:Say what you will by Anonymous Coward · · Score: 0

      Don Draper has a really low UID, it seems.

    54. Re:Say what you will by Jawnn · · Score: 1

      But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff.

      They are, if you pay for "all that stuff". If you only pay for some of that stuff, you don't get HA.

      They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service.

      Again, AWS does "worry" about such things, and yes, they do have systems in place so that there's no loss of service - for customers who have purchased that level of service.

    55. Re:Say what you will by Jawnn · · Score: 1

      Amazon provides availability zones you can provision separate instances, storage, and networks in. If your application cannot survive the failure of an instance and the failure of an entire availability zone, then you don't have HA, and Amazon won't give it to you...

      Not for free, they won't, but you most certainly can buy an HA solution from them.

    56. Re:Say what you will by AK+Marc · · Score: 1

      To make anycast useful without low DNS TTLs you would need to anycast not just the DNS servers but the servers they point to as well. However anycasting TCP based services causes broken connections if routing changes.

      Huh? It seems either you don't know how it's used, or you think I don't know how it's used. You over-simplified your answer to the point of technical falseness. Anycast is a routing trick to turn a unicast into a multicast-analogue (that is, multi-unicast). As it is technically a routing function, yes, routing changes can break it. But pointing that out is like saying "breaking your car can break your car". It's a tautology. Changing routing between the anycast server and the recipient devices doesn't "break" it (though it could break certain sub-functionality, like load balancing, if you are using anycast for it), as it's essentially still a unicast to the destination at that point. It just lets you dynamically select a single "best candidate" for the DNS query, or send it to multiple for the one that's up to respond.

      You don't unicast the servers they point to, the DNS servers hold different tables. The DNS1 holds entries for WWW1 and MX1, and DNS2 holds entries for WWW2 and MX2. One should assume that WWW1 and WWW2 replicate or otherwise communicate with each other. If DNS1 is down, then DNS2 will respond. All load will go to WWW2. You don't need to anycast the servers they point to. Anycast isn't a DNS "trick". It's a routing trick used primarily for DNS (it can be used for other services, but, in practice, isn't).

    57. Re:Say what you will by petermgreen · · Score: 1

      If DNS1 is down, then DNS2 will respond.

      Sure it will

      All load will go to WWW2.

      Not immediately it won't

      Consider this scenario.

      1: Initially all servers are up
      2: user starts browsing your site. His DNS packets go to DNS1 since that is the closest DNS server to his recursive resolver. DNS1 returns the IP of WWW1 which is cached by the user's recursive resolver and possibly his OS.
      3: site1 (including DNS1 and WWW1) goes down.
      4: The user clicks a link within your site. The browser asks the OS to look up the name, the OS either looks in its DNS cache and returns the IP of WWW1 or contacts its local recursive resolver which looks in its cache and returns the IP of WWW1.
      5: The browser tries and fails to connect to WWW1, the user eventually gets a timeout message if they can be bothered to wait long enough.
      6: The user hits the refresh button and the browser tries again but continues to get connection timeouts until the DNS cache expires the record, contacts DNS2 and gets the IP for WWW2.
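
      The scenario above can be sketched as a toy simulation. This is only an illustration under assumed numbers (a 300 second TTL, an outage at 60 seconds, a client retrying every 60 seconds); the class and function names are made up for the example and it models a single caching resolver, not real DNS.

```python
class CachingResolver:
    """Toy recursive resolver that caches one answer until its TTL expires."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # callable returning (ip, ttl_seconds)
        self.cached_ip = None
        self.expires_at = 0

    def resolve(self, now):
        # Only go back to the authoritative servers once the cache has expired.
        if self.cached_ip is None or now >= self.expires_at:
            ip, ttl = self.authoritative()
            self.cached_ip, self.expires_at = ip, now + ttl
        return self.cached_ip


def simulate(ttl=300, outage_at=60, step=60, end=420):
    """Return (time, ip, reachable) for a client that retries every `step` seconds."""
    site1_up = True

    def authoritative():
        # Anycast DNS: whichever site is alive answers with its own web server.
        return ("WWW1" if site1_up else "WWW2", ttl)

    resolver = CachingResolver(authoritative)
    timeline = []
    for now in range(0, end + 1, step):
        if now >= outage_at:
            site1_up = False  # site1 (DNS1 and WWW1) goes down
        ip = resolver.resolve(now)
        reachable = ip == "WWW2" or site1_up
        timeline.append((now, ip, reachable))
    return timeline
```

      With these assumed values the client keeps getting the cached WWW1 address, and keeps timing out, from the outage until the cached record expires at the 300 second mark.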

      (it can be used for other services, but, in practice, isn't).

      And there is a reason for that. Using anycast for a stateful protocol (e.g. anything TCP based) means that routing changes that would normally be harmless (that is routing changes which mean the packets still get to you but enter your network on a different interface) can break open connections even when all your systems are up. In the worst case if routing is really unstable or if someone has done some dumb load balancing then they may not be able to complete a connection at all.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    58. Re:Say what you will by Glendale2x · · Score: 1

      Netflix routinely touts all the stuff they do to make sure they have high availability - including loss of a whole zone - yet they are still often impacted by AWS choking. Something's not right.

      --
      this is my sig
    59. Re:Say what you will by Gallomimia · · Score: 1

      You have to PAY for that. You're allowed to spawn instances and pay-per-use with righteously low rates on myriads of servers and managed services, and some of the default setups include proxies and load balancers and failovers. So, assuming you've set things up properly, and have failover resources ready to go and programmed to take over when shit hits the fan, then yes you can reroute everything to a different datacenter if one goes down.

      It's not the default tho. AWS is very technical. Not like some service you can pay for and have email accounts and storage space and web space and so on. There's a full API, technical control panels, ready to use scripts, help services, an RSA keypair manager, as well as a price fluctuation setting to nab the cheap prices for your deferrable data processing, but mainly it runs on top of VM and DS instances. Not exactly your turn-key operation, but quite powerful.

      --
      Sadly, a Libertarian cannot force his views on another, and freedom cannot spread as does the cancer known as religion.
    60. Re:Say what you will by Yggdrasil42 · · Score: 1

      The Virginia region simply dwarfs the other AWS regions, likely by an order of magnitude, so there's a correspondingly bigger chance of failures. No magic required.

    61. Re:Say what you will by Cyberax · · Score: 1

      No, not possible. A geographically remote link has latency too big to be useful for automatic replication. And that's even without considering the way to solve the 'split brains' problem.

  2. Lack of reliability by BitZtream · · Score: 1

    How is it that AWS is less reliable than the 4 Windows machines I get stuck managing? One of which has had a failed CPU for a few years now ... yet it's still going.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    1. Re:Lack of reliability by Anonymous Coward · · Score: 0

      perhaps because your 4 windows machines aren't getting a billion hits every day?

    2. Re:Lack of reliability by Anonymous Coward · · Score: 0

      Because it's a ton more complex and if a major connectivity issue happens (which seems to be what happened here) then things break.

    3. Re:Lack of reliability by Anonymous Coward · · Score: 0

      how the fuck is it running with a failed cpu

    4. Re:Lack of reliability by cheater512 · · Score: 1

      You look after 4 servers. Amazon looks after 100,000 times that.

      If every server has a 1 in 100 chance of failing each year, you have to wait over 10 years to reach a 50% chance that a server has failed.
      Amazon would have about 11 dying per day. It's amazing that their systems can handle 99% of those failures seamlessly.

      (My math may be way out but you get the point)
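
      For anyone who wants to check the arithmetic, here is a small sketch. It assumes failures are independent and uses the figures from the comment (a 1 in 100 annual failure chance, 4 servers versus 100,000 times that); the function names are just for illustration.

```python
import math

ANNUAL_FAILURE_RATE = 0.01      # 1 in 100 per server per year (from the comment)
SMALL_FLEET = 4
LARGE_FLEET = 4 * 100_000       # "100,000 times that"


def years_until_half_chance(servers, rate=ANNUAL_FAILURE_RATE):
    """Years until P(at least one failure) reaches 50%, assuming independence."""
    # Solve (1 - rate) ** (servers * t) = 0.5 for t.
    return math.log(0.5) / (servers * math.log(1 - rate))


def expected_failures_per_day(servers, rate=ANNUAL_FAILURE_RATE):
    """Average number of failures per day across the whole fleet."""
    return servers * rate / 365
```

      Under those assumptions the 4-server shop waits roughly 17 years to hit a 50% chance of one failure, and the 400,000-server fleet averages about 11 failures a day, so the numbers above hold up.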

    5. Re:Lack of reliability by Mashiki · · Score: 2

      You look after 4 servers. Amazon looks after 100,000 times that.

      I thought there were no servers in the cloud, just people willing to take your money and piss on you.

      --
      Om, nomnomnom...
    6. Re:Lack of reliability by JDG1980 · · Score: 1

      It's a multi-CPU system?

    7. Re:Lack of reliability by Joining+Yet+Again · · Score: 3, Interesting

      But I thought the whole point of the cloud was that everything included redundancy, so a server, or a cable, or a whole datacentre could go down, and because of real time replication, nothing whatever would be missed.

      Or am I just thinking of VAXclusters from, you know, the 1980s.

    8. Re:Lack of reliability by davester666 · · Score: 1

      No, a single CPU. It operated the same when the CPU worked.

      --
      Sleep your way to a whiter smile...date a dentist!
    9. Re:Lack of reliability by ron_ivi · · Score: 1

      How is it that AWS is less reliable than the

      How is it that AWS is less reliable than amazon.com ?

      Seems Amazon occasionally claims to use AWS - yet amazon.com doesn't seem to die as much.

      Are the rest of us just not using it correctly?

    10. Re:Lack of reliability by Cederic · · Score: 1

      Are the rest of us just not using it correctly?

      Oddly, yes.

      Look for your 'single points of failure', including the cloud you're hosting in. Are you certain the cloud service is giving you redundancy on network, power, cooling, compute capacity, storage, physical location? What's their DR approach, and is your use of their service going to benefit from it?

      Why are you using only one cloud hosting service, if uptime's that important to you.

    11. Re:Lack of reliability by petermgreen · · Score: 2

      Are the rest of us just not using it correctly?

      More than likely.

      You need to deal with the possibility of both individual EC2 instances dying and a whole availability zone dying in your application architecture. Amazon provides tools to help with this like load balancing that can operate over multiple availability zones and database service that are replicated across multiple availability zones but you have to actually use those tools if you want to build a reliable application.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    12. Re:Lack of reliability by Anonymous Coward · · Score: 0

      This.

      Use Route 53 to route your DNS. Set up multiple instances running in multiple Availability Zones, and use ELB to load balance across them. Duplicate this setup across multiple Regions and you've got a very excellent setup able to respond whenever an AZ or two goes down. Also bear in mind that your availability zones are not the same as those someone else may have. This is a security feature that prevents you from knowing exactly where your servers are being provisioned. This is why you use multiple AZs and even go so far as setting up backups in a different Region entirely.

      That being said, I had no problems with Netflix yesterday, because they're quite capable of handling single points of failure such as this. People located in closer proximity may have had some trouble as servers were switched out and rebalanced, but this is to be expected even with a physical server. If someone crashes a car into your data center, you're likely going to end up going down. You have off-site backups and maybe an alternate DC somewhere in case such a thing happens. Why the fuck wouldn't you do the same with AWS?

    13. Re:Lack of reliability by Jawnn · · Score: 1

      But I thought the whole point of the cloud was that everything included redundancy, so a server, or a cable, or a whole datacentre could go down, and because of real time replication, nothing whatever would be missed.

      Or am I just thinking of VAXclusters from, you know, the 1980s.

      Well, that (VAX clusters) or the suite of services that AWS offers that would have prevented such an outage from affecting those customers who chose to utilize them. If you haven't deployed your stuff in multiple availability zones, along with the pieces that are required to tie that together into an HA stack, you don't get "the cloud". Why, oh why, does every article about every outage affecting a single AWS zone fail to mention this?

    14. Re:Lack of reliability by ron_ivi · · Score: 1

      and a whole availability zone dying

      Even doing that, you end up less stable than amazon.com : http://techblog.netflix.com/2011/07/netflix-simian-army.html

      Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
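
      The kind of check a zone-outage drill is meant to validate can be sketched in a few lines. This is not Netflix's tooling, just a hypothetical capacity test: the zone names, the capacity units, and the function name are all assumptions.

```python
def survives_zone_loss(capacity_by_zone, total_load):
    """True if the remaining zones can carry total_load after losing any one zone."""
    total_capacity = sum(capacity_by_zone.values())
    # Remove each zone in turn and see whether the survivors still cover the load.
    return all(
        total_capacity - lost_capacity >= total_load
        for lost_capacity in capacity_by_zone.values()
    )


# Hypothetical three-zone deployment, capacity in requests per second.
zones = {"zone-a": 100, "zone-b": 100, "zone-c": 100}
```

      With three equal zones you can only run at two thirds of total capacity and still ride out a zone failure, which is the extra headroom people in this thread keep saying you have to pay for.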

  3. Running List of Cloud Outages? by bill_mcgonigle · · Score: 3, Insightful

    I thought this might already exist, but I'm not finding it with a quick Google search. Seems like it's a thing that could get ad views from some decent IT audiences.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    1. Re:Running List of Cloud Outages? by sottitron · · Score: 3, Funny

      You should totally create this. I hear AWS is the way to go to get things online quickly and at scale.

    2. Re:Running List of Cloud Outages? by Local+ID10T · · Score: 2
      --
      "You want to know how to help your kids? Leave them the fuck alone." -George Carlin
  4. watch out for birth rates by Anonymous Coward · · Score: 2, Funny

    When morons can't watch TV (or equivalent) they fuck. 9 months later you'll see a birth rate spike.

    1. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      That's no way to talk about your parents.

    2. Re:watch out for birth rates by behrooz0az · · Score: 0

      I wish I could reply after 9 months, but anyways I'm marking it on my calendar.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
    3. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      Mod parent up. A couple years from now preschools all over the world will be inundated with children with a heightened propensity for sniffing glue.

    4. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      They sound pretty clever to me.

    5. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      So I'm assuming your significant other is either bowlegged or pregnant

    6. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      Do you kiss your mother with that mouth? Oh wait, don't answer that. xD

    7. Re:watch out for birth rates by Anonymous Coward · · Score: 0

      I might be weird, but I'd generally take the fucking as a priority over the TV... That can come later, especially with DVRs and BitTorrent.

  5. Why... by Anonymous Coward · · Score: 0

    Why do people even cloud ? Real dedicated overpowered servers with multiple Gbps pipes are available for a few hundred bucks these days...

    1. Re:Why... by AHuxley · · Score: 2

      Middle management in their luxury SUV/sedans sit in daily commutes behind buses with descriptive 'cloud' ads. The upgrade message filters back to their bosses over time?

      --
      Domestic spying is now "Benign Information Gathering"
    2. Re:Why... by LordLimecat · · Score: 1

      Now add cooling, power, generators, physical security, a SAN, a virtualization platform, and multiple failover sites.

    3. Re:Why... by Anonymous Coward · · Score: 0

      Great, you gonna plunk them down in the middle of a Starbucks? Or you gonna build a massive datacenter around them, with heating, cooling, power, support staff, security, and some level of on-site power generation?

      And then if you want redundancy, doing it in at least one other place far enough away from the original that the same snowstorm or earthquake won't take them both out?

      That's why people cloud. The people who say "it's as simple as buying a server for a couple hundred bucks" are the ridiculous shills in this case.

    4. Re:Why... by You're+All+Wrong · · Score: 1

      > Now add cooling, power, generators, physical security, a SAN, a virtualization platform, and multiple failover sites.

      6 of those things we've been doing all along anyway (OK, we non zed-heads have only been virtualising for half a decade). The remaining one AWS doesn't do for you unless you pay significantly more than the simple hosting service.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    5. Re:Why... by Anonymous Coward · · Score: 0

      No i mean renting the servers, at a proper ISP, like OVH.

      The BW/Cooling/Power/Safety/Supervision are all included in the price. SLA support too.

      While less "elastic scalable" this would be cheaper in any case than AWS instances for very high traffic websites.

      Also it's more secure. You won't find people guessing your private keys from the CPU cache.

    6. Re:Why... by Anonymous Coward · · Score: 0

      Such server:

      Intel Bi Xeon E5-2687W (16coresx2threads)
      256GB RAM
      2x3TB SAS + 80GB SSD + HW raid

      10Gbps Input BW
      1.5Gbps Output BW
      0% Packet Loss / 2h Hardware incident recovery SLA.

      KVM/IPMI (no need to have your own damn datacenter !)

      For 500bucks/month.

      Host Reddit... twice...

      Don't even rely on those evil CDN.

      ??? profit.
      I can't even fathom who in their right mind would use a cloud, especially Amazon.

      (disclaimer: i don't work for said ISP nor have any particular interest in promoting them. They do the job, they do it good.)

    7. Re:Why... by Joining+Yet+Again · · Score: 1

      Cooling? Everyone has that.

      Power? Yeah, you pay for that anyway.

      Generators? Not as expensive as you think. I lived on a farm in the middle of nowhere once, and we had our own water supply and back-up generators.

      Physical security? If you think that you're less likely to enjoy security with servers on your own site than controlled by some random third party who won't let you onto their site, who won't let you audit any of their processes, and who is almost certainly happily giving over information on request to the authorities, you're insane.

      SAN/virtualisation? So, what every decent IT person has been handling since the '60s.

      Multiple failover sites? Well, that would make us better than Amazon already...

  6. Add Adobe Creative Cloud to the List too by JenovaSynthesis · · Score: 3, Interesting

    That went down and I think it ate some files with it. Just before the crash my client reported 103 files being removed. They weren't by me.

    --
    Anonymous Cowards generally receive no replies because you're a coward and I'm a bitch :)
    1. Re:Add Adobe Creative Cloud to the List too by Anonymous Coward · · Score: 0

      I would expect them to come back up when the adobe creative cloud comes back up.

      "removed" just means "client can't see them" because the datastore is unreliable.

      It doesn't mean deletion... at least it shouldn't.

      Adobe should be providing a higher level of support for creative cloud.

      --Sam

  7. And you can do it with AWS by Cyberax · · Score: 2

    You can do it with AWS, no problem. Only one region was affected this time, other regions are OK.

    1. Re:And you can do it with AWS by sshir · · Score: 1

      Saying that it's not a problem does not make it so. Besides, as soon as people learn to failover gracefully guess what would start to happen: other regions would begin buckling under load.

    2. Re:And you can do it with AWS by Cyberax · · Score: 1

      If you really need your servers to be up, then you should buy enough reserved instances in target regions. They are not oversold and guaranteed to be available.

      So yes, making resilient architecture on top of AWS is possible and is not that hard. You'll definitely have to pay extra money for it, but much less if you tried to build it yourself.

    3. Re:And you can do it with AWS by sshir · · Score: 1

      And my point was that when everybody tries to buy them, they would become either oversold or really-really expensive.

      Actually, regardless, money becomes an issue really fast anyway - a few days ago Wired ran a story that for many types of loads AWS does not make much financial sense anymore and people started to add two and two together. In other words - people are prepared to pay only so much (in a pinch a little bit extra) - ask a little bit more and they'll start to roll their own.

    4. Re:And you can do it with AWS by Anonymous Coward · · Score: 0

      This sounds a lot smarter than it actually is because your statement assumes a dramatic increase in traffic causing a data center or larger region to fail. That is not the case. Data centers tend to fail because of internal issues that bubble outward until it blocks all incoming traffic.

      Yes, this causes all of its traffic to redirect, but it wasn't the problem to begin with, nor should it all redirect to the same place anyway. Of course, this does add load to other data centers in other regions, which depends on how well you load balanced to begin with, but it's not going to break unless the other data center(s) were never prepared to handle any failure to begin with, thus making it just a really expensive load balancer.

      People don't want to rev up their AWS backends because it costs money for very little pay off, but it has very little to do with AWS being too expensive. You end up catching the upswing in load too late, and by the time you are able to handle it (however many minutes it takes to spin up new VMs), the shock load is most likely passed. Very few companies have the problems that Netflix, Facebook and Twitter see, where they have massive regular requirements that get spiked even more incredibly. The rest of the world tends to just see comparatively small spikes that are manageable with a tiny number of extra servers.

      When you start talking about rolling your own separate data centers, then you are talking a much different kind of money.

  8. Big names by Anonymous Coward · · Score: 0

    Yeah, sure. Maybe in the Bay Area.

  9. Where are the NSA comments?? by Guru80 · · Score: 2, Funny

    I thought for sure the first comment would be "I'm on to you NSA...down time for service "upgrades" " I'm disappointed in you my tin foil hat wearing brethren.

    1. Re:Where are the NSA comments?? by Anonymous Coward · · Score: 0

      So "tin foil hat" now refers to people who don't think the NSA is competent enough to keep their monitoring systems running during an upgrade? Did I miss a memo?

    2. Re:Where are the NSA comments?? by AHuxley · · Score: 1

      An interesting part is why the brands selected to stay in one part of the USA? With all that cheap power, skilled workers and tax breaks offered by other states?
      What keeps big data clinging to the Eastern USA?

      --
      Domestic spying is now "Benign Information Gathering"
    3. Re:Where are the NSA comments?? by Anonymous Coward · · Score: 0

      Yeah it's curious why it's the Northern Virginia site that gets these reported outages... Could it have something to do with proximity to NSA?

    4. Re:Where are the NSA comments?? by Anonymous Coward · · Score: 0

      I think that as we all proceed into the future of technology, we each trust it less and less. As far as we can tell (any of us), technology is as flawed as the mindset of whatever political zeitgeist that runs it. Maybe there's no more tin foil hat wearing brethren on slashdot. I see that the numbers of commenters has dropped here lately.

    5. Re:Where are the NSA comments?? by Anonymous Coward · · Score: 1

      consider the densely populated eastern coast, from miami to boston.. especially from dc to new york.. dc/northern virginia has always been one of the major internet hubs in the u.s. it couldn't have anything to do with proximity to the building with j. edgar's name on it... could it?

    6. Re:Where are the NSA comments?? by i.r.id10t · · Score: 1

      population density, closeness to physical infrastructure, larger pool of qualified workers (maybe), etc.

      --
      Don't blame me, I voted for Kodos
    7. Re:Where are the NSA comments?? by Opportunist · · Score: 1

      According to our copy of your files you got it all right, need us to forward it again?

      Yours,
      NSA

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    8. Re:Where are the NSA comments?? by Anonymous Coward · · Score: 0

      It ceases to be funny after we know it's true.

  10. Has Rackspace had any outages in 10 years or so? by MillerHighLife21 · · Score: 5, Interesting

    I've run servers on both Amazon and Rackspace for several years now and I can't recall a single instance of Rackspace having an outage. On the other hand, Amazon seems to have major issues at least 2 or 3 times a year. Is this stuff tracked anywhere?

    --
    "Don't teach a man to fish, feed yourself. He's a grown man. Fishing's not that hard." - Ron Swanson
  11. Maybe wasn't Amazon fault by gmuslera · · Score: 0

    Maybe the NSA screwed things up a bit when they were installing their new codenamed program there (after Snowden published all the old ones).

  12. Amazon Storefront Problems? by TechHSV · · Score: 2

    Were there any problems with Amazon.com? You'd assume they use their own service.

  13. Oh we're here. by Anonymous Coward · · Score: 0

    Don't lose faith that easily.

  14. Statistically unlikely # of sites going down by Anonymous Coward · · Score: 1

    What is going on? I don't buy it. While I get that you can't tell when the NSA has tapped the line I would imagine that things might go down in such instances. Something has to go down before they cut the line unless there are multiple entry points maybe. However I wonder if this has something to do with the way things are being done. That is, they're not tapping the line any more. Rather they are force implementing taps that provide access to specific data types. For instance now they can do more than just search for strings in users' email. Now they can see a user's Facebook page as the user sees it for example instead of just a series of texts.

    I guess stuff goes down. But not at the rate at which major sites are going down. If there is a logical explanation of some kind that impacts everybody (sun bursts radiation type thing) then please... provide it. But they all seem to be giving vague answers to the reasons the sites have gone down. Ebay it was 'regular maintenance' gone amok (I do believe that was scheduled, but others haven't been, I don't think).

    1. Re:Statistically unlikely # of sites going down by Anonymous Coward · · Score: 1

      Data centers do crash from time to time, and if all of your "cloud" is in one data center, then it's not really a cloud at all.

      I forget Amazon's exact terminology, but they have regions, and within those, they have data centers. When you set up your infrastructure, you pick a data center. You can provide some fault tolerance by failing over within the overall region, or you can choose to synchronize between (and therefore failover to) an entirely different region. That is the cloud: when a data center failure does not cripple your operation.

      From the sounds of it--and it has happened before--Amazon's entire AWS Northern VA region failed. Anyone not able to recover from that was down (which theoretically includes my own site that is hosted in a single Northern VA AWS data center). Anyone that was paying the extra money, and had set up their software to be ready for it, was humming along just fine, albeit likely a little bit slower due to added strain on the other regions as a result of redirected traffic that would traditionally have gone to the failed data center(s).

      Ironically, I did not see any issues whatsoever during the failure today, which just means that it didn't completely fail or I did not spot it (my site is not doing anything fancy to handle hardware failure or notifying me of it). This leads me to believe that there was a problem, but both Instagram and Vine are probably not setup nearly as optimally as they should be given their scale (from my understanding, Instagram is a joke anyway and Facebook got incredibly ripped off; they should have made it themselves and burned Instagram to the ground).

  15. Re:Has Rackspace had any outages in 10 years or so by Anonymous Coward · · Score: 1

    Yes. Rackspace even had an outage on their main website that lasted *days* just a few months ago, if you wanted to access it via IPv6. Sadly, there was no easy place to report the outage. The technical contact in whois is something at netnames.com? So I just ignored it.

    Anyway,

        https://status.rackspace.com/

    lots of reports of small issues. You should know this stuff if you are running an instance on their hardware!!

  16. Re:Has Rackspace had any outages in 10 years or so by Anonymous Coward · · Score: 0

    RS has had issues

  17. Re:Has Rackspace had any outages in 10 years or so by AHuxley · · Score: 1

    Would make a good site, a historic long term heat map of server outages. A lot of tech press to search back into, thankfully you can buy into digital press databases :)

    --
    Domestic spying is now "Benign Information Gathering"
  18. Multiple regions, anyone? by kriston · · Score: 1

    Isn't this why AWS offers multiple regions?

    Such large sites should understand that having multiple availability zones means nothing if the zones are all in the same region. Oh, and your application would need to be designed for failover.

    In addition, when looking for high-availability, you don't segregate your audience to individual regions. You let the working regions take over for you.

    Or spend the extra money and set up your own co-lo arrangement.

    --

    Kriston

  19. internet forecast by Anonymous Coward · · Score: 0

    partly cloudy with a chance for server outages.

  20. Nothing wrong here by Anonymous Coward · · Score: 0

    Just had to power down while the NSA live feed was plugged in.

  21. Re:NSA Flood Spoils The Show by Anonymous Coward · · Score: 0

    "I hear-by renounce my allegiance to the United States Of America and for which it now stands."

    You are the property of the Corporation called the UNITED STATES OF AMERICA and your allegiance is not yours to renounce. You are mortgaged property belonging to the Federal Government of the Corporation called the United States of America. You must do as you are told.

  22. actually, no by Chirs · · Score: 5, Informative

    "cloud" is sold as a *convenient* way to compute, where it's quick to add resources when needed so you can start small and scale up (and down) with demand.

    It is *not* generally considered a cheap or particularly reliable solution. So far at least none of the cloud providers are offering five nines--if you want that, you should (for now at least) be looking at enterprise/telecom gear.
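
    For a sense of what "five nines" actually demands, here is a quick sketch of the yearly downtime budget at a given number of nines. It assumes a 365-day year and the function name is just illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget in minutes for an availability of N nines."""
    unavailability = 10 ** -nines   # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability
```

    Five nines works out to a little over five minutes of downtime a year, while three nines allows almost nine hours, so a single multi-hour outage blows the five-nines budget for decades.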

    1. Re:actually, no by You're+All+Wrong · · Score: 4, Informative

      > It is *not* generally considered a cheap

      Quoth Forbes:
      Cost savings… [...] These are the advertised benefits of cloud computing

      Quoth Salesforce:
      4. Cap-Ex Free [...] no need for capital expenditure [...] minimal project start-up costs

      Quoth Verio:
      Achieve economies of scale [...] Reduce spending on technology infrastructure. [...] Globalize your workforce on the cheap [...] Reduce capital costs.

      And those were the first 3 hits for ``benefits of cloud computing'' (although the first one is meta, it refers to others referring to cost savings).

      I hate to shake you from your firmly entrenched world-view, but you have to know that people are touting cloud solutions as ones which have cost benefits. Whether they're valid claims or not is irrelevant, they are undeniably being made.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    2. Re:actually, no by Narcocide · · Score: 3, Funny

      YOU sir look like a shrewd and discerning businessman. How would you like to buy a bridge?

    3. Re:actually, no by AK+Marc · · Score: 3, Insightful

      Whether they're valid claims or not is irrelevant, they are undeniably being made.

      Like "core business". Every time I hear that, it's from a contractor or someone who just spoke to a contractor, and it's always about why it's good to outsource everything to contractors. It doesn't take long for that to be a pattern.

      Now, ask cloud computing companies how much they charge, compared to renting tin. It's always cheaper, except when it's not, and even then, it's cheaper to use the more expensive cloud because tin can go down, the cloud can't, or something like that.

  23. cloud is convenient, not reliable by Chirs · · Score: 2

    As a cloud customer, reliability (currently at least) is up to you. If you want the extra reliability of running instances in multiple availability zones then it's up to you to pay for it.

    The point of the cloud as it stands currently is not that it's cheap or reliable, but that it's easy to scale up/down with demand.

  24. Everybody that is surprised is stupid... by gweihir · · Score: 4, Insightful

    That things like this will happen with a cloud infrastructure is obvious. That the reliability claims made by the cloud providers are fantasy is also obvious. As soon as they start to do "uptime or else" (meaning you get tons of money as downtime compensation), things may be different, but they will not do that. At this time, the only thing you can do is change to a different cloud provider, which will have the same issues. Uptime guarantees without penalties when failed to meet them are worthless.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Everybody that is surprised is stupid... by VortexCortex · · Score: 4, Insightful

      We built a decentralized network called The Internet, even capable of withstanding global thermonuclear war -- packets rerouted moments after a city disappears from the mesh... And folks use data silos? Protip: Don't centralize services, that's daft in terms of both uptime and congestion.

    2. Re:Everybody that is surprised is stupid... by You're+All+Wrong · · Score: 1

      > Uptime guarantees without penalties when failed to meet them are worthless.

      I thought AWS did have penalties, of the partial-refund variety. Which ain't exactly great. However, I think the uptime they guarantee is barely 2-nines. They can be down a whole working day per month, IIRC. Which is pathetic, and certainly not worth paying for.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    3. Re:Everybody that is surprised is stupid... by gweihir · · Score: 1

      They say "99.95%" in their SLA (http://aws.amazon.com/ec2-sla/), but only as a target and only when "commercially reasonable". That is commercially reasonable for them, not the customer, and that is the real problem. If you run on your own infrastructure, commercially reasonable refers to what it costs _you_ when _your_ site is unavailable. That is a whole different thing. You are right that there is some compensation, but it is capped at a 30% cost reduction and hence completely laughable.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:Everybody that is surprised is stupid... by Anonymous Coward · · Score: 0

      Maybe at one time. It's not that way now. Decentralization and redundancy are no longer the name of the game. Star networks and data centers are the way we do it now, because it's marginally cheaper that way (and cheap is all that matters to us).

    5. Re:Everybody that is surprised is stupid... by Opportunist · · Score: 1

      The internet ain't what it used to be. The internet of today couldn't withstand a punk with a BB-gun, let alone a tacnuke.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    6. Re:Everybody that is surprised is stupid... by mcrbids · · Score: 1

      Our contract with the data center that we host at has significant penalties for downtime. In about 6 years of hosting there, we've had exactly 2 incidents of less than 1 hour each.

      Of course, the deluge of notifications we get every time a fly causes a ballast to fail in the 3rd light down the main hallway, or when our network usage at 95% exceeds the monthly average by 0.05%, gets a bit annoying, but I have no complaints about the quality of service.

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    7. Re:Everybody that is surprised is stupid... by gweihir · · Score: 1

      Seems some people are getting it right. No surprise. One reason the "cloud" is cheap (well, it is not really if you look closely enough), is that it cuts corners that cannot be cut when reliable operation is needed.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  25. Re:Has Rackspace had any outages in 10 years or so by CritterNYC · · Score: 5, Informative

    It depends which data center you're in. PortableApps.com has been hosted at Rackspace for years and we had multiple major outages due to ongoing power issues in the Dallas data center in 2009. The switch from grid to UPS was failing and would take the whole wing of the data center out with every server crashing hard. It would take quite a while to come back up. Then we'd have to wait hours for the Rackspace folks to rebuild our corrupted database (fully managed account on a dedicated server). It happened two weekends in a row in June and one other time if I recall correctly, basically costing us a full day of downtime each time.

  26. This is why I laugh at tech pundits who preach... by bagboy · · Score: 3, Interesting

    public cloud services as "the future". I will never risk my corporate data uptime and reliability to some "location in the cloud". I'll stick to private clouds (VMWare/VCenter) where I have control of both hardware and software and reliable failsafe systems. At least then if I have downtime I also have accountability and predictability. The same cannot be said for cloud providers and no matter what anyone says, once the data leaves your hardware, you have lost that control.

  27. Re:Has Rackspace had any outages in 10 years or so by Frosty+Piss · · Score: 1

    Netcraft?

    --
    If you want news from today, you have to come back tomorrow.
  28. Re:Has Rackspace had any outages in 10 years or so by Anonymous Coward · · Score: 0

    Amazon also offers some of the cheapest prices, so you pay for what you get.

  29. Instagram? Is that some kind of website? by PopeRatzo · · Score: 0

    The only web site that I've noticed being down in the past few weeks has been Wikifonia, the wonderful place where crowd-sourced MusicXML lead sheets for all sorts of music are available.

    They're back online now, and at least from what I can see, there is great jubilation among musicians worldwide. Where else can you go and search for some old jazz standard and get an immaculate lead sheet, instantly transposable into any key, downloadable as a PDF?

    I think Wikifonia has been single-handedly keeping the vast Great American Songbook alive, for which they deserve great thanks.

    I thought it was just an issue where some big music publishing group that represents outfits that charge $5 for a lead sheet to a song whose composer has been dead for half a century has been hassling them, but since it's back online and faster than ever, I think it might just have been a technical glitch.

    Wikifonia, salute!~

    --
    You are welcome on my lawn.
  30. Wrong terminology? by elfprince13 · · Score: 5, Funny

    Shouldn't this, technically speaking, be a "bright day" or a "sunny day"? After all, that's what I call it when the cloud-coverage breaks around here.

  31. Re:Has Rackspace had any outages in 10 years or so by Anonymous Coward · · Score: 1

    I had an outage in IAD just two weeks ago. Connectivity failure on several aggregates affecting many customers. Rackspace shill much?

  32. Serves you right by Gothmolly · · Score: 1

    For believing and investing in some handwavy concept called 'cloud' where you abrogate responsibility and take the iOS view (it Just Works) of technology.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Serves you right by You're+All+Wrong · · Score: 1

      But that's the problem - it doesn't serve them, right?

      --
      Your head of state is a corrupt weasel, I hope you're happy.
  33. Re:This is why I laugh at tech pundits who preach. by l0ungeb0y · · Score: 3, Interesting

    Depends on which "future" you are talking about. The future where the bulk of personal data is stored on the cloud to be shared across devices and with friends, family and authorized services is one I think is bound to come to fruition.

    The future where Corporations put their core infrastructure into the Cloud is not one I ever recall anyone talking about.

  34. quantum entanglement by Anonymous Coward · · Score: 0

    how i hate it. rebuild my local lan vm server and on the other side of the world aws craps out ... so either quantum entanglement or a tried frame from nsa?

  35. Realistically by corran__horn · · Score: 3, Insightful

    Chances are that there are no providers that offer a true 99.999% uptime. If you demand that, you need to be building your code to run in a HA cluster with nationwide dispersion. (For reference, you get 5.25 minutes of downtime across a whole year).

    99.999% uptime is also completely unnecessary, but sounds really good to management until you talk cost.
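
    For anyone who wants to check the arithmetic behind the "nines", here is a minimal sketch. It assumes a 365-day year and a 30-day month, and the function name is just illustrative:

```python
# Downtime allowed per period for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 (non-leap year, an assumption)
MINUTES_PER_MONTH = 30 * 24 * 60    # 43,200 (30-day month, an assumption)


def downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Minutes of permitted downtime in a period at the given availability."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.95, 99.99, 99.999):
        per_year = downtime_minutes(pct, MINUTES_PER_YEAR)
        per_month = downtime_minutes(pct, MINUTES_PER_MONTH)
        print(f"{pct:>7}%  {per_year:10.2f} min/year  {per_month:8.2f} min/month")
```

    Under those assumptions five nines works out to roughly 5.26 minutes a year, and two nines to 432 minutes (7.2 hours) in a 30-day month.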

    --

    If people can connect to one another even the smallest of voices will grow loud.
    --Serial Experiments Lain
    1. Re:Realistically by myowntrueself · · Score: 2

      Chances are that there are no providers that offer a true 99.999% uptime. If you demand that, you need to be building your code to run in a HA cluster with nationwide dispersion. (For reference, you get 5.25 minutes of downtime across a whole year).

      99.999% uptime is also completely unnecessary, but sounds really good to management until you talk cost.

      Just make sure that scheduled downtime isn't included in the 99.999% and schedule downtime on a regular basis!

      --
      In the free world the media isn't government run; the government is media run.
    2. Re:Realistically by klubar · · Score: 1

      How up time is calculated is one of the really weaselly ways that companies set up SLAs. Some companies don't start counting downtime until it's reported, others require a minimum threshold of downtime before it counts, others define available in somewhat meaningless terms (e.g., server up, but network down doesn't count).
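
      A rough sketch of how much the counting rule matters. The outage data, the 15-minute threshold, and all names here are made up for illustration and do not come from any real SLA:

```python
from dataclasses import dataclass


@dataclass
class Outage:
    duration_min: float       # how long the service was actually down
    report_delay_min: float   # minutes that passed before it was reported


def counted_actual(outages):
    """Count every minute of real downtime."""
    return sum(o.duration_min for o in outages)


def counted_from_report(outages):
    """Only count downtime after the outage was reported."""
    return sum(max(0.0, o.duration_min - o.report_delay_min) for o in outages)


def counted_with_threshold(outages, threshold_min):
    """Ignore any outage shorter than the minimum threshold."""
    return sum(o.duration_min for o in outages if o.duration_min >= threshold_min)


def availability_pct(period_min, downtime_min):
    return 100 * (1 - downtime_min / period_min)


if __name__ == "__main__":
    month = 30 * 24 * 60  # 30-day month, an assumption
    # Hypothetical month: three outages of 4, 12 and 45 minutes.
    outages = [Outage(4, 4), Outage(12, 5), Outage(45, 10)]
    rules = {
        "actual": counted_actual(outages),
        "from report": counted_from_report(outages),
        "15 min threshold": counted_with_threshold(outages, 15),
    }
    for name, down in rules.items():
        print(f"{name:>17}: {down:5.1f} min down, {availability_pct(month, down):.4f}% up")
```

      Same three outages, three different downtime totals (61, 42 and 45 minutes), depending purely on which definition the contract uses.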

  36. Re:This is why I laugh at tech pundits who preach. by Anonymous Coward · · Score: 0

    I'll stick to private clouds (VMWare/VCenter) where I have control of both hardware and software and reliable failsafe systems. At least then if I have downtime I also have accountability and predictability.

    And unless you're a very large company, this will be wasted money. And it'll be less reliable than properly-designed applications using Amazon's infrastructure for cheaper.

    Now, if you're a bank, and you're putting your critical customer data up on Amazon, that's probably pretty dumb. But there's a lot of data that's not "critically sensitive" like that.

  37. Pretend this was a US government outage by Required+Snark · · Score: 3, Insightful

    It's a thought experiment: pretend it was the FAA having a big chunk of airspace lose all ability to track aircraft, or NOAA losing data collection so that weather forecasts are disrupted. (This, or something like it, happens from time to time.)

    The right wing talking heads on TV would be squealing like stuck pigs. They would be screaming about "gubment" waste and incompetence, and start floating bills to privatize the FAA (or whomever). You'd get the same response on Slashdot as well.

    Meanwhile in real life AWS, Google, and NASDAQ have all had dramatic failures in recent weeks. Although NASDAQ got a fair amount of coverage, and Google got some mention, AWS has been pretty much below the radar for the mainstream media. No one is making dramatic statements on TV about how Google is run by a bunch of idiots, or NASDAQ, a quasi-governmental entity, should be nationalized, because when it fails the entire economy is at risk. As far as critical comments go, it's the sound of crickets.

    Clearly, there is a double standard. When there are problems with technology in the public sector, it's all hostility and table thumping. Similar failures in the private sector are treated like natural disasters completely beyond human control. According to common rhetoric, the private sector is always better than the public sector. Yet when the private sector fails, no one ever compares it to the well functioning public sector.

    There is clearly a lot of hypocrisy in bashing the government. A lot of political power is at stake, and along with that goes a lot of money. This situation makes some people very happy, because they are getting what they want, both in public policy and private profit.

    --
    Why is Snark Required?
    1. Re:Pretend this was a US government outage by Anonymous Coward · · Score: 0

      ...or NASDAQ, a quasi-governmental entity, should be nationalized, because when it fails the entire economy is at risk.

      Speak for yourself, my NASDAQ related stocks were sold thanks to that nice little tanking they took back in 2000. It's called diversification.

    2. Re:Pretend this was a US government outage by LordLucless · · Score: 1

      pretend it was the FAA having a big chunk of airspace lose all ability to track aircraft, or NOAA losing data collection so that weather forecasts are disrupted...The right wing talking heads on TV would be squealing like stuck pigs. They would be screaming about "gubment" waste and incompetence.

      Because it's their money being wasted.

      Meanwhile in real life AWS, Google, and NASDAQ have all had dramatic failures in recent weeks. Although NASDAQ got a fair amount of coverage, and Google got some mention, AWS has been pretty much below the radar for the mainstream media. No one is making dramatic statements on TV about how Google is run by a bunch of idiots, or NASDAQ, a quasi-governmental entity, should be nationalized, because when it fails the entire economy is at risk.

      The people who care (i.e. people who were hosting at US-East-1) know, and they have the opportunity to withdraw their custom from AWS. They can employ another provider, or bring it in-house and do it themselves.

      Clearly, there is a double standard.

      No, you are just comparing apples and oranges - people don't bitch about private companies because, worst comes to worst, they can take their custom elsewhere. Government needs to be held to a higher standard precisely because that freedom is lacking.

      --
      Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
    3. Re:Pretend this was a US government outage by Anonymous Coward · · Score: 0

      Because it's their money being wasted.

      As opposed to their money being handed out to private contractors to be wasted? That seems to be the driving force behind Republican privatization, get as many Custers Battles involved in the process as possible and when billions just disappear, no problem.

    4. Re:Pretend this was a US government outage by NeutronCowboy · · Score: 1

      Because it's their money being wasted.

      To some extent. But it is frequently pennies of their money that is being wasted, if that much. They make it sound like they're the sole supporters of whatever government agency had an issue.

      No, you are just comparing apples and oranges - people don't bitch about private companies because, worst comes to worst, they can take their custom elsewhere. Government needs to be held to a higher standard precisely because that freedom is lacking.

      In quite a number of instances, you can't - or at least, there is no comparable product to select. Example: Internet access, health insurance, airport, airline, power, etc.

      And government isn't being held to a higher standard, government is assumed to be incompetent by definition. Not worse, not less efficient, but literally incompetent. You're trying to create a narrative that doesn't exist in conservative talk or political circles. The politicians I find doubly ironic, since they are part of government. I wonder how they'd respond to: so are you incompetent too?

      --
      Those who can, do. Those who can't, sue.
    5. Re:Pretend this was a US government outage by seth_hartbecke · · Score: 1

      Because if I choose not to use Google, AWS or even NASDAQ to perform the services offered, the police don't show up at my house and compel it.

      You may say Google and NASDAQ offer services that are difficult to impossible to find elsewhere. Yet there are alternative search engines, and there are ways to trade stock that do not involve NASDAQ. If these companies continue to mess up, their competitors will get more traction. If my government messes up, they still compel me to use their service. The difference while minor at incident #1 can be quite a difference by incident #n.

      --
      END
    6. Re:Pretend this was a US government outage by DNS-and-BIND · · Score: 1

      You know, I never thought I'd see the day when pro-US government posters tell people they're wrong for holding the US government to a higher standard than private industry. But here it is.

      Pro-US government? Snowden! Manning! Wikileaks! Didn't you get the memo?

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
  38. Maybe AWS became self-aware by bigdarryld · · Score: 1

    and the operators realizing this tried to deactivate it?

  39. Re:Has Rackspace had any outages in 10 years or so by sanitycrumbling · · Score: 1

    I've had tons of outages on rackspace cloud. Including systemwide networking outages.

  40. Raise the Price of US-East already by Anonymous Coward · · Score: 0

    AWS US-East is overloaded. It will continue to be overloaded as long as US-East is the cheapest region, because people are idiots. Here's an idea: RAISE THE PRICE OF US-EAST.

    1. Re:Raise the Price of US-East already by Virtucon · · Score: 1

      Well I've wondered about that myself. Yes, US-East is not only less expensive than the other regions for quite a few services, there are also AWS pricing policies that make it more attractive to use US-East for a lot of things. For example, S3 ingress data transmission charges are waived in US-East but not in the other regions, so I have other regions replicating to US-East for disaster purposes just because of that. If US-East goes down it doesn't matter because I have apps leveraging multi-region availability, and when US-East recovered, those services hosted there recovered just fine and replication continued just as if I had built this out in self-provisioned or co-located data centers. Application tools and services for managing this have become better and will keep getting better and better because of the economies of scale.

      Sure, there's a lot of hype around cloud services but there are also a lot of benefits for businesses looking to become more agile in responding to new opportunities or looking at ways to respond to changes in demand. It doesn't relieve you of doing your homework and making sure that the services you choose can support your requirements, including those key non-functional areas of Availability and Reliability.

      --
      Harrison's Postulate - "For every action there is an equal and opposite criticism"
  41. Re:This is why I laugh at tech pundits who preach. by Anonymous Coward · · Score: 0

    Poorly designed apps that depend on instances rather than redundant zones suck regardless of where they are hosted.

    I bitched about this today. There is a right and a wrong way to write a cloud application. Blaming the Cloud for an outage is like blaming the road for a car wreck.

  42. To the trenches! by Anonymous Coward · · Score: 0

    Thank god the cloud-fanboys and cloud-haters have already kicked in to prevent any sane discussion.

  43. In what has to be proof of the existence of hell by gelfling · · Score: 1

    Netflix did NOT go down.

  44. Re:Has Rackspace had any outages in 10 years or so by MillerHighLife21 · · Score: 1

    I was a former Slicehost user in the St. Louis data center and then was moved to Chicago after the Rackspace acquisition. Even so, there's never been so much as a blip from there in the last 5 years. Probably is data center dependent, I just never remember hearing about anything.

    Friend of mine here in town owns a web business using about 9 Rackspace servers to host 700 websites and he said they hadn't had an outage in the last 8 years.

    --
    "Don't teach a man to fish, feed yourself. He's a grown man. Fishing's not that hard." - Ron Swanson
  45. Google Groups on AWS? by Anonymous Coward · · Score: 0

    I've seen some weird error messages mentioning Google Groups today when sending emails in gmail to addresses, afaik not even remotely related to such a group...

  46. Didn't want to share a cell... by Dareth · · Score: 0

    No he went to Russia so he wouldn't have to share a cell with Bradley Manning.

    --

    I only look human.
    My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
    1. Re:Didn't want to share a cell... by Anonymous Coward · · Score: 1

      Chelsea Manning

    2. Re:Didn't want to share a cell... by bbsalem · · Score: 1

      UM, do you mean Brandi Womanthing? :-)

  47. nothing of importance affected by iggymanz · · Score: 1

    entertainment and social media down? who gives a shit? grow up.

  48. Best week ever for sys admins by klubar · · Score: 1

    I have to say, with all of the big names having problems recently, this has been one of the best weeks ever for the lowly corporate sys admin. Now if the company's email, file or web server--or even the coffee machine--goes down, they can point to the big names that also have problems. It's great to be able to say that even at companies like Amazon, Google or Microsoft, with all of their talent, the servers also have problems. It's the greatest excuse ever for tripping over the power cord. And if that doesn't work, you can always blame the NSA for the typo in your email or the late TPS reports.

    Thanks everyone and happy SysAdmin day! (which isn't today, but due to the unexpected outage is running late)

  49. read carefully by Chirs · · Score: 2

    "no need for capital expenditure" and "minimal start-up costs" are not the same as "cheap". All it means is that you don't need to pay up-front.

    It's like renting a car for a day vs buying one. If you only need a car a few times a year, renting is cheap. If you need a car every day for a decade, you should probably buy one.
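
    To put numbers on the rent-vs-buy point, here is a back-of-the-envelope sketch. The prices are invented for illustration and the function name is hypothetical:

```python
def break_even_units(upfront_cost, owned_cost_per_unit, rented_cost_per_unit):
    """Usage level (days, hours, ...) above which buying beats renting."""
    if rented_cost_per_unit <= owned_cost_per_unit:
        # Renting is no more expensive per unit, so buying never pays off.
        return float("inf")
    return upfront_cost / (rented_cost_per_unit - owned_cost_per_unit)


if __name__ == "__main__":
    # Hypothetical car: 20,000 up front plus 10/day to run, vs 50/day to rent.
    days = break_even_units(20_000, 10, 50)
    print(f"Buying pays off after about {days:.0f} days of use")
```

    With those made-up figures the crossover is 500 days of use: a few rentals a year never get there, daily use for a decade blows past it. The same shape of calculation applies to capex hardware vs pay-as-you-go instances.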

  50. I'd like my 911 service to be reliable... by Chirs · · Score: 1

    There are some things where five-nines makes sense.

    Disclaimer...I have worked in the telecom industry in the past.

  51. Self Documenting? by Dareth · · Score: 1

    Is that recursion or just self documenting?

    --

    I only look human.
    My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
  52. Here's the rub... by Anonymous Coward · · Score: 0

    The issue is that when people sing the praises of EC2, they always seem to imply people mostly know better and have moved past the need for reliability at the lower level. However, events like these *repeatedly* show that some of the biggest names flame out multiple times a year. This suggests that while the *theory* may be there, there aren't so many good examples in practice.

    Netflix is the example that frustrates me the most. They brag about how bullet proof their services are because they are so smart, to the point of intentionally killing random instances in production to verify to themselves they are still bullet proof. However, they have significant outages and streams randomly do fail out. The shining example that 99% of people hold up as to why the 'cloud model' of disposable VMs is totally worth it and solid fails far more often than typical schemes. All this while EC2 has been able to take advantage of brand recognition and actually charges *more* for less reliable infrastructure than some other hosting providers.

  53. Wasn't clouds supposed to prevent this? by the_arrow · · Score: 1

    I thought one of the pros of using "the cloud" was that these events would not happen, as you are no longer relying on a single datacenter?

    If AWS is putting several "clouds" in a single datacenter, what's the use of AWS?

    --
    / The Arrow
    "How lovely you are. So lovely in my straightjacket..." - Nny
  54. Re:This is why I laugh at tech pundits who preach. by Reziac · · Score: 1

    "The future where Corporations put their core infrastructure into the Cloud is not one I ever recall anyone talking about."

    Microsoft tried to push that very concept during the Windows 2000 launch tour, back in 1999. At the presentation I went to in Los Angeles, the audience, ~1000 professional IT types, all developed identical angry scowls.

    --
    ~REZ~ #43301. Who'd fake being me anyway?