Slashdot Mirror


Amazon Forced To Reboot EC2 To Patch Bug In Xen

Bismillah writes: AWS is currently emailing EC2 customers to say that it will need to reboot their instances for maintenance over the next few days. The email doesn't explain why the reboots are being done, but the most likely reason is to patch the embargoed XSA-108 bug in Xen. ZDNet takes this as a spur to remind everyone that the cloud is not magical. Also at The Register.

11 of 94 comments

  1. Compared to Azure by Anonymous Coward · · Score: 3, Informative

    It's funny for me to read that Amazon is notifying its users of an impending reboot.

    I've been suffering with Azure for over a year now, and the only thing that's constant is rebooting....

    My personal favorite Azure feature is that SQL Azure randomly drops database connections by design.

    Let that sink in for a while. You are actually required to program your application to expect failed database calls.

    I've never seen such a horrible platform, or a less reliable database server...

    1. Re:Compared to Azure by CodeReign · · Score: 5, Insightful

      You are actually required to program your application to expect failed database calls.

      Yes, of course you are. Only an idiot would expect 100% of db calls to be successful.

    2. Re:Compared to Azure by bad-badtz-maru · · Score: 2

      You can "handle" a dropped connection, but if you're in a transaction in the middle of updating data, it's probably not going to be transparent to the user.

      -E

    3. Re:Compared to Azure by Aaden42 · · Score: 4, Insightful

      Be sure to thank Microsoft for teaching you the value of robust error checking. Assume any other host you need to talk to was nuked from orbit five seconds ago. Write your code to bounce back from that to the degree possible.

      At the very least, DB *connections* should be assumed to have evaporated since the last time you accessed them. Use some sort of pooling library that can deal with that transparently if you like, or just catch & retry if necessary.

      Seriously though, sounds like the environments you’ve worked in have been simple enough with low enough transaction volume that you got lucky & everything just worked. DB & app server on the same box maybe? Dealing with temporarily unavailable external hosts is just part of writing multi-tier code.
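The "catch & retry" advice above can be sketched roughly as follows. This is a minimal illustration, not any particular driver's API: `TransientDBError`, `call_with_retry`, and `FlakyQuery` are hypothetical names, and the flaky query only simulates a connection that drops twice before working.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's 'connection dropped' error."""


def call_with_retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run operation(); on a transient error, back off and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


class FlakyQuery:
    """Simulated query that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientDBError("connection dropped")
        return "row data"


query = FlakyQuery(failures=2)
result = call_with_retry(query)  # succeeds on the third call
```

A pooling library would normally hide this loop; the sketch only shows what "assume the connection evaporated" means at the call site.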

    4. Re:Compared to Azure by Shados · · Score: 4, Insightful

      If you're in a transaction and it fails, you can just redo it. That's the whole damn point.
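A small sketch of the "just redo it" argument, using Python's standard `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and `SimulatedDrop` are made up for illustration, and the "drop" is simulated by raising an exception on a connection that is in fact still alive.

```python
import sqlite3


class SimulatedDrop(Exception):
    """Hypothetical stand-in for a connection dropping mid-transaction."""


def transfer(conn, amount, fail_midway=False):
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if fail_midway:
            raise SimulatedDrop("lost connection between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,)
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    transfer(conn, 30, fail_midway=True)
except SimulatedDrop:
    after_failure = balances(conn)  # the half-done debit was rolled back
    transfer(conn, 30)              # so redoing the whole transaction is safe

after_retry = balances(conn)
```

The redo is safe here because the failure happens before the commit. If a connection dies while the commit itself is in flight, the client cannot tell whether it went through, so a blind redo needs the operation to be idempotent.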

    5. Re:Compared to Azure by Shados · · Score: 2

      We're not talking about the thing going down here, just a database connection sometimes failing. If you have a 100% failure-proof network and you can replicate it, go tell Google, Amazon, etc. They have a job for you.

    6. Re:Compared to Azure by Just+Some+Guy · · Score: 3, Informative

      When hosting your app in the cloud, regardless of provider, it is considered best practice to design for failure.

      Netflix goes so far as to randomly kill services throughout the day. Their idea is that it's better to find systems that aren't auto-healing correctly by testing recovery during routine operations than to be surprised by it at 3AM. It's successful to the point that you generally don't know that the streaming server you were connected to has been killed and a peer took over for it. That is how you make reliable cloud services.
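A toy model of that idea, assuming nothing about how Netflix actually implements it: `Instance`, `kill_random_instance`, and `serve_with_failover` are hypothetical names. One step kills a random instance; the client step shows the request still being answered by a surviving peer.

```python
import random


class Instance:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def serve(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def kill_random_instance(pool, rng):
    """Chaos step: terminate one randomly chosen live instance."""
    victim = rng.choice([i for i in pool if i.alive])
    victim.alive = False
    return victim


def serve_with_failover(pool, request):
    """Client step: try each instance in turn until one answers."""
    for instance in pool:
        try:
            return instance.serve(request)
        except ConnectionError:
            continue  # this peer is dead, move on to the next
    raise RuntimeError("no healthy instances left")


rng = random.Random(0)  # seeded so the run is repeatable
pool = [Instance(f"stream-{n}") for n in range(3)]
victim = kill_random_instance(pool, rng)
response = serve_with_failover(pool, "GET /video")
```

If `serve_with_failover` ever raised here, that would be the kind of recovery bug the random kills are meant to surface during working hours.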

      --
      Dewey, what part of this looks like authorities should be involved?
    7. Re:Compared to Azure by Just+Some+Guy · · Score: 2

      The architecture of Google is utterly useless for many business cases.

      There are many use cases where it'd be perfectly appropriate.

      it does not and can not provide accurate answers to queries.

      In most cases, businesses don't really care about accurate answers to queries; they want quick, more-or-less correct answers. For example, suppose Amazon has a dashboard that shows their book sales on an hourly basis. Timeliness is more important than exactness here, and answers more precise than the pixel resolution of the graph on the big TV are wasted. A "big data" style query that is 99% correct and runs in 5 seconds is much more valuable here than the exact answer that returns in 2 hours.

      For accounting types of reporting, slow, exact architectures are probably more appropriate. For realtime analytics, a best guess that comes back immediately may be the right thing.
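The exact-versus-approximate trade-off can be illustrated with a simple sampling sketch. The data is randomly generated, and `exact_total` and `sampled_total` are hypothetical helpers; real "big data" engines use more sophisticated approximation techniques than uniform sampling.

```python
import random

rng = random.Random(42)  # seeded so the run is repeatable

# Hypothetical data: one sale amount (in cents) per order.
sales = [rng.randint(500, 5000) for _ in range(200_000)]


def exact_total(values):
    """Scan every row: exact, but cost grows with the full data size."""
    return sum(values)


def sampled_total(values, sample_size, rng):
    """Scan a random sample and scale up: fast, approximately right."""
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size * len(values)


exact = exact_total(sales)
estimate = sampled_total(sales, 10_000, rng)  # reads 5% of the rows
relative_error = abs(estimate - exact) / exact
```

For a dashboard graph, an estimate with a small `relative_error` is indistinguishable from the exact figure; for an accounting report it would not be acceptable.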

      --
      Dewey, what part of this looks like authorities should be involved?
  2. Re:migratable vms? by hawguy · · Score: 2

    How much longer would it take to migrate the existing VMs to a patched version? (Even if you only have 10% unutilized resources, it'd only take at most nine swaps.) I agree it's a bad solution to move every machine overnight, but it's better than forcing an outage.

    AWS can't live-migrate VMs.
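The "at most nine swaps" figure in the quoted question can be checked with a little arithmetic. This only verifies the count; it assumes live migration is available, that the spare hosts are patched first while empty, and that each round moves one spare pool's worth of VMs. `migration_rounds` is a hypothetical helper.

```python
def migration_rounds(total_hosts, spare_hosts):
    """Rounds of 'move VMs onto patched spare hosts' needed to patch every host.

    Assumes the spare hosts are patched first while empty, and that each
    round empties as many unpatched hosts as there are patched spares.
    """
    if not 0 < spare_hosts <= total_hosts:
        raise ValueError("spare_hosts must be between 1 and total_hosts")
    busy_hosts = total_hosts - spare_hosts
    return -(-busy_hosts // spare_hosts)  # integer ceiling division


rounds = migration_rounds(total_hosts=100, spare_hosts=10)  # 90 busy / 10 spare
```

Integer host counts are used instead of fractions such as 0.9 / 0.1, since floating-point rounding could push the ceiling up by one.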

  3. email from ec2... by Connie_Lingus · · Score: 2, Insightful

    "we will be re-booting the cloud today... in order to protect your 3.2 petabytes of data, you should download it to local storage in case of a fail event. thanks for using cloud storage on computing. have a great day."

    --
    never bring a twinkie to a food fight.
  4. I don't get it by amon · · Score: 2

    I really don't get it. Every virtualization technology can live-migrate a virtual machine to a different physical host: VMware, KVM, OpenVZ, Xen, everyone has it, and for at least three of them you don't even need shared storage. Why don't they use it?

    --
    -- If you can't convince them, confuse them (Truman)