Amazon Forced To Reboot EC2 To Patch Bug In Xen
Bismillah writes: AWS is currently emailing EC2 customers that it will need to reboot their instances for maintenance over the next few days. The email doesn't explain why the reboots are being done, but it is most likely to patch the embargoed XSA-108 bug in Xen. ZDNet takes this as a spur to remind everyone that the cloud is not magical. Also at The Register.
It's funny for me to read that Amazon is notifying its users of an impending reboot.
I've been suffering with Azure for over a year now, and the only thing that's constant is rebooting....
My personal favorite Azure feature is that SQL Azure randomly drops database connections by design.
Let that sink in for a while. You are actually required to program your application to expect failed database calls.
I've never seen such a horrible platform, or a less reliable database server...
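For anyone wondering what "expect failed database calls" looks like in practice, here is a minimal sketch of a retry wrapper with exponential backoff. It is not tied to SQL Azure or any real driver: `TransientDBError`, `call_with_retry` and `make_flaky_query` are hypothetical names made up for illustration, and the delays are arbitrary.

```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's dropped-connection error."""


def call_with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientDBError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            # Wait 0.1s, 0.2s, 0.4s, ... plus random jitter before retrying.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


def make_flaky_query(failures):
    """Build a fake query that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def query():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientDBError("connection dropped")
        return "row"

    return query
```

Usage would be something like `call_with_retry(make_flaky_query(2))`, which survives two dropped connections and returns on the third try. Only retry operations that are safe to repeat.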
If your design has issues with instances going up & down, you're doing it wrong and shouldn't be using cloud services to begin with.
How much longer would it take to migrate the existing VMs to a patched version? (Even if you only have 10% unutilized resources, it'd only take at most nine swaps.) I agree it's a bad solution to move every machine overnight, but it's better than forcing an outage.
AWS can't live migrate VM's.
I was just wondering why not live migrate the VM to a patched rig and then reboot the unpatched rig. Guess that answers the question.
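The "nine swaps" figure quoted above can be checked with a little arithmetic. This is only a sketch under simplifying assumptions that are mine, not Amazon's: every host carries the same load, the spare hosts are patched first while empty, and each round drains as many busy hosts as there are patched spares. `rolling_migration_rounds` is a hypothetical helper name.

```python
import math


def rolling_migration_rounds(total_hosts, spare_hosts):
    """Rounds needed to drain, patch and refill every busy host.

    Assumes uniform load: each round moves the VMs of `spare_hosts` busy
    hosts onto already patched empty hosts, then the drained hosts are
    patched and become the spare pool for the next round.
    """
    if not 0 < spare_hosts <= total_hosts:
        raise ValueError("need between 1 and total_hosts spare hosts")
    busy_hosts = total_hosts - spare_hosts
    return math.ceil(busy_hosts / spare_hosts)
```

With 100 hosts and 10 of them idle, `rolling_migration_rounds(100, 10)` gives 9, which matches the estimate. It says nothing about how long each round takes.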
That's Xen, and Xen != VMWare, but it works for about 99% of the workloads out there. From what I received, it's a cold restart, not a simple reboot. Periodically they upgrade hardware/software, and once a month we go through a cold restart on all of our AWS instances. It's easy with the right tools.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
I'm not saying migrate to another facility but to another machine. If that's what you also meant, would you be able to provide a source? That seems like a very very big oversight.
But if they are patching xen and xen supports live migration on at least some hosts (At least RHEL can).... Kind of makes you wonder what's the problem.
"we will be re-booting the cloud today,,,in order to protect your 3,2 petabytes of data, you should download it to local storage in case of a fail event. thanks for using cloud storage on computing. have a great day."
never bring a twinkie to a food fight.
That seems like a very very big oversight.
It's the nature of the beast. Live migrations without shared storage are really not commonplace. Amazon does not bother with shared storage and thus cannot live migrate. Even if they did have the ability to live migrate with no shared storage, the time to live migrate such a workload would be impractical.
In short, EC2 strives for cheap and no migration is part of 'cheap'.
XML is like violence. If it doesn't solve the problem, use more.
How much longer would it take to migrate the existing VMs to a patched version? (Even if you only have 10% unutilized resources, it'd only take at most nine swaps.) I agree it's a bad solution to move every machine overnight, but it's better than forcing an outage.
AWS can't live migrate VM's.
Xen can.
Well, actually, for about 100ms, the system isn't technically running, but the point is that you can bounce a VM from one host to another without rebooting it.
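To see where a pause on the order of 100ms can come from, here is a rough model of iterative pre-copy migration: copy all of RAM while the guest keeps running, then keep re-copying whatever got dirtied in the meantime until the remainder is small enough to send during a short stop. This is a toy model, not Xen's actual code. The function name, the constant dirty rate and the example numbers are all assumptions.

```python
def precopy_estimate(mem_bytes, dirty_bytes_per_s, link_bytes_per_s,
                     max_downtime_s=0.1, max_rounds=30):
    """Return (rounds, total_seconds, downtime_seconds) for a toy pre-copy model."""
    if dirty_bytes_per_s >= link_bytes_per_s:
        raise ValueError("guest dirties memory faster than the link can copy it")
    to_send = float(mem_bytes)
    total = 0.0
    rounds = 0
    # Keep copying while the guest runs, until the leftover fits in the pause budget.
    while to_send / link_bytes_per_s > max_downtime_s and rounds < max_rounds:
        round_time = to_send / link_bytes_per_s
        total += round_time
        # Pages dirtied during this round must be re-sent in the next one.
        to_send = dirty_bytes_per_s * round_time
        rounds += 1
    # Final stop-and-copy: the guest is paused while the last pages move.
    downtime = to_send / link_bytes_per_s
    return rounds, total + downtime, downtime
```

For an 8 GiB guest dirtying 125 MB/s over a link that moves 1.25 GB/s, `precopy_estimate(8 * 2**30, 125e6, 1.25e9)` needs two live rounds and then a pause of roughly 70ms. Note that this only covers RAM; local disk is a separate problem.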
Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
Does this mean the open source release of Xen doesn't have the diff applied? Do customers of large corporate clouds now have a security advantage over other users?
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
A lot of people want the convenience of a virtual server, but not the price tag or hassle of several servers and a load balancer. They don't "get" why they would pay for lots of small machines when one big one would do. Once you do convince them to go with several small servers and a load balancer, they don't understand why their FTP changes take a moment to show up online. Then they don't want to invest in someone to set up the system with puppet or ansible or the like... The list goes on, but it usually comes down to people not having the money or desire (usually both) to do things "the cloud way."
Most of these small players would be happier with a single 2-drive RAID-1 server in their closet, except they are too cheap to shell out for a decent machine in the first place as well as business tier internet (they usually don't have the traffic to warrant it, but it is required for ISPs to be OK with it). $5/month for a VPS is much more palatable, even if what they get is a lot less powerful than they could have in their office.
There's no business tier small office internet that's going to give users the same uptime as a cheap VPS somewhere. No business that wants to maintain a 24x7 internet presence should be running their server on a small server in their closet.
Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
Amazon doesn't have the capacity to failover all the vm's to other hardware (maybe some but not all or big ones). Or they don't want to bother and force the work on to their customers.
I think you meant "and charge customers for the much larger infrastructure required". Amazon is cheap, and they are clear that what you're buying from them is just a bunch of machines. If you want reliability, use multiple AZ's and regions. Some of their VM's come with a TB or more of instance storage, that's a lot of data to live-migrate when they want to reboot a physical host machine.
If you want live migration, check out Google Compute Engine, but if availability is important to you, you're better off architecting multiple machine redundancy than relying on a single long-lived machine since there are a lot more things than host maintenance that can trigger a crash and/or reboot of a VM.
Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
I can believe it's "can not", since amazon provides gigabytes (or terabytes) of local instance storage for most of their instance types - that's a lot of data to live migrate. Even if the underlying Xen software technically *can* live migrate VM's, that doesn't mean their infrastructure can support migrating thousands of customer instances.
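To put a rough number on "a lot of data to live migrate", here is a back-of-the-envelope calculation. The link speeds are assumptions picked for illustration, not anything Amazon has published, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    """Seconds to push size_bytes over a link, ignoring everything but bandwidth.

    `efficiency` is the assumed fraction of the link actually usable (0 to 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


ONE_TB = 10**12  # decimal terabyte, in bytes

# 1 TB of instance storage over a dedicated 10 Gbit/s link
ten_gig = transfer_seconds(ONE_TB, 10 * 10**9)
# The same data over 1 Gbit/s
one_gig = transfer_seconds(ONE_TB, 10**9)
```

Under those assumptions, one terabyte takes about 800 seconds at a full 10 Gbit/s and about 8000 seconds (over two hours) at 1 Gbit/s, per instance, before counting any blocks rewritten during the copy.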
VMWare's fault tolerance is decent, but nothing that will recover in milliseconds. Even with vMotion and HA, it will take some time for the machine to reboot.
Of course, there is the FT mode of VMWare... but it has a lot of limitations, such as only allowing 1 vCPU, but it does run two VMs in lockstep so if the heartbeat drops, the downtime is in seconds, not minutes as with a machine restarting.
Seriously, if you ran your own server, you think you would never have to reboot it?
Yes, the cloud will have downtime. Just like we sometimes have blackouts/brownouts from an electricity outage.
BUT, chances are that downtime is LESS than the downtime you'd have running things on your own.
In every company I've worked in, there have been days the internet goes down, some intranet app goes down, exchange goes down... things need to be updated and are down for a few hours.
Netflix still cocks up randomly on a stream and forces retries. I suspect it's not as rosy as they like to say and that the random death of services is more disruptive than they notice or acknowledge.
Meanwhile, even with their 'kill stuff randomly' methodology, the wrong thing still dies every so often and brings the whole thing to a screeching halt.
XML is like violence. If it doesn't solve the problem, use more.
AWS has been around long enough this shouldn't be an issue. If a given architecture cannot survive downtime from a server, or an availability zone, then the risk is no different than if the servers were in a locally-managed datacenter.
In short, if you don't take advantage of what the cloud has to offer in terms of redundancy, then don't expect zero downtime.
If you can migrate/clone between multiple AZ and region some will inevitably do that just to avoid a reboot (Simply human nature, the "just because they can" type, nothing really related to the technical aspects of said individuals). I would like to see how that turns out for those guys.
Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
I can believe it's "can not", since amazon provides gigabytes (or terabytes) of local instance storage for most of their instance types - that's a lot of data to live migrate. Even if the underlying Xen software technically *can* live migrate VM's, that doesn't mean their infrastructure can support migrating thousands of customer instances.
Except that in a cloud, storage is part of the cloud, not part of the server. The only thing that has to physically move is the RAM image of the running VM from one host to another. And it's almost certainly going to be faster to replicate that than to destroy and rebuild it (reboot).
No, Amazon says that instance storage is directly attached to the host machine, so if they live-migrate a VM, they'd have to carry along the instance storage.
http://docs.aws.amazon.com/AWS...
Many Amazon EC2 instance types can access disk storage from disks that are physically attached to the host computer. This disk storage is referred to as instance store.
And there's no evidence that they use any type of shared SAN for instance storage -- instance storage only stays around for as long as the machine is running (or rebooted). If you stop the machine (as opposed to rebooting), or if Amazon has to migrate to a new physical host, you lose the instance store.
"we will be re-booting the cloud today,,,in order to protect your 3,2 petabytes of data, you should download it to local storage in case of a fail event. thanks for using cloud storage on computing. have a great day."
That this inane post is moderated as "3, Insightful" is why I do not visit /. anymore.
-- @rjamestaylor on Ello
I really don't get it, every virtualization technology has the possibility to live migrate the virtual machine to a different physical host, vmware, kvm, openvz, xen, everyone has it, for at least three of them you don't need to have shared storage. Why don't they use it?
-- If you can't convince them, confuse them (Truman)
Not trying to be contentious here, but if you wanted optimal resource usage, you'd be looking more at blade-style compute nodes with no local drives. It defeats the purpose if every compute node has a fixed amount of local disk space attached to it. There's no elasticity. Some compute nodes might max out, some might be using only a fraction of the drive. The whole reason for virtualizing everything was that there were too many machines burning up tons of resources while sitting more or less idle.
IIRC, Amazon's current instance storage model allows for magnetic or SSD storage, but I don't think they allocate in terms of actual physical drives.
Not trying to be contentious here, but if you wanted optimal resource usage, you'd be looking more at blade-style compute nodes with no local drives.
Who would you be contentious with? I'm just telling you what Amazon says in their published docs. If you don't believe what they say, or if you think they could do it better you can bring it up with them, or start your own cloud service that does things "right".
But I can tell you that some use cases are perfect for Amazon's model of providing locally attached instance storage since I/O rates are much better than we can get with EBS volumes.
The days when just anyone could enter the market as an ISP are long since past. The "back bedroom" ISP I started with has been through at least 4 layers of acquisition. I myself stopped providing hosting services before the millennium came. The economies of scale were not available to me and I don't have deep enough pockets - nor rich enough friends - to set up anything even remotely competitive.
So I'll settle for holding Amazon's feet to the fire.
I don't host anymore, but I do work with cloud services internally, so I know that systems such as OpenStack operate in the way I mentioned. And OpenStack is used by (and developed, in part, by) some of Amazon's competition.
When I use my own machines, I can schedule downtime. Or, if it's critical enough, use the techniques I've mentioned to assure continuous uptime. When I outsource to some other hosting service, it's a significant thing if they can reboot me without warning.
Just migrate the instance to a host running the fixed version of Xen, reboot the host with the broken version when it's empty.
Or just share the drives over LAN. vSAN was it called?
I know tobacco is bad for you, so I smoke weed with crack.