Amazon Forced To Reboot EC2 To Patch Bug In Xen

Posted by timothy on Thursday September 25, 2014 @02:52AM from the failure-to-achieve-xen dept.

Bismillah writes AWS is currently emailing EC2 customers that it will need to reboot their instances for maintenance over the next few days. The email doesn't explain why the reboots are being done, but it is most likely to patch for the embargoed XSA-108 bug in Xen. ZDNet takes this as a spur to remind everyone that the cloud is not magical. Also at The Register.

94 comments

Re:Sort of off topic by Anonymous Coward · 2014-09-25 03:11 · Score: 0

More than sort of. -1 for time wasting; even an ignorant 'first post yay' would have seemed more relevant.
Compared to Azure by Anonymous Coward · 2014-09-25 03:11 · Score: 3, Informative

It's funny for me to read that Amazon is notifying its users of an impending reboot.
I've been suffering with Azure for over a year now, and the only thing that's constant is rebooting....
My personal favorite Azure feature, is that SQL Azure randomly drops database connections by design.
Let that sink in for a while. You are actually required to program your application to expect failed database calls.
I've never seen such a horrible platform, or a less reliable database server...
1. Re:Compared to Azure by CodeReign · 2014-09-25 03:20 · Score: 5, Insightful
  
  > You are actually required to program your application to expect failed database calls.
  Yes, of course you are. Only an idiot would expect 100% of db calls to be successful.
2. Re:Compared to Azure by Anonymous Coward · 2014-09-25 03:26 · Score: 1
  
  > You are actually required to program your application to expect failed database calls.
  Shouldn't you be doing that anyway? Handling failed DB calls sounds like a best practice to me.
  I get it being annoying that it drops DB connections but decent code should be able to handle that.
  I do sympathize in a way though, I hate when the language/platform/etc... forces me to do something "their" way. Java says everything must be an object, fuck you Java not everything needs to be/should be an object!
3. Re:Compared to Azure by Anonymous Coward · 2014-09-25 03:32 · Score: 0
  
  >My personal favorite Azure feature, is that SQL Azure randomly drops database connections by design.
  This sounds like someone at Microsoft saw Chaos Monkey and decided to build it into the platform. At any rate, it's not like there is such a thing as a reliable database or network, so your application most certainly should be able to handle not being able to access its database. This sounds like an actually good idea.
4. Re:Compared to Azure by Anonymous Coward · 2014-09-25 03:33 · Score: 0
  
  Well, considering that this is the first time in any application that I've had anything less than a 100% success rate of database calls, perhaps it's a shitty platform...
  I mean how hard is this - connect to database, run query, return results... worked fine for years.
  But now I guess we have to retry every call 5 times cuz our Azure database server just chooses to drop connections.
5. Re:Compared to Azure by bad-badtz-maru · 2014-09-25 03:35 · Score: 2
  
  You can "handle" a dropped connection, but if you're in a transaction in the middle of updating data, it's probably not going to be transparent to the user.
  -E
6. Re:Compared to Azure by Junta · 2014-09-25 03:35 · Score: 1
  
  > My personal favorite Azure feature, is that SQL Azure randomly drops database connections by design.
  I have seen that mentality in a few places beyond azure, I find it moderately annoying. I guess the theory is assuring that *some* failure will happen to you soon even if you don't properly test so you don't go too long without failure and get surprised. However it tends to lead to stacks that occasionally spaz out for a particular user and accepting that as ok because the user can just retry.
  
  > You are actually required to program your application to expect failed database calls.
  On the other hand, you should always design your application to expect failed database calls. There might be some regrettable performance or unavoidable awkwardness in some cases around a failed database call (making it rude to randomly drop needlessly), but such an occurrence is to be expected at least occasionally no matter the stack.
  
  --
  XML is like violence. If it doesn't solve the problem, use more.
7. Re:Compared to Azure by iggymanz · 2014-09-25 03:40 · Score: 1
  
  hahaha
  so you've never worked on serious computer systems? The mainframe and vms clusters I've used had databases working for years (over a decade in one case as new hardware joined sequentially to cluster as old retired).
  failures very occasional, to say the least
  even where I am now the main database is oracle on virtualized linux servers, it's been up for 3+ years
  Not everything is apache server hooking to single mysql instance....
8. Re:Compared to Azure by Anonymous Coward · 2014-09-25 03:41 · Score: 0
  
  > Let that sink in for a while. You are actually required to program your application to expect failed database calls.
  I have no idea why anyone would rely on Azure. Microsoft has shown time and time again that they are not skilled at running large-scale systems. Just look at the No-IP screwup for a recent example.
9. Re:Compared to Azure by Aaden42 · 2014-09-25 03:42 · Score: 4, Insightful
  
  Be sure to thank Microsoft for teaching you the value of robust error checking. Assume any other host you need to talk to was nuked from orbit five seconds ago. Write your code to bounce back from that to the degree possible.
  At the very least, DB *connections* should be assumed to have evaporated since the last time you accessed them. Use some sort of pooling library that can deal with that transparently if you like, or just catch & retry if necessary.
  Seriously though, sounds like the environments you’ve worked in have been simple enough with low enough transaction volume that you got lucky & everything just worked. DB & app server on the same box maybe? Dealing with temporarily unavailable external hosts is just part of writing multi-tier code.
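
The "catch & retry" approach described above might look roughly like the following Python sketch. Everything in it is hypothetical: `TransientDBError`, `with_retries` and the fake flaky query stand in for whatever exception and query function a real driver provides, and the attempt count and delays are arbitrary choices, not recommendations.

```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's 'connection dropped' error."""


def with_retries(operation, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on a transient error, back off and try again.

    The delay doubles after each failure (exponential backoff) and gets
    random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


def make_flaky_query(failures_before_success):
    """Build a fake query that drops the connection a fixed number of times."""
    state = {"calls": 0}

    def run_query():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientDBError("connection dropped")
        return ["row-1", "row-2"]

    return run_query, state


query, state = make_flaky_query(failures_before_success=2)
rows = with_retries(query, sleep=lambda _: None)  # skip real sleeping in the demo
print(rows, "after", state["calls"], "calls")
```

Only errors known to be transient are retried here; anything else propagates immediately, and the last transient error is re-raised once the attempts run out.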
10. Re:Compared to Azure by Shados · 2014-09-25 03:47 · Score: 4, Insightful
  
  If you're in a transaction and it fails, you can just redo it. That's the whole damn point.
11. Re:Compared to Azure by Shados · 2014-09-25 03:49 · Score: 2
  
  We're not talking about the thing going down here, just database connection sometimes failing. If you have a 100% failure proof network and you can replicate it, go tell Google, Amazon, etc. They have a job for you.
12. Re:Compared to Azure by Yebyen · 2014-09-25 03:49 · Score: 1
  
  So what you're saying is, there were occasional failures.
  
  --
  Restating the obvious since nineteen aught five.
13. Re:Compared to Azure by Anonymous Coward · 2014-09-25 03:52 · Score: 0
  
  In serious systems serious people make realistic assumptions like knowing that sometimes things fail and need to be retried. That's actually why your database has been up for 3 years.
14. Re:Compared to Azure by SuiteSisterMary · 2014-09-25 03:56 · Score: 1
  
  > You are actually required to program your application to expect failed database calls.
  The problem with programming is that most programmers consider basic error handling and sanity checking to be optional.
  
  --
  Vintage computer games and RPG books available. Email me if you're interested.
15. Re:Compared to Azure by RabidReindeer · 2014-09-25 03:58 · Score: 1
  
  No, some things are load-balanced banks of apache servers hooked to Galera MySQL clusters.
  Really, though. Unless Oracle has been spending a LOT more time on version compatibility than IBM or PostgreSQL, I have to wonder if those 3+ years don't mean that the database is something like 9i still running. And Oracle DEFINITELY knows how to break things in their Financials product from major release to major release.
16. Re:Compared to Azure by bmimatt · 2014-09-25 04:02 · Score: 1
  
  When hosting your app in the cloud, regardless of provider, it is considered best practice to design for failure. That means your code should anticipate any/all stack layers becoming unavailable. If you're doing it right, a service failure should be detected and automatic failover executed. Alternatively, a new instance should be provisioned, bootstrapped and thrown into production. Think: infrastructure as code. Welcome to the 21st century.
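
One way to read "a service failure should be detected and automatic failover executed" is a client that walks an ordered list of endpoints. The Python sketch below is only an illustration: the endpoint names, the `ServiceUnavailable` exception and `call_with_failover` are invented for the example, and a real setup would more likely do this in a load balancer or service-discovery layer.

```python
class ServiceUnavailable(Exception):
    """Hypothetical error raised when an endpoint cannot be reached."""


def call_with_failover(endpoints, request):
    """Try each endpoint in order and return the first successful response."""
    errors = []
    for name, handler in endpoints:
        try:
            return name, handler(request)
        except ServiceUnavailable as exc:
            errors.append((name, str(exc)))  # remember why this one failed
    raise ServiceUnavailable(f"all endpoints failed: {errors}")


def dead_primary(request):
    """Simulated endpoint that is always down."""
    raise ServiceUnavailable("primary is down")


def healthy_replica(request):
    """Simulated endpoint that always answers."""
    return f"handled {request!r}"


endpoints = [("primary", dead_primary), ("replica", healthy_replica)]
served_by, response = call_with_failover(endpoints, "GET /cart")
print(served_by, response)
```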
17. Re:Compared to Azure by Anonymous Coward · 2014-09-25 04:17 · Score: 1
  
  It might be a shitty platform. but you are a shitty programmer. And it does not matter if it drops every time or never - your code deals with the problem every time it arises. are you retarded or in HR?
18. Re:Compared to Azure by tlhIngan · 2014-09-25 04:24 · Score: 1
  
  And unless you run a small website, that can happen way too easily.
  Every e-commerce site has database failures, usually around peak shopping periods. The database is usually the weakest point because, no matter how many instances you run, it's the bottleneck: the database's view of the data has to be consistent across all database servers.
  And sometimes, well, the sheer crunch of users buying stuff topples that.
  Even a good /.'ing in the past would return errors of the form "Could not connect to database".
  Anyhow, I thought one point of the cloud was it was separated from hardware - if you need to reboot the host machine the servers were transparently moved to another machine while the host resets. The actual details or even which machine the cloud instance runs on is a detail that's not required in order to use it. As long as the guest OS is OK, it doesn't matter what piece of actual hardware it's running on.
19. Re:Compared to Azure by Just+Some+Guy · 2014-09-25 04:26 · Score: 1
  
  > so you've never worked on vertical computer systems?
  Fixed that for you. You're conflating vertically scaled monoliths with "serious systems". That's quaint. While there are certainly still use cases for that kind of bulletproof all-your-eggs-in-one-basket architecture, that's a niche compared to the number of applications where horizontally scaled eventually consistent architecture is more appropriate.
  
  > The mainframe and vms clusters I've used had databases working for years (over a decade in one case as new hardware joined sequentially to cluster as old retired).
  Undoubtedly, and the distributed clusters I've used where you can make progress as long as at least some reasonable subset of nodes are still alive have similar uptimes. When was the last time you heard about Google being completely dead in the water? Their software was written with the expectation that failures happen (and a lot at their scale) so that clients need to intelligently reconnect to unresponsive servers, etc. That design seems to be working out pretty well for them.
  
  --
  Dewey, what part of this looks like authorities should be involved?
20. Re:Compared to Azure by Just+Some+Guy · 2014-09-25 04:30 · Score: 3, Informative
  
  > When hosting your app in the cloud, regardless of provider, it is considered best practice to design for failure.
  Netflix goes so far as to randomly kill services throughout the day. Their idea is that it's better to find systems that aren't auto-healing correctly by testing recovery during routine operations than to be surprised by it at 3AM. It's successful to the point that you generally don't know that the streaming server you were connected to has been killed and a peer took over for it. That is how you make reliable cloud services.
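
The idea of killing services during routine operation can be illustrated with a toy simulation. This Python sketch is not Netflix's tooling; the `Cluster` class, node names and the instant "heal" step are all made up to show the shape of the test: kill something at random, then check that requests are still served.

```python
import random


class Cluster:
    """Toy cluster: a request succeeds if any live node can serve it."""

    def __init__(self, node_names):
        self.alive = {name: True for name in node_names}

    def kill_random_node(self, rng):
        """Chaos step: stop one randomly chosen live node, if any remain."""
        live = [name for name, up in self.alive.items() if up]
        if not live:
            return None
        victim = rng.choice(live)
        self.alive[victim] = False
        return victim

    def heal(self, name):
        """Auto-healing step: bring a replacement up for the dead node."""
        self.alive[name] = True

    def serve(self, request):
        """Route the request to the first node that is still up."""
        for name, up in self.alive.items():
            if up:
                return f"{name} served {request}"
        raise RuntimeError("no live nodes")


rng = random.Random(42)  # fixed seed so the demo is repeatable
cluster = Cluster(["stream-a", "stream-b", "stream-c"])
served = 0
for i in range(100):
    victim = cluster.kill_random_node(rng)
    cluster.serve(f"request-{i}")  # must still succeed with one node down
    served += 1
    cluster.heal(victim)
print("served", served, "requests while killing one node per round")
```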
  
  --
  Dewey, what part of this looks like authorities should be involved?
21. Re:Compared to Azure by Anonymous Coward · 2014-09-25 04:38 · Score: 0
  
  Copy/pasting from stackoverflow? Is "5 times" a magic number you came up with?
  Fucking faggoty ass moron.
22. Re:Compared to Azure by sabri · 2014-09-25 04:43 · Score: 1
  
  > It might be a shitty platform. but you are a shitty programmer.
  Apply water to burned area.
  
  --
  I'm not a complete idiot... Some parts are missing.
23. Re:Compared to Azure by Anonymous Coward · 2014-09-25 04:52 · Score: 0
  
  Microsoft didn't add retry logic to their DB entity framework until well after SQL Azure started intentionally dropping tons of database connections.
24. Re:Compared to Azure by Anonymous Coward · 2014-09-25 05:21 · Score: 0
  
  But you see, Azure is magical.
25. Re:Compared to Azure by CodeReign · 2014-09-25 05:27 · Score: 1
  
  I mean I'm an Oracle FMW developer working with several Oracle servers having serious uptime and SLAs but even then, hiccups happen. A good developer programs with the expectation that not everything will work smoothly and so long as not everything breaks at once I could have a DB fall off the face of the earth or a server get shot and we'd still chug along with minimal perceived downtime.
26. Re:Compared to Azure by Anonymous Coward · 2014-09-25 06:23 · Score: 0
  
  > SQL Azure randomly drops database connections by design.
  [citation needed]
27. Re:Compared to Azure by Anonymous Coward · 2014-09-25 06:29 · Score: 0
  
  And then it fails again because there's something wrong with the in-data/procedures. Now what?
28. Re:Compared to Azure by Anonymous Coward · 2014-09-25 06:31 · Score: 1
  
  > I do sympathize in a way though, I hate when the language/platform/etc... forces me to do something "their" way. Java says everything must be an object, fuck you Java not everything needs to be/should be an object!
  What does Java have to do with this? My guess is the OP is using Python or some other script-kiddie language. You don't want to use objects in Java? Here ya go:
  byte[] myManagedMemory = new byte[Integer.MAX_VALUE - 8]; // just under 2 GiB; 1024 * 1024 * 1024 * 2 would overflow int
  I won't hold my breath to see how your memory management is going to be better than what the JVM already provides.
  BTW: primitives are not Objects in Java.
29. Re:Compared to Azure by Bert64 · 2014-09-25 07:23 · Score: 1
  
  Which makes it an excellent dev environment, but terrible for production use...
  While you want your code to be able to cope with database instability, when that code goes into production you also want to minimise the chances that it will ever have to.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
30. Re:Compared to Azure by Anonymous Coward · 2014-09-25 10:36 · Score: 0
  
  methinks you don't understand what database transactions are. Either all the database commits go in, or none (atomic). If the connection drops, you reconnect and retry the SQL update(s). If the SQL fails because of some other issue (like a unique constraint violation, etc), well you wouldn't "retry" that, but let the error handler that deals with that case handle it.
  However, the GP is mistaken that transactions solve all the problems: there is always that tiny window where the database updates are committed by the backend DB engine, and THEN the connection drops. Your code thinks that the transaction wasn't committed (because you get a connection-dropped error), but the engine did commit the updates already. I had this exact problem happen in the product I worked on; it is very hard to protect all SQL updates against that timing issue.
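
One common way to close that "committed, then the connection dropped" window is to make the update idempotent, so that a blind retry cannot apply it twice. The Python sketch below uses the standard `sqlite3` module purely for illustration; the table names, the `request_id` scheme, `apply_payment` and the simulated `ConnectionDropped` error are all assumptions, not anything from the product mentioned above.

```python
import sqlite3


class ConnectionDropped(Exception):
    """Hypothetical transient error: the link died, outcome unknown."""


def apply_payment(conn, request_id, account, amount, drop_after_commit=False):
    """Debit an account at most once per request_id.

    The request_id row and the balance update commit in one transaction,
    so a retry can tell whether an earlier attempt already went through.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            conn.execute("INSERT INTO applied (request_id) VALUES (?)", (request_id,))
        except sqlite3.IntegrityError:
            return "already applied"  # duplicate key: an earlier try committed
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, account),
        )
    if drop_after_commit:
        # Simulate the timing issue: the commit happened, but the client
        # never hears about it.
        raise ConnectionDropped("lost connection after commit")
    return "applied"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE applied (request_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

try:
    apply_payment(conn, "req-1", "alice", 30, drop_after_commit=True)
except ConnectionDropped:
    # The client cannot know whether the debit landed, so it retries
    # with the same request_id.
    outcome = apply_payment(conn, "req-1", "alice", 30)

balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(outcome, balance)
```

The retry is recognized as a duplicate and the balance is debited only once. The cost is an extra table and a client-generated identifier on every write, which is part of why this is tedious to retrofit onto every SQL update.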
31. Re:Compared to Azure by cheater512 · 2014-09-25 11:32 · Score: 1
  
  My DB servers have a 0.07% failure rate. I imagine the parent is seeing a far higher percentage than that.
32. Re: Compared to Azure by Anonymous Coward · 2014-09-25 16:50 · Score: 0
  
  Back to VB 6 for you... You better hope your brother's Uncle-in-law is still hiring...
33. Re:Compared to Azure by bad-badtz-maru · 2014-09-26 01:41 · Score: 1
  
  That isn't the point of a transaction _at all_. The point is to ensure that all of the operations contained in the transaction block are atomic - they either all happen or none of them happen.
34. Re:Compared to Azure by Shados · 2014-09-26 02:01 · Score: 1
  
  Exactly. Either all happen OR NONE HAPPEN.
  OR NONE HAPPEN.
  OR NONE HAPPEN.
  OR NONE HAPPEN.
  Which means you can simply rerun any failed transaction safely.
  Thanks for repeating what I said in different words.
35. Re:Compared to Azure by bad-badtz-maru · 2014-09-26 05:16 · Score: 1
  
  You said that the point of a transaction is to enable the ability to retry it. I said that was not the point. I don't see how we are saying the same thing.
36. Re:Compared to Azure by Anonymous Coward · 2014-09-26 19:22 · Score: 0
  
  It seems to me that you are intentionally (or unintentionally) misunderstanding your parent poster just because you like to argue a minor semantic detail.
  Let's say that a nice consequence of a transaction being rolled back in face of a temporary error is that you are able to then retry it if you want.
  Better?
  Now start working on your people and communication skills. You need it. Really.
37. Re:Compared to Azure by iggymanz · 2014-09-28 04:25 · Score: 1
  
  You are confused, the architecture of google is utterly useless for most business cases, it does not and can not provide accurate answers to queries.
38. Re:Compared to Azure by Just+Some+Guy · 2014-09-28 05:12 · Score: 2
  
  The architecture of Google is utterly useless for many business cases. There are many use cases where it'd be perfectly appropriate.
  
  > it does not and can not provide accurate answers to queries.
  In most cases, businesses don't really care about accurate answers to queries; they want quick, more-or-less correct answers. For example, suppose Amazon has a dashboard that shows their book sales on an hourly basis. Timeliness is more important than exactness here, and answers more precise than the pixel resolution of the graph on the big TV are wasted. A "big data" style query that is 99% correct and runs in 5 seconds is much more valuable here than the exact answer that returns in 2 hours.
  For accounting types of reporting, slow, exact architectures are probably more appropriate. For realtime analytics, a best guess that comes back immediately may be the right thing.
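
The trade-off between an exact scan and a quick estimate can be sketched with plain random sampling. This Python example is only a toy: the order values are made up, the 1% sample size is arbitrary, and real analytics systems use more elaborate techniques than this.

```python
import random


def exact_total(sales):
    """Scan every row: always right, but cost grows with the data."""
    return sum(sales)


def estimated_total(sales, sample_size, rng):
    """Scale up the mean of a random sample to estimate the full total."""
    sample = rng.sample(sales, sample_size)
    return sum(sample) / sample_size * len(sales)


rng = random.Random(2014)  # fixed seed so the demo is repeatable
sales = [rng.randint(1, 100) for _ in range(1_000_000)]  # made-up order values

exact = exact_total(sales)
estimate = estimated_total(sales, sample_size=10_000, rng=rng)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact} estimate={estimate:.0f} error={relative_error:.2%}")
```

The estimate reads 1% of the rows and should land close to the exact total, which is plenty for a dashboard graph but not for the accounting ledger.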
  
  --
  Dewey, what part of this looks like authorities should be involved?
39. Re:Compared to Azure by bad-badtz-maru · 2014-09-29 07:38 · Score: 1
  
  You are right, I am failing to adequately communicate what I am saying. That a transaction can be retried is a byproduct of the atomicity requirement that a transaction fills. Retrying a transaction, because there is an ongoing problem with the database system dropping connections, is a sloppy hack.
migratable vms? by Anonymous Coward · 2014-09-25 03:13 · Score: 0

How much longer would it take to migrate the existing vms to patched version. (even if you only have 10% unutilized resources it'd only take at most nine swaps) I agree it's a bad solution to move every machine over night but it's better than forcing an outage.
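
The "at most nine swaps" figure can be checked with a little arithmetic. The Python sketch below assumes a simplified model that is not stated in the post: every host holds the same load, the spare hosts are patched first, and each round moves VMs off as many unpatched hosts as there are empty patched ones. `rolling_patch_rounds` is a made-up helper name.

```python
def rolling_patch_rounds(total_hosts, spare_hosts):
    """Count rounds needed to patch every in-use host using spare capacity.

    Each round evacuates as many in-use hosts as there are empty patched
    hosts; the hosts just emptied are then patched and become the next
    round's spares.
    """
    if spare_hosts <= 0:
        raise ValueError("no spare capacity: nothing can be evacuated")
    in_use = total_hosts - spare_hosts
    return -(-in_use // spare_hosts)  # integer ceiling division


print(rolling_patch_rounds(total_hosts=100, spare_hosts=10))
print(rolling_patch_rounds(total_hosts=100, spare_hosts=25))
```

With 100 hosts and 10 spare, the model gives nine rounds, matching the estimate above; with 25 spare it drops to three.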
1. Re:migratable vms? by hawguy · 2014-09-25 03:16 · Score: 2
  
  > How much longer would it take to migrate the existing vms to patched version. (even if you only have 10% unutilized resources it'd only take at most nine swaps) I agree it's a bad solution to move every machine over night but it's better than forcing an outage.
  AWS can't live migrate VM's.
2. Re:migratable vms? by thieh · 2014-09-25 03:19 · Score: 1
  
  I was just wondering why not live migrate the VM to a patched rig and then reboot the unpatched rig. Guess that answers the question.
3. Re:migratable vms? by Virtucon · 2014-09-25 03:21 · Score: 1
  
  that's Xen and Xen != VMware but it works for about 99% of the workloads out there. From what I received it's a cold restart, not a simple reboot. Periodically they upgrade hardware/software and once a month we go through a cold restart on all of our AWS instances. It's easy with the right tools.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
4. Re:migratable vms? by CodeReign · 2014-09-25 03:21 · Score: 1
  
  I'm not saying migrate to another facility but to another machine. If that's what you also meant would you be able to provide a source? That seems like a very very big oversight.
5. Re:migratable vms? by Anonymous Coward · 2014-09-25 03:24 · Score: 0
  
  No, which is surprising in some ways. In other ways it shouldn't matter (you're supposed to build your Cloud infrastructure to be n+m redundant, where m is > 0). If you've done it right, you'll never notice as individual instances come and go.
  
  On the flipside a lot of people are Doing It Wrong and will be upset by a reboot, which is why OpenStack is working to add things like Live Migration.
6. Re:migratable vms? by thieh · 2014-09-25 03:29 · Score: 1
  
  But if they are patching xen and xen supports live migration on at least some hosts (At least RHEL can).... Kind of makes you wonder what's the problem.
7. Re:migratable vms? by Anonymous Coward · 2014-09-25 03:42 · Score: 0
  
  Amazon doesn't have the capacity to failover all the vm's to other hardware (maybe some but not all or big ones). Or they don't want to bother and force the work on to their customers.
8. Re:migratable vms? by Junta · 2014-09-25 03:47 · Score: 1
  
  > That seems like a very very big oversight.
  It's nature of the beast. Live migrations without shared storage are really not commonplace. Amazon does not bother with shared storage and thus cannot live migrate. Even if they did have the ability to live migrate with no shared storage, the time to live migrate such a workload would be impractical.
  In short, EC2 strives for cheap and no migration is part of 'cheap'.
  
  --
  XML is like violence. If it doesn't solve the problem, use more.
9. Re:migratable vms? by RabidReindeer · 2014-09-25 03:54 · Score: 1
  
  >> How much longer would it take to migrate the existing vms to patched version. (even if you only have 10% unutilized resources it'd only take at most nine swaps) I agree it's a bad solution to move every machine over night but it's better than forcing an outage.
  > AWS can't live migrate VM's.
  Xen can.
  Well, actually, for about 100ms, the system isn't technically running, but the point is that you can bounce a VM from one host to another without rebooting it.
10. Re:migratable vms? by Anonymous Coward · 2014-09-25 04:00 · Score: 0
  
  A lot of people want the convenience of a virtual server, but not the price tag or hassle of several servers and a load balancer. They don't "get" why they would pay for lots of small machines when one big one would do. Once you do convince them to go with several small servers and a load balancer, they don't understand why their FTP changes take a moment to show up online. Then they don't want to invest in someone to setup the system with puppet or ansible or the like... The list goes on, but it usually comes down to people not having the money or desire (usually both) to do things "the cloud way."
  Most of these small players would be happier with a single 2-drive RAID-1 server in their closet, except they are too cheap to shell out for a decent machine in the first place as well as business tier internet (they usually don't have the traffic to warrant it, but it is required for ISPs to be OK with it). $5/month for a VPS is much more palatable, even if what they get is a lot less powerful than they could have in their office.
11. Re:migratable vms? by hawguy · 2014-09-25 04:16 · Score: 1
  
  >>> How much longer would it take to migrate the existing vms to patched version. (even if you only have 10% unutilized resources it'd only take at most nine swaps) I agree it's a bad solution to move every machine over night but it's better than forcing an outage.
  >> AWS can't live migrate VM's.
  > Xen can.
  > Well, actually, for about 100ms, the system isn't technically running, but the point is that you can bounce a VM from one host to another without rebooting it.
  Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
  They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
12. Re:migratable vms? by hawguy · 2014-09-25 04:21 · Score: 1
  
  > A lot of people want the convenience of a virtual server, but not the price tag or hassle of several servers and a load balancer. They don't "get" why they would pay for lots of small machines when one big one would do. Once you do convince them to go with several small servers and a load balancer, they don't understand why their FTP changes take a moment to show up online. Then they don't want to invest in someone to setup the system with puppet or ansible or the like... The list goes on, but it usually comes down to people not having the money or desire (usually both) to do things "the cloud way."
  > Most of these small players would be happier with a single 2-drive RAID-1 server in their closet, except they are too cheap to shell out for a decent machine in the first place as well as business tier internet (they usually don't have the traffic to warrant it, but it is required for ISPs to be OK with it). $5/month for a VPS is much more palatable, even if what they get is a lot less powerful than they could have in their office.
  There's no business tier small office internet that's going to give users the same uptime as a cheap VPS somewhere. No business that wants to maintain a 24x7 internet presence should be running their server on a small server in their closet.
13. Re:migratable vms? by RabidReindeer · 2014-09-25 04:27 · Score: 1
  
  > Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
  > They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
  Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
14. Re:migratable vms? by hawguy · 2014-09-25 04:28 · Score: 1
  
  > Amazon doesn't have the capacity to failover all the vm's to other hardware (maybe some but not all or big ones). Or they don't want to bother and force the work on to their customers.
  I think you meant "and charge customers for the much larger infrastructure required". Amazon is cheap, and they are clear that what you're buying from them is just a bunch of machines. If you want reliability, use multiple AZ's and regions. Some of their VM's come with a TB or more of instance storage, that's a lot of data to live-migrate when they want to reboot a physical host machine.
  If you want live migration, check out Google Compute Engine, but if availability is important to you, you're better off architecting multiple machine redundancy than relying on a single long-lived machine since there are a lot more things than host maintenance that can trigger a crash and/or reboot of a VM.
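
To put "a TB or more of instance storage" in perspective, here is a back-of-the-envelope calculation in Python. The link speeds are assumptions picked for illustration, not figures from Amazon, and the result is a best case that ignores protocol overhead and any competing traffic.

```python
def transfer_seconds(data_bytes, link_bits_per_second):
    """Best-case time to copy data over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_second


TB = 10 ** 12    # decimal terabyte, in bytes
GBIT = 10 ** 9   # one gigabit per second, in bits per second

one_tb_at_10g = transfer_seconds(1 * TB, 10 * GBIT)
one_tb_at_1g = transfer_seconds(1 * TB, 1 * GBIT)
print(f"1 TB over 10 Gbit/s: {one_tb_at_10g / 60:.1f} minutes")
print(f"1 TB over 1 Gbit/s: {one_tb_at_1g / 3600:.1f} hours")
```

Under those assumed speeds, a single 1 TB instance store takes about 13 minutes to copy at 10 Gbit/s and a bit over two hours at 1 Gbit/s, before multiplying by the number of instances on each host.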
15. Re:migratable vms? by hawguy · 2014-09-25 04:32 · Score: 1
  
  >> Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
  >> They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
  > Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
  I can believe it's "can not", since amazon provides gigabytes (or terabytes) of local instance storage for most of their instance types - that's a lot of data to live migrate. Even if the underlying Xen software technically *can* live migrate VM's, that doesn't mean their infrastructure can support migrating thousands of customer instances.
16. Re:migratable vms? by thieh · 2014-09-25 06:17 · Score: 1
  
  If you can migrate/clone between multiple AZ and region some will inevitably do that just to avoid a reboot (Simply human nature, the "just because they can" type, nothing really related to the technical aspects of said individuals). I would like to see how that turns out for those guys.
17. Re:migratable vms? by RabidReindeer · 2014-09-25 06:34 · Score: 1
  
  >>> Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
  >>> They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
  >> Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
  > I can believe it's "can not", since amazon provides gigabytes (or terabytes) of local instance storage for most of their instance types - that's a lot of data to live migrate. Even if the underlying Xen software technically *can* live migrate VM's, that doesn't mean their infrastructure can support migrating thousands of customer instances.
  Except that in a cloud, storage is part of the cloud, not part of the server. The only thing that has to physically move is the RAM image of the running VM from one host to another. And it's almost certainly going to be faster to replicate that than to destroy and rebuild it (reboot).
18. Re:migratable vms? by hawguy · 2014-09-25 06:48 · Score: 1
  
  >>>> Xen is software, not AWS, AWS is an entire infrastructure, and they can not (or will not) live migrate customer VM's.
  >>>> They are very clear in their documentation that customers should be able to tolerate VM restarts and to use multiple AZ's and regions to help mitigate downtime. I have several hundred instances scheduled for reboot, but they are doing one AZ at a time.
  >>> Since Xen is rumored to be the VM host for AWS (or at least large parts of it), I'd have to think it's "will not".
  >> I can believe it's "can not", since amazon provides gigabytes (or terabytes) of local instance storage for most of their instance types - that's a lot of data to live migrate. Even if the underlying Xen software technically *can* live migrate VM's, that doesn't mean their infrastructure can support migrating thousands of customer instances.
  > Except that in a cloud, storage is part of the cloud, not part of the server. The only thing that has to physically move is the RAM image of the running VM from one host to another. And it's almost certainly going to be faster to replicate that than to destroy and rebuild it (reboot).
  No, Amazon says that instance storage is directly attached to the host machine, so if they live-migrate a VM, they'd have to carry along the instance storage.
  
  http://docs.aws.amazon.com/AWS...
  > Many Amazon EC2 instance types can access disk storage from disks that are physically attached to the host computer. This disk storage is referred to as instance store.
  And there's no evidence that they use any type of shared SAN for instance storage -- instance storage only stays around for as long as the machine is running (or rebooted). If you stop the machine (as opposed to rebooting), or if Amazon has to migrate to a new physical host, you lose the instance store.
19. Re:migratable vms? by RabidReindeer · 2014-09-25 08:57 · Score: 1
  
  Not trying to be contentious here, but if you wanted optimal resource usage, you'd be looking more at blade-style compute nodes with no local drives. It defeats the purpose if every compute node has a fixed amount of local disk space attached to it. There's no elasticity. Some compute nodes might max out, some might be using only a fraction of the drive. The whole reason for virtualizing everything was that there were too many machines burning up tons of resources while sitting more or less idle.
  IIRC, Amazon's current instance storage model allows for magnetic or SSD storage, but I don't think they allocate in terms of actual physical drives.
20. Re:migratable vms? by hawguy · 2014-09-25 09:21 · Score: 1
  
  Not trying to be contentious here, but if you wanted optimal resource usage, you'd be looking more at blade-style compute nodes with no local drives.
  Who would you be contentious with? I'm just telling you what Amazon says in their published docs. If you don't believe what they say, or if you think they could do it better you can bring it up with them, or start your own cloud service that does things "right".
  But I can tell you that some use cases are perfect for Amazon's model of providing locally attached instance storage since I/O rates are much better than we can get with EBS volumes.
21. Re:migratable vms? by RabidReindeer · 2014-09-25 23:31 · Score: 1
  
  Not trying to be contentious here, but if you wanted optimal resource usage, you'd be looking more at blade-style compute nodes with no local drives.
  Who would you be contentious with? I'm just telling you what Amazon says in their published docs. If you don't believe what they say, or if you think they could do it better you can bring it up with them, or start your own cloud service that does things "right".
  But I can tell you that some use cases are perfect for Amazon's model of providing locally attached instance storage since I/O rates are much better than we can get with EBS volumes.
  The days when just anyone could enter the market as an ISP are long since past. The "back bedroom" ISP I started with has been through at least 4 layers of acquisition. I myself stopped providing hosting services before the millennium came. The economies of scale were not available to me and I don't have deep enough pockets - nor rich enough friends - to set up anything even remotely competitive.
  So I'll settle for holding Amazon's feet to the fire.
  I don't host anymore, but I do work with cloud services internally, so I know that systems such as OpenStack operate in the way I mentioned. And OpenStack is used by (and developed in part by) some of Amazon's competition.
  When I use my own machines, I can schedule downtime. Or, if it's critical enough, use the techniques I've mentioned to assure continuous uptime. When I outsource to some other hosting service, it's a significant thing if they can reboot me without warning.
22. Re:migratable vms? by badkarmadayaccount · 2014-09-28 10:35 · Score: 1
  
  Or just share the drives over LAN. vSAN was it called?
  
  --
  I know tobacco is bad for you, so I smoke weed with crack.
Reboot by Anonymous Coward · 2014-09-25 03:13 · Score: 1

If your design has issues with instances going up & down, you're doing it wrong and shouldn't be using cloud services to begin with.
Re:Sort of off topic by hawguy · 2014-09-25 03:15 · Score: 0, Offtopic

I saw in the Mpls Star Tribune the other day that Amazon are going to start charging (MN residents) sales tax as from 1st October.
I don't know if this will apply to digital content as well but if it does then I will have to cut back on buying books, magazines, and music from them as well.
The only stuff we will be able to buy is clothes...
If Amazon is collecting sales tax, it means that you were supposed to have already been paying the sales tax, and you're practicing tax evasion if you haven't been paying sales or use tax on your purchases.
http://www.revenue.state.mn.us...
Buy Cheap Die Cheap by Anonymous Coward · 2014-09-25 03:26 · Score: 0

Cardboard coffins for all!
email from ec2... by Connie_Lingus · 2014-09-25 03:38 · Score: 2, Insightful

"we will be re-booting the cloud today,,,in order to protect your 3,2 petabytes of data, you should download it to local storage in case of a fail event. thanks for using cloud storage on computing. have a great day."

--
never bring a twinkie to a food fight.
1. Re:email from ec2... by Anonymous Coward · 2014-09-25 04:24 · Score: 0
  
  "Please note that $106,000 will be added to your next bill for the egress bandwidth."
2. Re:email from ec2... by MMC+Monster · 2014-09-25 08:31 · Score: 1
  
  "Please note that $106,000 will be added to your next bill for the egress bandwidth."
  Interesting thought: Cloud providers should offer a low-cost option for getting your petabyte of data out of their system.
  Perhaps mailing you a drive (or series of drives) with the data on it? If they allow you to run ZFS or another filesystem with snapshot capability, create a snapshot and request that it be mailed to you. Maybe they'll even list the available drives that can handle the data and you pick which one(s) you want.
  
  --
  Help! I'm a slashdot refugee.
3. Re:email from ec2... by Anonymous Coward · 2014-09-25 09:59 · Score: 0
  
  You're probably looking for AWS Import/Export. Or if not that, then AWS Storage Gateway.
4. Re:email from ec2... by Anonymous Coward · 2014-09-25 13:58 · Score: 0
  
  I think you missed the part where he said petabytes instead of terabytes. 1 PB = 1000 TB. 3.2 PB = 3200 TB, which is 534 6 TB hard drives. At $290 each, that's $154,860 plus shipping.
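
  For anyone checking the math, here is a quick sketch of the drive count and cost. The 6 TB capacity and $290 price are the figures from the comment above; the function names are made up for illustration, and it assumes decimal units (1 PB = 1000 TB).

```python
import math


def drives_needed(data_tb, drive_tb):
    """Whole drives required to hold data_tb terabytes (round up)."""
    return math.ceil(data_tb / drive_tb)


def drive_cost(data_tb, drive_tb, price_per_drive):
    """Total hardware cost, ignoring shipping and filesystem overhead."""
    return drives_needed(data_tb, drive_tb) * price_per_drive


data_tb = 3200  # 3.2 PB expressed in TB
print(drives_needed(data_tb, 6))    # prints 534
print(drive_cost(data_tb, 6, 290))  # prints 154860
```

  So the $154,860 total corresponds to 534 drives, not 3200.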
Patch ssh too! by Anonymous Coward · 2014-09-25 03:46 · Score: 0

Do you actually trust your business to run in a remote cloud where someone else controls the first layer of security? You crack that layer and you have the keys to the kingdom. And that's where some businesses are moving to. LMAO.. Foder
Re:Sort of off topic by NatasRevol · 2014-09-25 04:07 · Score: 0

AKA Amazon's business model.

--
There are two types of people in the world: Those who crave closure
Not a problem for properly architected sites by Anonymous Coward · 2014-09-25 04:11 · Score: 0

You've got a really bad architecture if rebooting a VM is a problem for you. Assume your VMs will go down at the worst possible moment, and plan accordingly. See also: Load Balancing and Clustering.
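
A minimal sketch of what "plan accordingly" can look like on the calling side, assuming you have several interchangeable backends to choose from. The names `call_with_failover` and `BackendDown` are hypothetical, not part of any AWS library; a real setup would normally put a load balancer in front instead of doing this by hand.

```python
class BackendDown(Exception):
    """Raised by a backend that is rebooting or otherwise unreachable."""


def call_with_failover(backends, request):
    """Return the first successful response, trying backends in order."""
    last_error = None
    for backend in backends:
        try:
            return backend(request)
        except BackendDown as exc:
            last_error = exc  # remember the failure, move on to the next one
    raise BackendDown(f"all {len(backends)} backends failed") from last_error


def rebooting(request):
    """Stand-in for an instance that is mid-reboot."""
    raise BackendDown("instance is rebooting")


def healthy(request):
    """Stand-in for an instance that is serving normally."""
    return f"ok: {request}"


print(call_with_failover([rebooting, healthy], "GET /"))  # prints "ok: GET /"
```

As long as one backend survives the reboot window, the caller never notices.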
Re:Sort of off topic by Anonymous Coward · 2014-09-25 04:12 · Score: 0

It is also horribly unfair to have to pay the real price incl. tax. Society probably does it to punish him personally.
The cloud isn't magical? But... but... by Anonymous Coward · 2014-09-25 04:14 · Score: 0

... but what about this?
Embargoed Bug? by bill_mcgonigle · 2014-09-25 04:18 · Score: 1

Does this mean the open source release of Xen doesn't have the diff applied? Do customers of large corporate clouds now have a security advantage over other users?

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:vmware by mlts · 2014-09-25 05:05 · Score: 1

VMware's fault tolerance is decent, but nothing that will recover in milliseconds. Even with vMotion and HA, it will take some time for the machine to reboot.
Of course, there is VMware's FT mode. It has a lot of limitations, such as only allowing 1 vCPU, but it does run two VMs in lockstep, so if the heartbeat drops, the downtime is in seconds, not minutes as with a machine restarting.
The Cloud .... like ... magical by Anonymous Coward · 2014-09-25 05:28 · Score: 0

Oh the horror, having to suffer a reboot. Of course you wrote your apps to cope with this, you split them across zones to mitigate any effect, then you sat back waiting for nothing exciting to happen. A bit peeved due to short notice. It's the cloud, it's what you do.
In the old days, patching was denied by my IT Manager due to downtime hassle and the fact that the devs assumed machines were magical and would run forever.
Thank god I don't need to suffer wankers like that old manager, and have Amazon suffering that hassle instead.
It is still magical enough by scamper_22 · 2014-09-25 05:56 · Score: 1

Seriously, if you ran your own server, you think you would never have to reboot it?
Yes, the cloud will have downtime. Just like we sometimes have blackouts/brownouts from an electricity outage.
BUT, chances are that downtime is LESS than the downtime you'd have running things on your own.
In every company I've worked in, there have been days the internet goes down, some intranet app goes down, Exchange goes down... things need to be updated and are down for a few hours.
Netflix is not perfect... by Junta · 2014-09-25 05:56 · Score: 1

Netflix still cocks up randomly on a stream and forces retries. I suspect it's not as rosy as they like to say and that the random death of services is more disruptive than they notice or acknowledge.
Meanwhile, even with their 'kill stuff randomly' methodology, the wrong thing still dies every so often and brings the whole thing to a screeching halt.

--
XML is like violence. If it doesn't solve the problem, use more.
1. Re:Netflix is not perfect... by Just+Some+Guy · 2014-09-25 06:39 · Score: 1
  
  Netflix certainly isn't perfect, but they're Pretty Darn Good (tm). I haven't experienced any more glitches with streaming Netflix than I have with Comcast breaking other downloads.
  
  Meanwhile, even with their 'kill stuff randomly' methodology, the wrong thing still dies every so often and brings the whole thing to a screeching halt.
  The whole idea behind Chaos Monkey is to make sure there's no such "the wrong thing" single point of failure. Having talked to their SREs, I think such outages are exceedingly rare.
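
  To make the idea concrete, here is a toy model (not Netflix's actual tool, just an illustration with made-up names): kill one instance at random and see whether every role still has something running. Any role with a single instance is "the wrong thing" waiting to die.

```python
import random


def single_points_of_failure(fleet):
    """Roles that have fewer than two running instances."""
    return sorted(role for role, count in fleet.items() if count < 2)


def kill_random_instance(fleet, rng):
    """Return (role, new_fleet) after terminating one instance at random."""
    # Expand to one entry per instance so every instance is equally likely.
    instances = [role for role, count in sorted(fleet.items()) for _ in range(count)]
    victim = rng.choice(instances)
    survivors = dict(fleet)
    survivors[victim] -= 1
    return victim, survivors


def service_is_up(fleet):
    """The service works only if every role still has at least one instance."""
    return all(count > 0 for count in fleet.values())


fleet = {"web": 3, "api": 2, "db": 1}  # hypothetical fleet
print(single_points_of_failure(fleet))  # prints ['db']

rng = random.Random(2014)
victim, survivors = kill_random_instance(fleet, rng)
print(victim, service_is_up(survivors))
```

  Run the kill step often enough against the sample fleet and the lone "db" instance eventually goes, which is exactly the kind of gap the exercise is meant to expose before a real outage does.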
  
  --
  Dewey, what part of this looks like authorities should be involved?
Why is this a story? by t'mbert · 2014-09-25 06:04 · Score: 1

AWS has been around long enough this shouldn't be an issue. If a given architecture cannot survive downtime from a server, or an availability zone, then the risk is no different than if the servers were in a locally-managed datacenter.
In short, if you don't take advantage of what the cloud has to offer in terms of redundancy, then don't expect zero downtime.
email from ec2... by rjamestaylor · 2014-09-25 08:19 · Score: 1, Funny

"we will be re-booting the cloud today,,,in order to protect your 3,2 petabytes of data, you should download it to local storage in case of a fail event. thanks for using cloud storage on computing. have a great day."
That this inane post is moderated as "3, Insightful" is why I do not visit /. anymore.

--
-- @rjamestaylor on Ello
I don't get it by amon · 2014-09-25 08:37 · Score: 2

I really don't get it. Every virtualization technology can live migrate a virtual machine to a different physical host: vmware, kvm, openvz, xen, everyone has it, and at least three of them don't even need shared storage. Why don't they use it?

--
-- If you can't convince them, confuse them (Truman)
Re:Sort of off topic by Anonymous Coward · 2014-09-25 08:59 · Score: 0

They aren't allowed to charge you sales tax on interstate commerce. And the use tax is discriminatory (not charged to in-state purchases) so it's illegal as well.
Re:Sort of off topic by Anonymous Coward · 2014-09-25 11:07 · Score: 0

They aren't allowed to charge you sales tax on interstate commerce. And the use tax is discriminatory (not charged to in-state purchases) so it's illegal as well.
Wrong. You are importing goods into your state, hence your state can absolutely tax your imports.
When you buy something "in-state", the retailer is supposed to collect the local/state SALES-TAX from you, and remit to the state's dept of revenue.
When you buy something "out-of-state" (ie: via an online retailer like Amazon) and import goods into your state, then the purchaser (you) is supposed to remit the USE-TAX instead. The Use-Tax rate is typically the same as the Sales-Tax rate.
So the GP is correct, the GGP was supposed to have been paying use-taxes this entire time to Minnesota, for all his Amazon purchases. The GGP is a tax cheat (due to ignorance instead of malice I assume).
Citation: Minnesota Individual Use Tax

The same applies if you buy taxable items through mail-order catalogs or the Internet and Minnesota sales tax is not charged on the purchase.
Why do they need a reboot? by AlterEager · 2014-09-26 02:00 · Score: 1

Just migrate the instance to a host running the fixed version of Xen, reboot the host with the broken version when it's empty.
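
Roughly what I have in mind, as a toy simulation. It assumes live migration is available and that there is spare capacity somewhere in the pool; the host names, `capacity` limit, and function names are all invented for the example and say nothing about how EC2 is actually built.

```python
def pick_target(placement, capacity, source, patched):
    """Choose a host with a free slot, preferring already-patched hosts."""
    candidates = [
        host for host, vms in placement.items()
        if host != source and len(vms) < capacity
    ]
    if not candidates:
        return None
    # Patched hosts first, then least loaded, then by name for determinism.
    return min(candidates, key=lambda h: (h not in patched, len(placement[h]), h))


def rolling_patch(placement, capacity):
    """Drain, reboot, and mark each host patched, one host at a time."""
    placement = {host: list(vms) for host, vms in placement.items()}
    patched = set()
    log = []
    for host in sorted(placement):
        for vm in list(placement[host]):
            target = pick_target(placement, capacity, host, patched)
            if target is None:
                raise RuntimeError(f"no spare capacity to drain {host}")
            placement[host].remove(vm)
            placement[target].append(vm)
            log.append(("migrate", vm, host, target))
        log.append(("reboot", host))  # the host is empty at this point
        patched.add(host)
    return placement, log


final, log = rolling_patch({"a": ["vm1", "vm2"], "b": ["vm3"], "c": []}, capacity=2)
for step in log:
    print(step)
```

Note that it only works if the pool has headroom: with every host full, the first drain has nowhere to go.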
1. Re:Why do they need a reboot? by AlterEager · 2014-09-26 02:02 · Score: 1
  
  Oh:
  
  Given that what’s underlying EC2 are ordinary physical servers running virtualization without a live migration technology in use,
  EC2 doesn't do migration.
  Low-life.
Re:Sort of off topic by Anonymous Coward · 2014-09-26 05:19 · Score: 0

Yep. I've always reported the stuff I buy online on my state tax return. It's not a lot of money and it's peace of mind.