Researcher: Interdependencies Could Lead To Cloud 'Meltdowns'
alphadogg writes "As the use of cloud computing becomes more and more mainstream, serious operational 'meltdowns' could arise as end-users and vendors mix, match and bundle services for various means, a researcher argues in a new paper set for discussion next week at the USENIX HotCloud '12 conference in Boston. 'As diverse, independently developed cloud services share ever more fluidly and aggressively multiplexed hardware resource pools, unpredictable interactions between load-balancing and other reactive mechanisms could lead to dynamic instabilities or "meltdowns,"' Yale University researcher and assistant computer science professor Bryan Ford wrote in the paper. Ford compared this scenario to the intertwining, complex relationships and structures that helped contribute to the global financial crisis."
Nearly four months ago, I noticed that my internet connection was very sluggish. Eventually getting fed up with it, I began to seek out software that would speed up the gigabits in my router. After an hour of searching, I found what at first appeared to be a very promising piece of software. Not only did it claim it would speed up my internet connection, but that it would overclock my power supply, speed up my gigabits, and remove any viruses from my computer! "This is a fantastic opportunity that I simply can't pass up," I thought. I immediately downloaded the software and began the installation, all the while laughing like a small child. I was highly anticipating a future where the speed of my internet connection would leave everyone else's in the dust.
I was horribly, horribly naive. Immediately upon the completion of the software's installation, various messages popped up on my screen about how I needed to buy software to remove a virus that I wasn't aware I had from a software company I'd never once heard of. The strange software also blocked me from doing anything except buying the software it was advertising. Being that I was a computer whiz (I had taken a computer essentials class in high school that taught me how to use Microsoft Office, and was quite adept at accessing my Facebook account), I was immediately able to conclude that the software I'd downloaded was, in fact, a virus, and that it was slowing down my gigabits at an exponential rate. "I can't let this insanity proceed any further," I thought.
As I was often called a computer genius, I was confident at the time that I could get rid of the virus with my own two hands. I tried numerous things: restarting the computer, pressing random keys on the keyboard, throwing the mouse across the room, and even flipping an orange switch on the back of the tower and turning the computer back on. My efforts were all in vain; the virus persisted, and my gigabits were running slower than ever! "This cannot be! What is this!? I've never once seen such a vicious virus in my entire life!" I was dumbfounded that I, a computer genius, was unable to remove the virus using the methods I described. Upon coming to terms with my failure, I decided to take my computer to a PC repair shop for repair.
I drove to a nearby computer repair shop and entered the building with my computer in hand. The inside of the building was quite large, neat, and organized, and the employees all seemed very kind and knowledgeable. They laughed upon hearing my embarrassing story, and told me that they saw this kind of thing on a daily basis. They then accepted the job, and told me that in the worst case, it'd be fixed in three days from now. I left with a smile, and felt confident in my decision to leave the computer repairs to the experts.
A week later, they still hadn't called back. Visibly angry, I tried calling them countless times, but not a single time did they answer the phone. Their negligence and irresponsibility infuriated me, and sent me into a state of insanity that caused me to punch a gigantic hole in the wall. Being that I would require my computer for work soon, I decided to head over to the computer repair shop to find out exactly what the problem was.
Upon entering the building, I was shocked by the state of its interior; it looked as if a tornado had torn through the entire building! Countless broken computers were scattered all about the floor, desks were flipped over, the walls had holes in them, there was a puddle of blood on the floor, and worst of all, I saw that my computer was sitting in the middle of the room lying on its side! Absolutely unforgivable! I soon noticed one of the employees sitting behind one of the tipped-over desks (the one that had previously had the cash register on top of it); he was shaking uncontrollably and sobbing. Despite being furious about my computer being tipped over, seeing him in that state still managed to make me less unforgiving. I decided to ask him what happened.
A few moments passed where the entire r
About eight months ago, I was searching around the internet to find out why my computer was running so slowly (it normally ran quite fast, but had gradually gotten slower over time). After a few minutes, I found a piece of software claiming that it could speed up my PC and make it run like new again. Being that I was dangerously ignorant about technology in general (even more so than I am now), I downloaded the software and began the installation. Mere moments after doing so, my desktop background image was changed and warnings that appeared to originate from Windows appeared all over the screen telling me to buy strange software from an unknown company in order to remove a virus it claimed I had.
I may have been ignorant about technology, but I wasn't that naive. I immediately concluded that the software I'd downloaded was, in fact, a virus. In my rage, I broke numerous objects, punched a hole in the wall, and cursed the world at the top of my lungs. I eventually calmed down, cleared my head, and realized that the only remedy for this problem was a carefully thought out plan. After a few moments of pondering about how to handle this situation, I decided that since I barely knew how to properly handle a computer, I should turn it over to the professionals and let them fix the issue.
Soon after making the decision, I drove to a local computer repair shop and entered the building with my computer in hand. They greeted me with a smile and stayed attentive the entire time that I was explaining the problem to them. They laughed as if they'd heard it all before, told me that I'm not the only one who has trouble operating computers, and then gave me a date for when the computer would be fixed. Not only had they told me that the computer would be completely repaired in at most two days, but the price for their services was surprisingly low, and to top it all off, they even gave me advice for how to avoid viruses in the future! I left the building feeling confident in my decision to seek professional help and satisfied knowing that such kind-hearted people were the ones doing the job.
The very next day, I received a phone call from the computer repair shop whilst I was at a local library researching computer viruses. I had stumbled upon a piece of software that appeared to be very promising, and I was about to do more research on it, but seeing as how I required my computer as soon as possible, I decided to put the matter on hold. Upon answering the phone and cheerfully greeting the person on the other end, I was greeted with a high-pitched shriek. Startled, I asked what was wrong. A few moments passed where nothing was said, and suddenly, the person on the other end said to me, in a low voice oozing with paranoia, "Come pick up your computer." They hung up immediately after saying that, and I couldn't help but notice that they sounded as if they were on the verge of tears. I briefly wondered if it was due to stress from work, and then drove to the computer repair shop to acquire my computer.
I was positively dismayed upon entering the building. The inside of the computer repair shop looked nothing like the image from my memories. There were broken computer parts scattered throughout the room, ceiling tiles all over the floor, blood splattered in every direction I looked, and even a human toe on the ground. After processing this disturbing information, I began panicking and frantically looking around for my computer. I spotted an employee covered in blood sitting up against the wall, and noticed that his wrists had been slashed open. Thinking quickly, I ran up to him, grabbed him by the collar of his shirt, shook him around, and began screaming, "Where is it!? Where is my computer!?" After a moment of silence, he passed away, completely shattering my expectations. Such a thing! "What a meaningless individual," I thought.
Enraged, I tore the building up even further than it already had been in my desperate search for my computer. Eventually I discovered a door leading to an area that was normally o
Around a year ago, I was mindlessly surfing the internet (as I often do) when I came across an enigmatic web page. The page, which looked like a warning from my web browser, informed me that I had a virus installed on my computer and that to fix it, I should install a strange anti-virus program that I'd never heard of (which I found peculiar considering the fact that I already had anti-virus software installed on my computer). Despite having reservations about installing it, I did so anyway (since it appeared to be a legitimate warning).
I cannot even fathom what I was thinking at that time. Soon after attempting to install the so-called anti-virus software, my desktop background image changed into a large red warning sign, warnings about malware began making appearances all over the screen, and a strange program I'd never seen before began nagging me to buy a program to remove the viruses. What should have been obvious previously then became clear to me: that software was a virus. Frustrated by my own stupidity, I began tossing objects around the room and cursing at no one in particular.
After I calmed down, I reluctantly took my computer to a local PC repair shop and steeled myself for the incoming fee. When I entered, I noticed that there were four men working there, and all of them seemed incredibly nice (the shop itself was clean and stylish, too). After I described the situation to them, they gave me a big smile (as if they'd seen and heard it all before), accepted the job, and told me that the computer would be working like new again in a few days. At the time, I was confident that their words held a great degree of truth to them.
The very next day, while I was using a local library's computer and browsing the internet, I came across a website dedicated to a certain piece of software. It claimed that it could fix up my PC and make it run like new again. I knew, right then, merely from viewing a single page on the website, that it was telling the truth. I cursed myself for not discovering this excellent piece of software before I had taken my PC to the PC repair shop. "It would've saved me money. Oh, well. I'm sure they'll get the job done just fine. I can always use this software in the future to conserve money." Those were my honest thoughts at the time.
Two days later, my phone rang after I returned home from work. I immediately was able to identify the number: it was the PC repair shop's phone number. Once I answered, something strange occurred; the one on the other end of the line spoke, in a small, tormented voice, "Return. Return. Return. Return. Return." No matter what I said to him, he would not stop repeating that one word. Unsettled by this odd occurrence, I traveled to the PC repair shop to find out exactly what happened.
Upon arriving inside the building, I looked upon the shop, which was a shadow of its former self, in shock. There were countless wires all over the floor, smashed computer parts scattered in every direction I looked, fallen shelves on the ground, desks flipped over on the ground, and, to make matters even worse, there was blood splattered all over the wall. Being the reasonable, upstanding, college-educated citizen that I was, I immediately concluded that the current state of the shop was due to none other than an employee's stress from work. I looked around a bit more, spotted three bodies sitting against the wall, and in the middle of the room, I spotted my computer. "Ah. There it is." Directly next to it was the shop's owner, sitting on the ground in the fetal position.
When I questioned him, he kept repeating a single thing again and again: "Cannot be stopped! Cannot be stopped! Cannot be stopped!" I could not get him to tell me what was wrong, but after a bit of pondering, I quickly figured out precisely what happened: they were unable to fix my computer like they had promised. Disgusted by their failure, I turned to the shop's owner (who I now noticed had a gun to his head), and spat in his general direction. I then turned my back to him as
I love how all the spam is hitting /.
If you have a critical service, have it at more than one host... That way when AWS has a bad hair day, you are still up.
Or, have your entire business totally dependent on someone else. (Sounds kinda scary that way, don't it?)
XKCD (jokingly) saw this coming a while ago: http://xkcd.com/908/
We live in an age where information is distributed, even if statistically. (Hell, I made a fake Facebook account and somehow they found my mom, and she is nowhere close to me.) A meltdown of information can't happen unless there is a worldwide meltdown of power. We have backups, but also ways of statistically restoring those backups.
The analogy the author uses doesn't work.
A better analogy would be the airline industry. The airline industry likes to over-book airplane seats it may not have because it's always trying to optimize its profit-margin.
The same will happen with cloud-services. Cloud-services will always try to optimize their own profit-margins, at the risk of triggering significant outages.
And I don't see what this has to do with the financial crisis at all.
The Risk of a Meltdown In the Cloud - March 20, 2012
Efficiency normally comes with economies of scale. As a partner in an outsourced vertical software company, we have hundreds of clients running in our highly tuned hosting cluster, and are able to bring economies of scale to an otherwise ridiculously expensive software niche. Yes, that means that if we have an outage, all of our clients experience an outage as well.
However, we have carefully laid plans for multiple recovery points in a disaster scenario (Plan B, Plan C, Plan D, etc.) and have maintained an uptime significantly better than our clients would typically attain if left to their own devices. We easily manage close to 4 nines of uptime in an industry where the average is realistically around 2 nines (having "the computer is down" a day or two every year or so is typical).
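For anyone who wants the "nines" above as actual downtime, here is a rough sketch of the arithmetic. The function names are made up for illustration and the year is assumed to be 365 days; it is not anything the poster described running.

```python
# Illustrative arithmetic only: converts "N nines" of availability
# into the downtime that level allows per year.
HOURS_PER_YEAR = 365 * 24  # assumption: ignores leap years for simplicity


def availability(nines: int) -> float:
    """Availability fraction for a number of nines (2 -> 0.99, 4 -> 0.9999)."""
    return 1 - 10 ** -nines


def downtime_hours_per_year(nines: int) -> float:
    """Hours of downtime per year permitted at the given number of nines."""
    return HOURS_PER_YEAR * 10 ** -nines


for n in (2, 3, 4):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

By this arithmetic two nines allows about 3.65 days of downtime a year, and four nines allows a little under an hour.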
Although the Internet is a "network of ends," the truth is that not all ends are created equal. Having a high-quality, high-speed (100 Mb), reliable (99.99%+) Internet feed in my small-ish hometown of around 80,000 people is ridiculously expensive. But in a nearby city (500,000 people, 2 hours' drive away) we host our servers in a tier 1 colo at 1/10th the cost of running it all ourselves, with dramatically improved reliability and network performance.
Yes, putting all your eggs in one basket means that if that basket fails, you lose all your eggs. But it also makes it easy to buy just one, really nice basket that won't break and lose your eggs.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Never turns on its makers. Never. This story is bullshit. Technology is a tool. I treat it like a tool. I control it.
Now, who's up for another drink?
Sounds to me that you have a mushroom cloud.
Systems need to be compartmentalized or have redundancies built into them.
For example, I have several systems that send automated emails. I've had a problem in the past of given email servers not accepting or sending messages. It's uncommon but it happens and it's not acceptable. These are mission critical systems. They can't fail.
Solution? Redundancy up the wazoo. The way it's set up now so many things would all have to happen at the exact same moment that the only way the system is likely to fail is if we fight world war 3... and lose.
That is how you solve this problem. Don't rely on any one system. Rely on all of them. Once you figure out how to integrate one of them it's typically easier to integrate the rest. The virtues of this approach are manifest. Not just stability: if the services do processing or data retrieval, you can cross-reference them to find errors in databases or get a more complete data set than exists in any one source.
I mean is google or bing the best search engine? What about both at the same time?
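A minimal sketch of the "rely on all of them" idea for the email case, assuming each provider can be wrapped in a function that raises on failure. The relay functions below are hypothetical stand-ins, not real mail APIs, and this is not the actual setup described above.

```python
from typing import Callable, List, Sequence

# A sender takes a message and returns None on success or raises on failure.
Sender = Callable[[str], None]


def send_with_failover(message: str, senders: Sequence[Sender]) -> int:
    """Try each sender in order; return the index of the first that succeeds.

    Raises RuntimeError only if every sender fails.
    """
    errors: List[Exception] = []
    for index, sender in enumerate(senders):
        try:
            sender(message)
            return index
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(senders)} senders failed: {errors}")


# Hypothetical stand-ins for real mail relays.
def flaky_relay(message: str) -> None:
    raise ConnectionError("relay refused the message")


delivered: List[str] = []


def backup_relay(message: str) -> None:
    delivered.append(message)


# The first relay fails, so the message goes out through the second one.
used = send_with_failover("nightly report", [flaky_relay, backup_relay])
```

The message is lost only if every relay in the list fails on the same attempt, which is the whole point of stacking them.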
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
I think it is funny that lessons learned years ago with mainframes are being presented as new by just changing the word mainframe to cloud.
Unmanaged systems are hard to manage.
Cloud computing is like fractional reserve accounting, with artificially low interest rates?
Sounds like a hair salon.
It's a leap year, February 28, and all over the world, completely out of the blue (or azure if you prefer) cloud clusters crash as the local clocks swing around to midnight, then stay down all day. :)
Still, it's three nines of uptime when it's spread out over a few years
A highly interdependent system is only as reliable as the QC on the weakest link. Who would have thought that somebody from a company that had a lot of embarrassing press about a leap year stuffup would make such a stupid and obvious mistake four years later? That's the cloud, where even the biggest names still don't care anywhere near as much as you would about your own systems and so don't pay enough attention to detail.
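The thread doesn't say what the actual bug was, so take this as a generic sketch of one way date arithmetic falls over on a leap day rather than a description of the outage. The function names are made up for illustration.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    """Add one to the year and keep month/day; raises ValueError on Feb 29."""
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """Same idea, but fall back to Feb 28 when the target year has no Feb 29."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only reachable for Feb 29 in a leap year followed by a non-leap year.
        return d.replace(year=d.year + 1, day=28)


leap_day = date(2012, 2, 29)
try:
    naive_one_year_later(leap_day)
    naive_failed = False
except ValueError:
    naive_failed = True  # 2013-02-29 does not exist

fallback = safe_one_year_later(leap_day)
```

The naive version works 1,460 days out of 1,461, which is exactly why this kind of thing slips through testing.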
Jargon, jargon, jargon, jargon. Jargon.
The difference being, of course, that the global financial crisis was the product of the abyssal greed of speculators and the stupidity of venal governments borrowing from private banks instead of doing the right thing and being directly responsible for the creation of money.
But other than that, sure it's just like it.
(/snark)
Shoes for Industry. Shoes for the Dead.
Using a public cloud seems sensible for low-risk projects, or one-off, large-scale computations. The security and availability risks would suggest that anyone using the cloud for their entire infrastructure has either read too many brochures, or is about to do something else crazy, like divest their entire original business, and then hike service charges.
I was a sysadmin at Octel Communications back in the day. Octel invented voice mail; perhaps you've heard of it.
When I hired on we had three Sun 3/280 servers. I think these were 68030 boxen, but they might have been '020s. They were primarily used for cross-compiling the homebrew RTOS that Octel's voice mail machines ran, but they were also used for Electronic Design Automation.
There was a mysterious problem that from time to time would cause one of the servers to go to its knees for an hour or two, but not actually crash. Because all three machines were NFS hard-mounted on each other, as soon as one machine got stuck, they were all stuck. 250 engineers all got to sit on their hands while I contemplated whether I'd be a few inches short of a head by the end of the workday.
I asked a colleague why we didn't soft-mount the NFS shares. That would allow a client of a hung server to timeout. My colleague's reply was that, at the time at least, we couldn't count on our development tools to do the right thing if they got read or write errors during a build. It was felt that soft-mounting might lead to bad machine code generation.
In the end it turned out that the hung servers were caused by high-capacitance serial cables. When a machine would emit "SunOS Login:", it would receive a capacitively coupled bunch of garbage back, which login would take as the username. Login would then prompt "Password:", and again receive garbage for the password "attempt". Each machine had 32 serial lines, some of them going hundreds of feet. Good thing I studied Physics and not Computer Science!
The solution was to buy a big, long, expensive spool of serial cable that had lower capacitance per foot, as well as a bunch of RS-232 plug kits, and then to tear out and replace all the cable. That took some convincing to get the management to give me the budget and the time to do the work, but in the end all I required to convince my manager Karen Coates was to hook a glass TTY up to a scope.
In Other News: I have been doing some study of security, and will have results to announce soon. These results will be digitally signed. Please use a keyserver to download my Public Key into your keyring. Please use nothing other than my key fingerprint; key emails and Key IDs can be spoofed:
Researcher Observes Cloud Interactions, Predicts Lightning
Seriously. I don't get why this same description doesn't apply to the internet itself, a thing known to work reliably?
Don't MAKE me RTFA.
Host all the debt on the cloud, then pfft, gone!
As long as you are focusing on infrastructure, or dealing with IaaS providers, you will be stuck thinking of all of the typical IT failure scenarios (systems, not people) but at a much larger scale. The future of cloud computing lies in two areas. Platform as a Service (PaaS), and changing how we write software (in that order).
I don't work for Microsoft, I am talking about Azure specifically because this is our first implementation, but we plan on using other cloud providers as they mature to catch up with Azure.
PaaS. http://en.wikipedia.org/wiki/Platform_as_a_service
Systems like Azure, and to a much smaller extent AWS (although nobody uses it that way), are abstracted away from the 'myapp==this host' thinking and move towards treating the cloud as if it were an OS overlaid on top of a very large compute fabric. In our deployments we have started rewriting all of our critical functions as worker roles within Azure. The worker roles are dispatched using cloud-native functions. We have roles for SQL, processing, BLOB (data) stores, etc. We have some fairly generic communication libraries we use to get them to work together, along with the native Azure functionality. We have several backend management instances which act as coordinating hubs to deploy, monitor, and manage all of the worker roles.
This allows us to do several things. One is that any bottleneck can typically be isolated at a much finer level than you would get running a monolithic application stack, which lets us duplicate roles that are getting overworked. This in turn gives us much finer control over scale, as we can run multiple roles on the same system, on different systems, whatever makes sense.
For purposes of backup we have a very small, almost idle mirror setup in each of the different Azure data centers, with only the database being actively migrated (synced). If one data center were to go down, we could basically pick up in another data center and 'right size' the entire thing in a matter of minutes (at worst). All of this is routed to the users through two different CDNs, so there is no direct client-to-process connection.
Anyhow, that is the route we are taking. Yes, it was a bit of an undertaking to get going, and we have been doing it piece by piece with a long way still to go, but we are very satisfied with what we have achieved so far.
This is nothing compared to the harm that will be done when governments confiscate cloud servers in the name of gathering terrorist information.
You mean like hotmail, ebay, amazon, salesforce or.... Apple? (icloud runs on azure).
the 'global financial crisis' was caused directly by massive fraud and profiteering. is there any incentive for cloud companies to create massive quantities of products that are completely worthless and sell them to sucker investors?
This is a perfect description of what will be "the perfect storm" (cloudy pun intended). And when, not if, it happens there will be a massive exodus from the cloud. The question is, where will the exodus go? Will they bring their data centers back in house? Will they colo and build their own private clouds?
As soon as I can figure out where they will go, I'll be putting my money there.
Nothing, it's just another attack by bad analogy man. The global financial meltdown was caused by a small number of financial houses who bet against their own CDO debt bubble and then shorted the entire economy. The same people who are currently going through Europe and bankrupting whole countries one-by-one.
Just look at the periodic reddit meltdowns.
http://michaelsmith.id.au
Complexity is rising in all things at a frightening rate, not just technology. Over my lifetime the amount of information required to make any decision has become massive. For instance, can you select the "best" cellphone for you today? Which credit card? Car? Checking account? There is a coming "complexity collapse." What it will look like, or what the consequences will be, is hard to project, but there cannot be an infinite rise in complexity in our lives without something painful happening eventually. Will people retreat from complexity? Will they just start to chuck technology and pull back from activities we now take as normal? Put their money in a mattress at home and use tin cans to communicate? Probably not. But what will they do to protect their sanity when bombarded by too many unmakeable decisions?
E Proelio Veritas.
...it would be a "storm" in the cloud
Would be nice to read the paper rather than some nearly meaningless story about it.