Researcher: Interdependencies Could Lead To Cloud 'Meltdowns'
alphadogg writes "As the use of cloud computing becomes more and more mainstream, serious operational 'meltdowns' could arise as end-users and vendors mix, match and bundle services for various means, a researcher argues in a new paper set for discussion next week at the USENIX HotCloud '12 conference in Boston. 'As diverse, independently developed cloud services share ever more fluidly and aggressively multiplexed hardware resource pools, unpredictable interactions between load-balancing and other reactive mechanisms could lead to dynamic instabilities or "meltdowns,"' Yale University researcher and assistant computer science professor Bryan Ford wrote in the paper. Ford compared this scenario to the intertwining, complex relationships and structures that helped contribute to the global financial crisis."
If you have a critical service, have it at more than one host... That way when AWS has a bad hair day, you are still up.
Or, have your entire business totally dependent on someone else. (Sounds kinda scary that way, don't it?)
XKCD (jokingly) saw this coming a while ago: http://xkcd.com/908/
We live in an age where information is distributed, even if statistical. (Hell, I made a fake Facebook account and somehow they found my mom, and she is nowhere close to me.) A meltdown of information can't happen unless there is a worldwide meltdown of power. We have backups, but also ways of statistically restoring those backups.
The analogy the author uses doesn't work.
A better analogy would be the airline industry. The airline industry likes to over-book airplane seats it may not have because it's always trying to optimize its profit-margin.
The same will happen with cloud-services. Cloud-services will always try to optimize their own profit-margins, at the risk of triggering significant outages.
And I don't see what this has to do with the financial crisis at all.
The Risk of a Meltdown In the Cloud - March 20, 2012
Efficiency normally comes with economies of scale. As a partner in an outsourced vertical software company, we have hundreds of clients running in our highly tuned hosting cluster, and are able to bring economies of scale to an otherwise ridiculously expensive software niche. Yes, that means that if we have an outage, all of our clients experience an outage as well.
However, we have carefully laid plans for multiple recovery points in a disaster scenario (Plan B, Plan C, Plan D, etc.) and have maintained an uptime significantly better than our clients would typically attain if left to their own devices. We easily manage close to 4 nines of uptime in an industry where the average is realistically around 2 nines. (Having "the computer is down" for a day or two every year or so is typical.)
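For anyone who wants the arithmetic behind the "nines", here is a quick sketch. It assumes a 365-day year and that "N nines" means an availability of 1 - 10^-N; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years for simplicity


def downtime_hours_per_year(nines: int) -> float:
    """Hours of downtime per year allowed by an availability of N nines."""
    unavailability = 10.0 ** -nines  # 2 nines -> 0.01, 4 nines -> 0.0001
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} nines: {downtime_hours_per_year(n):.2f} hours/year")
```

Under those assumptions, 2 nines allows about 87.6 hours (roughly 3.65 days) of downtime a year, while 4 nines allows a bit under one hour.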
Although the Internet is a "network of ends", the truth is that not all ends are created equal. Having a high-quality, high-speed (100 Mb), reliable (99.99%+) Internet feed in my small-ish hometown of around 80,000 people is ridiculously expensive. But in a nearby city (500,000 people, a 2-hour drive away) we host our servers in a tier 1 colo at 1/10th the cost of running it all ourselves, with dramatically improved reliability and network performance.
Yes, putting all your eggs in one basket means that if that basket fails, you lose all your eggs. But it also makes it easy to buy just one, really nice basket that won't break and lose your eggs.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Never turns on its makers. Never. This story is bullshit. Technology is a tool. I treat it like a tool. I control it.
Now, who's up for another drink?
Sounds to me that you have a mushroom cloud.
Systems need to be compartmentalized or have redundancies built into them.
For example, I have several systems that send automated emails. I've had a problem in the past of given email servers not accepting or sending messages. It's uncommon but it happens and it's not acceptable. These are mission critical systems. They can't fail.
Solution? Redundancy up the wazoo. The way it's set up now so many things would all have to happen at the exact same moment that the only way the system is likely to fail is if we fight world war 3... and lose.
That is how you solve this problem. Don't rely on any one system. Rely on all of them. Once you figure out how to integrate one of them it's typically easier to integrate the rest. The virtues of this approach are manifest. Not just stability, but if the services do processing or data retrieval you can cross-reference them to find errors in databases or get a more complete data set than exists in any one source.
I mean, is Google or Bing the best search engine? What about both at the same time?
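A minimal sketch of the "rely on all of them" idea for the email case, assuming each provider is wrapped in a function that raises on failure. The sender functions here are hypothetical stand-ins, not any real mail API, and this only covers failover, not the cross-referencing of results.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (recipient, body) -> None, raising an exception on failure.
Sender = Callable[[str, str], None]


def send_with_failover(senders: Sequence[Sender], recipient: str, body: str) -> int:
    """Try each sender in order; return the index of the first one that succeeds."""
    errors: List[Exception] = []
    for index, send in enumerate(senders):
        try:
            send(recipient, body)
            return index
        except Exception as exc:  # any failure just means "try the next one"
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} senders failed: {errors}")


# Stand-in providers for demonstration only.
def broken_provider(recipient: str, body: str) -> None:
    raise ConnectionError("mail server not accepting messages")


def working_provider(recipient: str, body: str) -> None:
    pass  # pretend the message was handed off successfully


if __name__ == "__main__":
    used = send_with_failover([broken_provider, working_provider], "ops@example.com", "hello")
    print(f"delivered via sender #{used}")
```

The message only fails to go out if every provider in the list fails at the same moment, which is the whole point of stacking them.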
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
I think it is funny that lessons learned years ago with mainframes are being presented as new by just changing the word mainframe to cloud.
Unmanaged systems are hard to manage.
Cloud computing is like fractional reserve accounting, with artificially low interest rates?
Sounds like a hair salon.
It's a leap year, February 28, and all over the world, completely out of the blue (or azure if you prefer) cloud clusters crash as the local clocks swing around to midnight, then stay down all day. :)
Still, it's three nines of uptime when it's spread out over a few years
A highly interdependent system is only as reliable as the QC on the weakest link. Who would have thought that somebody from a company that had a lot of embarrassing press about a leap year stuffup would make such a stupid and obvious mistake four years later? That's the cloud, where even the biggest names still don't care anywhere near as much as you would about your own systems and so don't pay enough attention to detail.
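To show how easy a leap-day bug is to write, here is a sketch of one common pattern: computing "one year from today" by bumping the year. This is an illustration only, not a claim about what actually broke in any particular outage, and the function names are made up for the example.

```python
import datetime


def naive_one_year_later(d: datetime.date) -> datetime.date:
    """Bump the year and keep month/day. Raises ValueError for Feb 29."""
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: datetime.date) -> datetime.date:
    """Same idea, but fall back to Feb 28 when next year has no Feb 29."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = datetime.date(2012, 2, 29)
    try:
        naive_one_year_later(leap_day)
    except ValueError as exc:
        print(f"naive version failed: {exc}")
    print(f"safe version: {safe_one_year_later(leap_day)}")
```

The naive version works on every other day of the four-year cycle, which is exactly why this kind of thing slips through testing.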
Jargon, jargon, jargon, jargon. Jargon.
The difference being, of course, that the global financial crisis was the product of the abyssal greed of speculators and the stupidity of venal governments borrowing from private banks instead of doing the right thing and being directly responsible for the creation of money.
But other than that, sure it's just like it.
(/snark)
Shoes for Industry. Shoes for the Dead.
Using a public cloud seems sensible for low-risk projects, or one-off, large-scale computations. The security and availability risk would suggest that anyone using the cloud for their entire infrastructure has either read too many brochures, or is about to do something else crazy, like divest their entire original business, and then hike service charges.
I was a sysadmin at Octel Communications back in the day. Octel invented voice mail; perhaps you've heard of it.
When I hired on we had three Sun 3/280 servers. I think these were 68030 boxen, but they might have been '020s. They were primarily used for cross-compiling the homebrew RTOS that Octel's voice mail machines ran, but they were also used for Electronic Design Automation.
There was a mysterious problem that from time to time would cause one of the servers to go to its knees for an hour or two, but not actually crash. Because all three machines were NFS hard-mounted on each other, as soon as one machine got stuck, they were all stuck. 250 engineers all got to sit on their hands while I contemplated whether I'd be a few inches short of a head by the end of the workday.
I asked a colleague why we didn't soft-mount the NFS shares. That would allow a client of a hung server to time out. My colleague's reply was that, at the time at least, we couldn't count on our development tools to do the right thing if they got read or write errors during a build. It was felt that soft-mounting might lead to bad machine code generation.
In the end it turned out that the hung servers were caused by high-capacitance serial cables. When a machine would emit "SunOS Login:", it would receive a capacitively coupled bunch of garbage back, which login would take as the username. Login would then prompt "Password:", and again receive garbage for the password "attempt". Each machine had 32 serial lines, some of them going hundreds of feet. Good thing I studied Physics and not Computer Science!
The solution was to buy a big, long, expensive spool of serial cable that had lower capacitance per foot, as well as a bunch of RS-232 plug kits, and then to tear out and replace all the cable. That took some convincing to get the management to give me the budget and the time to do the work, but in the end all I required to convince my manager Karen Coates was to hook a glass TTY up to a scope.
In Other News: I have been doing some study of security, and will have results to announce soon. These results will be digitally signed. Please use a keyserver to download my Public Key into your keyring. Please use nothing other than my key fingerprint; key emails and Key IDs can be spoofed:
Researcher Observes Cloud Interactions, Predicts Lightning
Seriously. I don't get why this same description doesn't apply to the internet itself, a thing known to work reliably?
Don't MAKE me RTFA.
Host all the debt on the cloud, then pfft, gone!
As long as you are focusing on infrastructure, or dealing with IaaS providers, you will be stuck thinking of all of the typical IT failure scenarios (systems, not people) but at a much larger scale. The future of cloud computing lies in two areas: Platform as a Service (PaaS), and changing how we write software (in that order).
I don't work for Microsoft; I am talking about Azure specifically because this is our first implementation, but we plan on using other cloud providers as they mature to catch up with Azure.
PaaS. http://en.wikipedia.org/wiki/Platform_as_a_service
Systems like Azure (and, to a much smaller extent, AWS, although nobody uses it that way) are abstracted away from the 'myapp==this host' thinking and move towards treating the cloud as if it is an OS overlaid on top of a very large compute fabric. In our deployments we have started rewriting all of our critical functions as worker roles within Azure. The worker roles are dispatched using cloud-native functions. We have roles for SQL, processing, BLOB (data) stores, etc. We have some fairly generic communication libraries we use to get them to work together, along with the native Azure functionality. We have several backend management instances which act as coordinating hubs to deploy, monitor, and manage all of the worker roles.
This allows us to do several things. One is that any bottleneck can typically be isolated at a much finer level than you would get running a monolithic application stack, which lets us duplicate roles that are getting overworked. This in turn gives us much finer control over scale, as we can run multiple roles on the same system, on different systems, whatever makes sense.
For purposes of backup we have a very small, almost idle mirror setup in each of the different Azure data centers, with only the database being actively migrated (synced). If one data center were to go down, we could basically pick up in another data center and 'right size' the entire thing in a matter of minutes (at worst). All of this is routed to the users through two different CDNs, so there is no direct client-to-process connection.
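The "duplicate only the roles that are getting overworked" idea can be sketched roughly like this. Everything here is hypothetical (the role names, the backlog metric, the per-instance capacity); it is not Azure's API, just the shape of a per-role scaling decision as opposed to scaling a whole monolithic stack.

```python
import math
from typing import Dict


def desired_instances(backlog: Dict[str, int], per_instance_capacity: int,
                      minimum: int = 1) -> Dict[str, int]:
    """For each role, how many instances are needed to cover its pending work."""
    return {
        role: max(minimum, math.ceil(pending / per_instance_capacity))
        for role, pending in backlog.items()
    }


if __name__ == "__main__":
    # Hypothetical pending work items per role.
    backlog = {"sql": 40, "processing": 950, "blob": 0}
    print(desired_instances(backlog, per_instance_capacity=100))
```

With those made-up numbers only the processing role gets scaled out; the idle roles stay at the minimum, which is where the finer control over scale comes from.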
Anyhow, that is the route we are taking. Yes, it was a bit of an undertaking to get going but we have been doing it piece by piece with a long ways to go but we are very satisfied with what we have achieved so far.
This is nothing compared to the harm that will be done when governments confiscate cloud servers in the name of gathering terrorist information.
You mean like Hotmail, eBay, Amazon, Salesforce or.... Apple? (iCloud runs on Azure).
The 'global financial crisis' was caused directly by massive fraud and profiteering. Is there any incentive for cloud companies to create massive quantities of products that are completely worthless and sell them to sucker investors?
This is a perfect description of what will be "the perfect storm" (cloudy pun intended). And when, not if, it happens there will be a massive exodus from the cloud. The question is, where will the exodus go? Will they bring their data centers back in house? Will they colo and build their own private clouds?
As soon as I can figure out where they will go, I'll be putting my money there.
Nothing, it's just another attack by bad analogy man. The global financial meltdown was caused by a small number of financial houses who bet against their own CDO debt bubble and then shorted the entire economy. The same people who are currently going through Europe and bankrupting whole countries one by one.
Just look at the periodic reddit meltdowns.
http://michaelsmith.id.au
Complexity is rising in all things at a frightening rate, not just technology. Over my lifetime the amount of information required to make any decision has become massive. For instance, can you select the "best" cellphone for you today? Which credit card? Car? Checking account? There is a coming "complexity collapse." What it will look like, or what the consequences will be, is hard to project, but there cannot be an infinite rise in complexity in our lives without something painful happening eventually. Will people retreat from complexity? Will they just start to chuck technology and pull back from activities we now take as normal? Put their money in a mattress at home and use tin cans to communicate? Probably not. But what will they do to protect their sanity when bombarded by too many unmakeable decisions?
E Proelio Veritas.
...it would be a "storm" in the cloud
Would be nice to read the paper rather than some nearly meaningless story about it.