Managing Linux and Virtual Machines?
deijmaster asks: "For a couple of months we have been hearing (as a major consulting firm) IBM people pushing the possibility of installing a Z/Linux VM setup at one of our biggest clients (financial). To a Linux user such as myself this sounds great, at first. Now, I am a bit reluctant when it comes to managing this kind of infrastructure, with little or no local expertise at IBM. Has anyone gone through a Z/Linux VM corporate installation and lived through the management of such a solution?"
Wintel hardware is crap and not at all scalable. It's like comparing a Ferrari (z hardware) to a Pinto (Wintel) and saying "well, they're both cars". Sure, the Ferrari costs more, but it's a hell of a lot more likely to win a race.
Reasoning by analogy is always fraught with pitfalls.
The Ferrari can't carry more than two people. The IBM machine is designed for fast I/O. The Ferrari breaks down a lot. The IBM is designed to be highly reliable.
Perhaps a better, but still rather imperfect, analogy would be a tractor trailer: lots of horsepower, but not a speed demon. Lots of cargo space. A decent diesel engine that can stand up to abuse.
IBM thinks that if you replace 20-30 Intel CPUs, all running at 5% utilization, with a single zSeries CPU running at 85-90% utilization, you'll save money and aggravation. On the other hand, if those 20-30 Intel CPUs are rendering CGI for a film, or modeling a jet engine (and thus running near 100% load), a zSeries CPU would only be able to take on the work of 4-5 Intel CPUs, if that.
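To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only uses the figures from the comment above (20-30 Intel CPUs, 5% utilization, one zSeries CPU worth 4-5 fully loaded Intel CPUs, run at 85-90%). The function names are made up for illustration, and the model assumes load adds up linearly, which ignores I/O, memory, and peak-hour overlap.

```python
def intel_load(n_cpus, utilization):
    """Total work, in 'fully busy Intel CPU' units, done by n_cpus at the given utilization."""
    return n_cpus * utilization


def zseries_capacity(intel_ratio, target_utilization):
    """Usable capacity of one zSeries CPU in the same units.

    intel_ratio is the commenter's estimate that one zSeries CPU can do
    the work of 4-5 fully loaded Intel CPUs (an assumption, not a benchmark).
    """
    return intel_ratio * target_utilization


def fits_on_one_cpu(n_cpus, utilization, intel_ratio, target_utilization):
    """True if the combined Intel load fits within one zSeries CPU's usable capacity."""
    return intel_load(n_cpus, utilization) <= zseries_capacity(intel_ratio, target_utilization)


if __name__ == "__main__":
    # Worst case for consolidation of idle servers: 30 boxes, pessimistic ratio of 4.
    print(fits_on_one_cpu(30, 0.05, 4, 0.85))   # idle servers
    # Render farm: 20 boxes near 100% load, optimistic ratio of 5.
    print(fits_on_one_cpu(20, 1.00, 5, 0.90))   # compute-bound servers
```

With these inputs, 30 mostly idle servers add up to about 1.5 busy CPUs, well under the roughly 3.4 the zSeries CPU offers, while 20 compute-bound servers need 20, far over the 4.5 available even with the optimistic ratio.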
Ahh yes, grasshopper, but when that one uber-box dies (hard disk, fan, power supply, whatever), gets powered off by accident, has its network cable unplugged, yadda yadda, it affects ALL the virtual machines.
Granted, with Big Iron you've got lovely hot-swap capabilities and such (processors, memory, etc.), but nothing is foolproof or 100% reliable. It's the old joke with pilots about twin-engine airplanes; the door swings both ways and there's no such thing as a free lunch. On one hand, you've got a spare engine if one dies, but you're 2x as likely to have a failure, you've got a lot of added complexity, and sometimes it still won't save your bacon (twin-engine planes have an abysmal survival rate for engine failure, in part because of the really shitty way they fly with one engine down). This is VERY applicable, because managing this big IBM server is much more complex (the whole point of this article) than separate hardware.
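The "2x as likely to have a failure" bit is easy to check with a little arithmetic. A minimal Python sketch, assuming each component fails independently with the same probability p (that independence is an assumption, and the per-component probability used below is a made-up placeholder, not a measured failure rate):

```python
def p_any_failure(p, n):
    """Probability that at least one of n independent components fails."""
    return 1.0 - (1.0 - p) ** n


def p_total_failure(p, n):
    """Probability that all n independent components fail together."""
    return p ** n


if __name__ == "__main__":
    p = 0.01  # hypothetical per-component failure probability
    print(p_any_failure(p, 1))    # one engine: chance of a failure
    print(p_any_failure(p, 2))    # two engines: roughly double the chance of *a* failure
    print(p_total_failure(p, 2))  # two engines: chance of losing both, if truly independent
```

So redundancy makes some failure more likely while making total failure much less likely, but only as long as the failures really are independent. Shared software, shared state, or a shared power feed breaks that assumption.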
The best example I can think of for how hot-swap can still not save the bacon is the Cisco PIX 5-something (the 1U pizza-box one). It has FULL failover: if you've got two, and one shits the bed COMPLETELY, the other one takes over absolutely everything, including active connections; they share ALL state information for what's called stateful failover. Aside from a momentary blip where things stop for a sec, nobody's the wiser that a piece of very expensive hardware just let the Magic Smoke out. The problem is that the PIX OS version we had was buggy and would crash randomly, and because they were sharing connection tables and everything, they'd BOTH die, which was REALLY bad since the boxes didn't have hardware watchdogs(!). We turned off fully stateful failover, and the problem went away; we'd notice they'd ping-ponged (there's an 'ACTIVE' LED to show you which is live) and we'd power-cycle the other.
So ask the tough questions; instead of asking what's N+1, ask what's NOT N+1, and do a very careful breakdown of what exactly it will cost to run this big huge box, and figure out what the 'per [virtual] machine' costs are...
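For the per-machine breakdown, something as simple as this is enough to start the conversation. It's a sketch only: the function name and every dollar figure in it are hypothetical placeholders, and a real comparison would also need licensing, staffing, floor space, and power.

```python
def cost_per_vm(hardware_cost, annual_running_cost, years, n_vms):
    """Total cost of ownership over the period, divided evenly across the virtual machines."""
    if n_vms <= 0:
        raise ValueError("n_vms must be positive")
    return (hardware_cost + annual_running_cost * years) / n_vms


if __name__ == "__main__":
    # Placeholder figures, NOT real IBM or Intel pricing.
    big_box = cost_per_vm(hardware_cost=1_000_000, annual_running_cost=200_000, years=5, n_vms=100)
    print(big_box)  # 20000.0
```

Run the same function with the numbers for separate boxes (n_vms=1 per server) and compare; the answer swings a lot depending on how many virtual machines the big box really ends up hosting.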