Managing Linux and Virtual Machines?
deijmaster asks: "As a major consulting firm, we have been hearing IBM people push the idea of installing a Z/Linux VM setup at one of our biggest (financial) clients for a couple of months now. To a Linux user such as myself this sounds great, at first. Now, I am a bit reluctant when it comes to managing this kind of infrastructure, with little or no local expertise at IBM. Has anyone gone through a Z/Linux VM corporate installation and lived through the management of such a solution?"
If you have never touched VM, then you will be well and truly out of your depth. It's a whole different world from Unix/Linux.
So you will have to get a VM person in, probably only on a part-time contract, and IBM can provide that person for an additional fee.
In time you may learn enough to support your very limited VM environment.
I helped an admin friend (a pure Novell guy who was somehow tasked with this job) implement TurboLinux on an IBM zSeries mainframe. It is fairly easy to work with, but you lose some performance, and updates and fixes can sometimes be hard to track down. Clustered Linux solutions could end up being cheaper at first, but their TCO may rise as time goes on (especially if your company/institution lacks a very competent Linux cluster admin/programmer).
I'd like to see the hardware that this supposed "pure linux" solution would run on. Something piddly and crappy like a dual xeon setup?
Wintel hardware is crap and not at all scalable. It's like comparing a Ferrari (z hardware) to a Pinto (Wintel) and saying "well, they're both cars". Sure, the Ferrari costs more, but it's a hell of a lot more likely to win a race.
Have you ever dealt with a cluster? Large clusters are fucking expensive to run 24x7x365. They require a lot of air conditioning (we spend over $1,000 a month on AC alone, and that's an expense that is never going away), electricity, and a shitload of space.
I know this is Slashdot, but a Beowulf cluster is not always the best choice!!!
Disk IO, reliability, workload management and power consumption are also probably relevant in that equation (and on the side of z/linux)
Linux/390 is great for experimental servers, test systems, etc. OTOH - if you have any significant workload, buy a rack-mount PC.
Exactly. I find it interesting when people comment purely from speculation. The original question asked for someone with "experience". That doesn't mean he wanted uninformed opinions based on some notion of logic. If someone hasn't sailed the boat, don't tell me how to do it.
That depends on your definition of speed.
Mainframes aren't bought for raw MIPS.
Reasoning by analogy is always fraught with pitfalls.
The Ferrari can't carry more than two people. The IBM machine is designed for fast I/O. The Ferrari breaks down a lot. The IBM is designed to be highly reliable.
Perhaps a better, but still rather imperfect, analogy would be a tractor trailer: lots of horsepower, but not a speed demon. Lots of cargo space. A decent diesel engine that can stand up to abuse.
IBM thinks that if you replace 20-30 Intel CPUs, all running at 5% utilization, with a single zSeries CPU running at 85-90% utilization, you'll save money and aggravation. On the other hand, if those 20-30 Intel CPUs are rendering CGI for a film, or modeling a jet engine (and thus running near 100% load), a zSeries CPU would only be able to take on the work of 4-5 Intel CPUs, if that.
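To make that arithmetic concrete, here is a minimal Python sketch. It assumes, as the comment suggests, that one zSeries CPU does roughly the work of 4-5 Intel CPUs on CPU-bound loads (4.5 is used below), ignores virtualization overhead, and takes 25 servers as a midpoint of the 20-30 range. The function names are hypothetical.

```python
import math


def intel_cpu_equivalents(n_servers: int, utilization: float) -> float:
    """Aggregate load, expressed as a number of fully busy Intel CPUs."""
    return n_servers * utilization


def zseries_cpus_needed(n_servers: int, utilization: float,
                        relative_speed: float, target_utilization: float) -> int:
    """zSeries CPUs needed to absorb the load of n_servers Intel boxes.

    relative_speed: Intel CPUs' worth of work one zSeries CPU does (an
    assumption; the comment above suggests roughly 4-5 for CPU-bound work).
    target_utilization: how hard we are willing to run each zSeries CPU.
    """
    load = intel_cpu_equivalents(n_servers, utilization)
    capacity_per_cpu = relative_speed * target_utilization
    return math.ceil(load / capacity_per_cpu)


# 25 mostly idle servers vs. 25 render-farm nodes pegged at 100%
idle = zseries_cpus_needed(25, 0.05, relative_speed=4.5, target_utilization=0.9)
busy = zseries_cpus_needed(25, 1.00, relative_speed=4.5, target_utilization=0.9)
print(idle, busy)  # 1 7
```

The idle fleet adds up to barely more than one busy CPU, so it fits on a single zSeries CPU with room to spare; the pegged render farm needs several, which is where the consolidation argument stops paying off.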
Ahh yes, grasshopper, but when that one uber-box dies (hard disk, fan, power supply, whatever), gets powered off by accident, has its network cable unplugged, yadda yadda, it affects ALL the virtual machines.
Granted, in the Big Iron you've got lovely hot-swap capabilities and such (processors, memory, etc.), but nothing is foolproof or 100% reliable. It's the old joke with pilots about twin-engine airplanes: the door swings both ways and there's no such thing as a free lunch. On one hand, you've got a spare engine if one dies; on the other, you're twice as likely to have a failure, you've got a lot of added complexity, and sometimes it still won't save your bacon (twin-engine planes have an abysmal survival rate after an engine failure, in part because of the really shitty way they fly with one engine down). This is VERY applicable, because managing this big IBM server is much more complex (the whole point of this article) than managing separate hardware.
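The "twice as likely" part is simple probability. A minimal Python sketch, assuming each engine (or server component) fails independently with the same hypothetical probability p; the PIX story in the next comment shows how shared state can break that independence assumption.

```python
def p_any_failure(p: float, n: int) -> float:
    """Chance that at least one of n independent engines fails."""
    return 1 - (1 - p) ** n


def p_all_fail(p: float, n: int) -> float:
    """Chance that all n independent engines fail at once."""
    return p ** n


p = 0.01  # hypothetical per-flight failure probability of one engine
print(round(p_any_failure(p, 1), 4), round(p_any_failure(p, 2), 4))  # 0.01 0.0199
print(round(p_all_fail(p, 2), 6))                                   # 0.0001
```

The twin sees an engine failure almost exactly twice as often (2p - p^2), and it only benefits if the one-engine-out case is actually survivable, which is the joke's point. Losing both at once is rare, but only while the failures really are independent.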
The best example I can think of, showing that hot-swap still can't always save your bacon, is the Cisco PIX 5-something (the 1U pizza-box one). It has FULL failover: if you've got two, and one shits the bed COMPLETELY, the other one takes over absolutely everything, including active connections; they share ALL state information, in what's called stateful failover. Aside from a momentary blip where things stop for a second, nobody's the wiser that a piece of very expensive hardware just let the Magic Smoke out. The problem is that the PIX OS version we had was buggy and would crash randomly, and because the two units were sharing connection tables and everything, they'd BOTH die, which was REALLY bad since the boxes didn't have hardware watchdogs(!). We turned off fully stateful failover and the problem went away; we'd notice they'd ping-ponged (there's an 'ACTIVE' LED to show you which one is live) and we'd power-cycle the other.
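A toy Python model of the PIX failure mode described above: two firewalls, an optional state-sync step, and a hypothetical "poison" connection that crashes whichever box processes it. It illustrates correlated failure through shared state only; it is not a model of how the PIX actually synchronizes.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    name: str
    connections: dict = field(default_factory=dict)
    alive: bool = True

    def handle(self, conn_id: str, payload: str) -> None:
        if not self.alive:
            return
        # Hypothetical stand-in for the buggy OS: one particular piece
        # of connection state crashes whichever box processes it.
        if payload == "poison":
            self.alive = False
            return
        self.connections[conn_id] = payload


def survives(stateful: bool, events) -> bool:
    """Return True if at least one firewall is still up after all events."""
    active, standby = Firewall("primary"), Firewall("secondary")
    for conn_id, payload in events:
        active.handle(conn_id, payload)
        if stateful:
            standby.handle(conn_id, payload)  # replicated state, bug and all
        if not active.alive and standby.alive:
            active, standby = standby, active  # fail over
    return active.alive or standby.alive


events = [("c1", "ok"), ("c2", "poison"), ("c3", "ok")]
print(survives(stateful=True, events=events))   # False: both boxes crash
print(survives(stateful=False, events=events))  # True: the standby takes over
```

Without state sync the standby loses the existing connections on failover, but it survives; with full sync it faithfully copies the crash along with everything else.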
So ask the tough questions; instead of asking what's N+1, ask what's NOT N+1, and do a very careful breakdown of what exactly it will cost to run this big huge box, and figure out what the 'per [virtual] machine' costs are...
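A minimal Python sketch of the "per [virtual] machine" calculation. The cost categories and all the dollar figures below are hypothetical placeholders, not quotes from IBM or anyone else; the point is only the shape of the calculation (one-time hardware spread over the service life, plus recurring costs).

```python
def per_vm_annual_cost(hardware: float, software_per_year: float,
                       staff_per_year: float, facilities_per_year: float,
                       years: int, n_vms: int) -> float:
    """Yearly cost per (virtual) machine over the box's service life.

    hardware is a one-time purchase; the other costs recur every year.
    """
    recurring = software_per_year + staff_per_year + facilities_per_year
    total = hardware + recurring * years
    return total / years / n_vms


# Placeholder figures only; plug in real quotes.
print(per_vm_annual_cost(hardware=1_000_000, software_per_year=100_000,
                         staff_per_year=200_000, facilities_per_year=50_000,
                         years=5, n_vms=100))  # 5500.0
```

Running the same function over a rack of standalone servers (with n_vms set to the number of boxes) gives an apples-to-apples comparison.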
Management costs are high for dedicated servers that sit almost idle but must still be dedicated for any number of reasons. Reliability also becomes an issue when you multiply low-cost servers, and that in turn shows up in management costs, hardware costs, and downtime costs.
Unless you're in for the mainframe-class hardware (and possibly support).
Because for x86 servers, you can always use VMware, e.g. VMware ESX.
Not sure if VMware has anything lined up for Opteron, but if that goes fine then it'll be cool.
They're great number crunchers, but they don't hold up under any kind of pressure as a web server. We had the z-series with no sites on it run benchmarks and compare to our development box with 20 sites hosted, and the development box (Penguin Computing) kicked its ASS.
You clearly have no idea what you're talking about. Great number crunchers? I can't even imagine what your testing was.
With VM you can have all 100 instances of linux share the same system disks read only, install code on one, then each can pick up the updated code with a /etc/init.d/blah restart command.
And - that restart command can be issued from a VM service machine (PROP - the programmable operator) whose sole function is to issue commands to all the Linux machines and make sure they do it.
So basically it's rpm -Fvh foo.rpm on the master disk image, followed by a RESTART FOO message to PROP and you're done.
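Here is a toy Python model of that workflow, purely for illustration: it does not talk to z/VM or PROP, and every class and function name in it is hypothetical. It shows the two ideas above: the package is updated once on a shared disk, and a single broadcast restarts every guest and reports any that didn't respond.

```python
class SharedSystemDisk:
    """Stands in for the read-only system disk every Linux guest links to."""

    def __init__(self) -> None:
        self.packages: dict[str, str] = {}

    def install(self, name: str, version: str) -> None:
        # Done once, on the master copy (the `rpm -Fvh foo.rpm` step).
        self.packages[name] = version


class Guest:
    """One Linux virtual machine that runs code from the shared disk."""

    def __init__(self, name: str, disk: SharedSystemDisk) -> None:
        self.name = name
        self.disk = disk
        self.up = True
        self.running: dict[str, str] = {}

    def restart(self, service: str) -> None:
        # The `/etc/init.d/<service> restart` step: reload whatever
        # version is currently on the shared disk.
        if not self.up:
            raise RuntimeError(f"{self.name} is not responding")
        self.running[service] = self.disk.packages[service]


def broadcast_restart(guests: list[Guest], service: str) -> list[str]:
    """Plays PROP's role: send one command to every guest and
    report the ones that did not carry it out."""
    failed = []
    for guest in guests:
        try:
            guest.restart(service)
        except RuntimeError:
            failed.append(guest.name)
    return failed


disk = SharedSystemDisk()
disk.install("foo", "1.0")
guests = [Guest(f"linux{i:03d}", disk) for i in range(100)]
broadcast_restart(guests, "foo")     # everyone starts on 1.0
disk.install("foo", "1.1")           # update the master image once
guests[42].up = False                # one guest happens to be hung
print(broadcast_restart(guests, "foo"))                # ['linux042']
print(sum(g.running["foo"] == "1.1" for g in guests))  # 99
```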
(Note - I'm not Adam - but I can vouch that he does know what he's talking about and this is my guess at what he'd say)
Yes, mainframes do go down, but it's usually due to some edge case that testing didn't catch. A production system going down (an "outage") usually causes IBM field engineers to hop on the nearest plane to the customer site.
IBM Mainframes have the advantages of a very old and robust operating system, reliable and redundant hardware, and a thorough testing process before they are shipped out the door. This is what makes them more reliable.
The Z-series supports taking CPUs out of commission for replacement without downtime. Same for RAM. Add multiple hot-swappable SCSI controllers connected to a fully redundant storage system such as the ESS/Shark (where you can connect to two separate banks of controllers, so that any one of them can be offline without causing problems, and which has two separate AIX servers handling requests and supports RAID and synchronous mirroring over fiber to a backup ESS), multiple hot-swappable network cards, and multiple power supplies, and you start getting pretty safe.
Yes, it will cost money, but so will providing all of the above for standalone servers. The Z-series is marketed primarily as a way of reducing maintenance work by consolidating your "servers" on one or two physical platforms, not for its purchase price - it's an expensive beast.
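The "you start getting pretty safe" claim follows from standard series/parallel availability math. A minimal Python sketch, assuming independent failures and a hypothetical 99% availability per component; the five stages are an arbitrary example, not a description of actual zSeries internals.

```python
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` redundant units where any one is enough."""
    return 1 - (1 - availability) ** copies


def series(*availabilities: float) -> float:
    """Availability of a chain in which every stage must be up."""
    return prod(availabilities)


# Hypothetical: every component is up 99% of the time, failures independent.
a = 0.99
stages = 5  # e.g. CPU, memory, disk controller, network card, power supply
single = series(*[a] * stages)
redundant = series(*[parallel(a, 2) for _ in range(stages)])
print(round(single, 3), round(redundant, 4))  # 0.951 0.9995
```

Doubling every stage turns roughly one failure in twenty into roughly one in two thousand, under the independence assumption that shared-state bugs can quietly violate.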
... using VM. Not everything can be measured in pure dollars and cents. Consider: all the stuff written about "what if" this or that fails because I have only one box can largely be ignored. All that failover stuff is built under the skin of the box. Just because you don't see it as multiple distinct boxes doesn't mean it's not there under the covers (multiple power supplies, CPUs, buses, etc.).
When something goes wrong in an app, you can generally cross off hardware problems right away. That's because, if there are hardware faults, the system brings in spares and shoots out diagnostics on EXACTLY what's wrong, right down to the card level. So if the system is quiet about the hardware, it isn't the hardware.
One very big advantage is being able to run multiple versions of your OSes simultaneously. That means you don't have to worry about the crusty app running on the dusty box nobody remembers anything about. It's all on your mainframe and will move right over if you change hardware. And, of course, business recovery is a dream, since you're not talking about replicating all those unique boxes you've accumulated over the years.
In general, VM should be looked at as a management tool more than as pure power under the hood. If you need to manage your corporate computing needs at a corporate, strategic level, VM is for you. That doesn't mean there won't be a few instances where you need pure dedicated power for one app. But as the years go by, and some apps hang around and must be maintained while focus moves on to other things, you will be very happy you've got VM there to help you keep your sanity.
Internet Suspend/Resume at Intel Research in Pittsburgh is another; they have published a paper on it. They also had an article in Scientific American a while back.
One big advantage of managing with VMs is that a complete system is just like a file, and thus can be copied and migrated easily. For example, if you have a production server with some faulty hardware, you can migrate the machine to a new host by simply copying the VM files, then repair the hardware and copy it back.
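A minimal Python sketch of "the VM is just a file" migration, assuming the VM is powered off and all of its state lives in one directory. The directory layout and file names (`disk.img`, `vm.cfg`) are hypothetical, not any particular product's format; live migration of a running VM is a different, harder problem.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and contents under root."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(str(path.relative_to(root)).encode())
        h.update(path.read_bytes())
    return h.hexdigest()


def migrate_vm(src: Path, dst: Path) -> None:
    """Copy a powered-off VM's directory to a new host and verify the copy."""
    shutil.copytree(src, dst)  # creates dst (and parents); fails if dst exists
    if tree_digest(src) != tree_digest(dst):
        raise RuntimeError("copy does not match source")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "hostA", "webserver")
    src.mkdir(parents=True)
    (src / "disk.img").write_bytes(b"\0" * 1024)
    (src / "vm.cfg").write_text("memory=512\n")
    dst = Path(tmp, "hostB", "webserver")
    migrate_vm(src, dst)
    print(sorted(p.name for p in dst.iterdir()))  # ['disk.img', 'vm.cfg']
```

Copying it back after the repair is the same call with the arguments swapped.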
Of course, efficiency is degraded somewhat due to the VM overhead, but the main argument is that cycles are cheap and people are expensive. It's cheaper to buy a P4 2.4 GHz for $500 than to hire a new sysadmin for $60,000. If you are performance-limited, just replicate instead of buying some fancy hardware (or look into better VM technology like VMware ESX Server).
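To put rough numbers on "cycles are cheap, people are expensive", here is a small Python sketch using the comment's own $500 and $60,000 figures. The 20% VM overhead and the ten-host workload are hypothetical values chosen for illustration, not measurements.

```python
import math

PC_PRICE = 500          # dollars, figure from the comment above
ADMIN_SALARY = 60_000   # dollars per year, figure from the comment above


def hosts_needed(workload_in_hosts: float, vm_overhead: float) -> int:
    """Hosts needed when virtualization eats `vm_overhead` of each host."""
    return math.ceil(workload_in_hosts / (1 - vm_overhead))


bare = hosts_needed(10, 0.0)
virtual = hosts_needed(10, 0.20)   # 20% overhead is a hypothetical figure
extra_hardware = (virtual - bare) * PC_PRICE
print(bare, virtual, extra_hardware)  # 10 13 1500
print(ADMIN_SALARY // PC_PRICE)       # 120
```

Even with a generous overhead figure, the extra boxes cost a small fraction of one admin's yearly salary.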