Making Best Use of Data Center Space: Density Vs. Isolation
jfruh writes The ability to cram multiple virtual servers on a single physical computer is tempting — so tempting that many shops overlook the downsides of having so many important systems subject to a single point of physical failure. But how can you isolate your servers physically but still take up less room? Matthew Mobrea takes a look at the options, including new server platforms that offer what he calls "dense isolation."
Heck, 13 years ago at a Canadian federal government job we swapped our web servers for blades.
Which was pretty bleeding-edge at the time, since the first blade server came out in 2001. So I'm not sure what your point about the government is; they weren't late to the party, far from it.
He should consider using virtualization to increase his uptime, since he is worried about having multiple important systems on a single server. Virtualization gives you such good consolidation yields that you can come out ahead while still using redundancy features like VMware Fault Tolerance. Your VM runs "in step" on two hosts and will survive if either host fails. It just requires 2X the memory the VM uses. That's only for the most extreme cases, though, like databases; most servers should be able to survive a reboot, which is what happens when your host dies and there is capacity left in your cluster: the VM powers back up on another host.
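The capacity arithmetic in that comment can be sketched roughly as follows. This is a minimal sketch, not anything VMware ships: the function name, the host sizes, and the VM sizes are all hypothetical, and it only compares total memory, ignoring CPU and how VMs actually pack onto individual hosts.

```python
def survives_host_failure(host_mem_gb, vm_mem_gb, ft_vms=()):
    """Return True if total VM memory still fits after losing the largest host.

    host_mem_gb: list of per-host memory sizes in GB (hypothetical figures).
    vm_mem_gb:   dict mapping VM name -> memory in GB.
    ft_vms:      names of VMs running with a fault-tolerant second copy.
    """
    # A fault-tolerant VM runs in step on two hosts, so count its memory twice.
    demand = sum(
        mem * (2 if name in ft_vms else 1)
        for name, mem in vm_mem_gb.items()
    )
    # Worst case: the host that dies is the largest one.
    remaining = sum(host_mem_gb) - max(host_mem_gb)
    return demand <= remaining
```

For example, with two 128 GB hosts, a 96 GB database and a 64 GB web server do not fit on the surviving host, with or without the fault-tolerant copy. Adding a third 128 GB host leaves room for both VMs plus the database's second copy.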
I'll accept the idea that somewhere somebody has so many servers and so little space that a blade center was the only way they could achieve the density they needed.
Except I've never seen it -- all the blade centers I've ever seen have been partially full, and the equivalent 1U and 2U servers probably would have fit in the same or less space than the blade chassis was occupying.
And almost always there's a mongolian clusterfuck when they decide to add blades to the chassis -- which they inevitably do, because they have so much money sunk into the blades that there's no way out from under it.
The mongolian clusterfuck is the result of the byzantine configuration rules each vendor has for determining a blade's NIC or FC mapping with the blade center's (overpriced) internal switch bays. Half or full height? LoM or mezzanine slot? Which mez slot? Which blade slot? Oh, you want an extra NIC on that blade? Sorry, the mapping requires an additional switching module which will cost you more than any decent L3 48-port gig switch.
Whatever the savings from the blade center (and maybe in some metered situation there is a power savings of a couple hundred watts), they are easily lost in hours of troubleshooting when trying to do something different.
Blade centers always look like some kind of pre-virtualization version of server consolidation that became obsolete once 24U of servers could easily be run on 8U or less of VM host and SAN. They would be a lot more interesting if their mapping regimes weren't hard-wired -- blade advocates give me blahblah "point of failure" about a switchable/configurable backplane.
I have just put in a Blade / VM configuration at a school (don't ask what they were running before, you don't want to know).
Our DR plan is that we have an identical rack at another location with blades / storage / VMs / etc. on hot standby.
Our DDR (double-disaster recovery!) plan is to restore the VMs we have to somewhere else, e.g. a cloud provider, if something prevents us from operating on that plan.
The worries I have are that storage is integrated into the blade server (a SPOF on its own, but at least we have multiple blade servers mirroring that data), and that we are relying on a single network to join them.
The DDR plan is literally there for "we can't get on site" scenarios, and involves spinning up copies of instances on an entirely separate network, including external numbering. It's not a big deal for us, we are merely a tiny school, but if even we're thinking of that and seeing those SPOFs, you'd think someone getting their article onto Slashdot would see that too.
All the hardware in the world is useless if that fibre going into the IT office breaks, or a "single" RAID card falls over (or the RAID even degrades, affecting performance). It seems pretty obvious. Two of everything, minimum. And thus two ways to get to everything, minimum.
If you can't count two or more of everything, then you can't (in theory) safely smash one of anything and continue. Whether that's a blade server, power cord, network switch, wall socket, building generator, or whatever, it's the same. And it's blindingly obvious why that is.
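The "smash one of anything and continue" test can be sketched as a small graph check: remove each component in turn and see whether there is still a path from where you sit to what you need. This is only an illustration. The component names and the topology are made up, not the poster's actual setup, and it looks at reachability only, not performance or degraded RAID.

```python
from collections import deque

def build_adjacency(pairs):
    """Turn a list of (a, b) connections into an undirected adjacency map."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def reachable(adj, start, goal, removed=None):
    """Breadth-first search from start to goal, pretending `removed` is dead."""
    if removed in (start, goal):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(adj, start, goal):
    """List every component whose loss alone cuts start off from goal."""
    candidates = set(adj) - {start, goal}
    return sorted(n for n in candidates
                  if not reachable(adj, start, goal, removed=n))

# Hypothetical layout: two switches and two blades, but only one fibre run.
adj = build_adjacency([
    ("office", "switch-a"), ("office", "switch-b"),
    ("switch-a", "fibre"), ("switch-b", "fibre"),
    ("fibre", "blade-1"), ("fibre", "blade-2"),
    ("blade-1", "storage"), ("blade-2", "storage"),
])
print(single_points_of_failure(adj, "office", "storage"))  # ['fibre']
```

In this made-up layout the switches and blades are doubled up, so the check flags only the single fibre run. Add a second fibre path and the list comes back empty.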