Clustering vs. Fault-Tolerant Servers
mstansberry writes "According to SearchDataCenter.com, fault-tolerant server vendors say the majority of hardware and software makers have pushed clustering as a high-availability option because it sells more hardware and software licenses. Fault-tolerant servers pack redundant components such as power supply and storage into a single box, while clustering involves the networking of multiple, standard servers used as failover machines." Perhaps some readers on the front lines can shed a bit more light on the debate, drawing on experience with both proprietary and Linux-based approaches.
Personally, I opt for clustering over fault tolerance, but that's my own choice. It really depends on what the machine(s) will be doing. If you have a database server, go fault-tolerant (because I have yet to meet a clustered DB solution that didn't suck). But if you're building a web server, cluster.
Also, the article claims that clustering is just as expensive as fault tolerance because of software licensing. Last I checked, whether it's one copy of Debian + Apache + MySQL + Perl or 200 copies, it's going to cost me the same price (free). And Windows doesn't support clustering yet in any decent way, shape or form, so I don't see the problem here.
snowulf.com
Because of the open-source stack behind a lot of server platforms these days, I'm dubious that this decision boils down simply to software cost. One major benefit of clustering is that many white-box, non-specialized machines can be used, which are easier and cheaper to replace or obtain components for. Complex, specialized hardware with built-in redundancy is often expensive and can require vendor support contracts for effective maintenance.
Business Voyeur
The article seems to make the choice one-sided. Fault-tolerant servers have higher uptimes because the backup takes over immediately. Clusters have a single point of failure in the middleware. They argue that clusters can run different operating systems, but that means more patches and updates to keep track of. Clusters are expensive because they need more OS and software licenses and require a lot of maintenance, though those costs might drop if they run Linux or FreeBSD.
Can anyone make a case for clusters in high-uptime situations?
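The middleware point can be made concrete with the usual availability arithmetic. This is a back-of-the-envelope sketch, not anything from the article: it assumes independent failures, and the 0.99 and 0.999 figures are made up for illustration. Redundant nodes combine in parallel, but the middleware sits in series with them, so the cluster as a whole can never be more available than the middleware itself.

```python
def parallel(*avail: float) -> float:
    """Redundant components: the group is down only if every member is down."""
    all_down = 1.0
    for a in avail:
        all_down *= 1.0 - a
    return 1.0 - all_down


def series(*avail: float) -> float:
    """Components that must all be up, e.g. nodes behind one middleware layer."""
    up = 1.0
    for a in avail:
        up *= a
    return up


# Made-up illustrative figures, assuming independent failures.
node = 0.99          # availability of one server
middleware = 0.999   # availability of the clustering software itself

two_nodes = parallel(node, node)          # about 0.9999
cluster = series(two_nodes, middleware)   # about 0.9989, below the middleware alone
print(f"two nodes: {two_nodes:.6f}  whole cluster: {cluster:.6f}")
```

Adding more nodes pushes the parallel term toward 1, but the product stays capped by the series term, which is the single point of failure the comment describes.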
A NYC lawyer blogs. http://www.chuangblog.com/
If you buy one machine, you still may need to power it off to open the case, or replace a part.
I build AIX HACMP clusters for a living, and I'll tell you that you should *never* use an either/or approach, as TFA suggests. Nobody in their right mind is wondering if they should get a cluster OR FT hardware. They get a cluster of FT servers.
Maybe if they want to write an article, they should spend some time in the real world and see how the HA industry works instead of making up some arbitrary demarcation line to hang a preconception on.
In my 15 years of IT consulting, no network has provided transparent data safety cheaply or consistently enough. In my experience, both clusters and fault tolerance cost more than the downtime they prevent.
We desperately need a better way to access data in a corporate network.
My favorite customers are those architects and engineers who avoid networking except for the Net. Seriously, sneakernet and peer-to-peer have shown the least downtime I've seen.
I think p2p networks will see a comeback if a torrent-like protocol can grow to be speedy. My customers are not banks, but they need 100% uptime as every day is a beat-the-deadline day.
If someone can extend and combine an internal torrent system with a decent file cataloging and searching system, they'll see huge money. I have some 150 user CAD networks just waiting for it.
What would a hive network need?
* Serverless
* Files hived to 3+ workstations
* Database object hiving
* File modification ability (save new file in hive, rename previous file as old version, delete really old versions after a user-configurable number of changes)
* "Wayback Machine" feature for old versions
* PCs disconnected from hive will self correct upon reconnection
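The core of this wish list (files hived to 3+ workstations, old versions kept up to a user-configurable limit, self-correction on reconnect) can be sketched in memory. This is a toy under loud assumptions, not a real P2P protocol: `Workstation`, `Hive` and the hash-ring placement are hypothetical names, there is no networking and no database-object hiving, and a single save counter stands in for the distributed version tracking a truly serverless system would need.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: workstations are compared by identity
class Workstation:
    name: str
    online: bool = True
    # filename -> [(save sequence number, contents), ...], oldest first
    files: dict[str, list[tuple[int, bytes]]] = field(default_factory=dict)


class Hive:
    """Toy hive: each file is stored on `copies` workstations and keeps at
    most `max_versions` versions (the user-configurable limit)."""

    def __init__(self, stations: list[Workstation], copies: int = 3, max_versions: int = 5):
        if not 1 <= copies <= len(stations):
            raise ValueError("copies must be between 1 and the number of workstations")
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.stations = stations
        self.copies = copies
        self.max_versions = max_versions
        self.seq = 0  # save counter, used to tell a newer history from a stale one

    def placement(self, filename: str) -> list[Workstation]:
        # Hash the filename to a starting point, then take the next `copies`
        # workstations around the ring, so every peer computes the same replicas.
        start = int(hashlib.sha1(filename.encode()).hexdigest(), 16) % len(self.stations)
        return [self.stations[(start + i) % len(self.stations)] for i in range(self.copies)]

    def save(self, filename: str, data: bytes) -> None:
        # Append the new version on every online replica (the previous one becomes
        # an "old version") and drop versions beyond the retention limit.
        self.seq += 1
        for ws in self.placement(filename):
            if ws.online:
                versions = ws.files.setdefault(filename, [])
                versions.append((self.seq, data))
                del versions[:-self.max_versions]

    def read(self, filename: str, back: int = 0) -> bytes:
        # back=0 is the current version, back=1 the one before it, and so on
        # (a minimal "Wayback Machine"). Reads from the freshest online replica.
        histories = [ws.files[filename] for ws in self.placement(filename)
                     if ws.online and filename in ws.files]
        if not histories:
            raise FileNotFoundError(filename)
        newest = max(histories, key=lambda h: h[-1][0])
        if back >= len(newest):
            raise FileNotFoundError(f"{filename}: only {len(newest)} versions kept")
        return newest[-1 - back][1]

    def reconnect(self, ws: Workstation) -> None:
        # Self-correction: for each file this workstation should hold, copy the
        # freshest history held by an online peer if it is newer than the local one.
        ws.online = True
        for filename in {f for s in self.stations for f in s.files}:
            replicas = self.placement(filename)
            if ws not in replicas:
                continue
            best = ws.files.get(filename, [])
            for peer in replicas:
                history = peer.files.get(filename, [])
                if peer is not ws and peer.online and history and (
                        not best or history[-1][0] > best[-1][0]):
                    best = history
            if best:
                ws.files[filename] = list(best)
```

For example, with five workstations and `copies=3`, a PC that was offline during two saves picks up the two newest versions from its peers as soon as `reconnect` is called on it.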
It is very complex right now, but my bet is that P2P networks will trump client-server in the short run. "The client is the server" vs. "the server is the client"?
You're still not safe with clustering if you share data. I once worked on a SQL Server cluster with shared disks. SQL Server would crash because a database page contained crap data. The system would then take 10 minutes to fail over to another node. Once it was running, it would read the same page and crap out, causing the other node to come back up. Lather, rinse, repeat.
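A toy model of that ping-pong, under stated assumptions: the node names, `serve`, `SharedDisk` and the `max_failovers` cap are hypothetical and are not SQL Server or cluster-manager settings. Because the bad page lives on the shared disk, every node fails the same way, so node redundancy does not help; only a cap on failover attempts stops the loop and hands the problem to a human.

```python
from dataclasses import dataclass


@dataclass
class SharedDisk:
    corrupt_pages: set[int]  # pages that crash any node that reads them


def serve(node: str, disk: SharedDisk, page: int) -> bool:
    """Hypothetical workload: returns False (the node 'crashes') on a corrupt page.
    Which node reads it makes no difference: the bad page is on the shared disk."""
    return page not in disk.corrupt_pages


def run_with_failover(nodes: list[str], disk: SharedDisk, page: int,
                      max_failovers: int = 3) -> tuple[str, list[str]]:
    """Fail over between nodes, but give up after `max_failovers` failovers so a
    fault in the shared data does not cause an endless lather-rinse-repeat."""
    active = 0
    history = []
    for _ in range(max_failovers + 1):
        node = nodes[active]
        history.append(node)
        if serve(node, disk, page):
            return "online", history
        active = (active + 1) % len(nodes)  # fail over to the next node
    return "offline, needs an operator", history
```

With page 42 marked corrupt, a request for page 42 bounces `node-a`, `node-b`, `node-a`, `node-b` and then stops offline, while any other page is served by the first node.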
Google proved that clustering could be fault tolerant, while costing less than true fault-tolerant hardware.
Google built massive clusters of thousands of machines out of very cheap, unreliable hardware. They have tons of hardware failures due to the extremely cheap components (and the sheer number of machines), but everything is redundant (and fully fault tolerant).
They did this, again, using dirt cheap hardware.
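The arithmetic behind the cheap-hardware approach is easy to sketch, assuming replica failures are independent (which shared racks, power feeds and switches do not guarantee). `replicas_needed` is a hypothetical helper, not anything Google has published: it finds how many copies of some data, each on a machine that is up with probability `p_up`, are needed before at least one copy is reachable with the target probability.

```python
def replicas_needed(p_up: float, target: float) -> int:
    """Smallest k with 1 - (1 - p_up)**k >= target, assuming independent failures."""
    if not (0.0 < p_up < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_up and target must be strictly between 0 and 1")
    k = 1
    while 1.0 - (1.0 - p_up) ** k < target:  # all k copies down at once
        k += 1
    return k


for target in (0.999, 0.9999):
    print(target, replicas_needed(0.95, target))
```

With machines that are up 95% of the time, three copies already reach 99.9% and four reach 99.99%: the trade described above, more and cheaper boxes instead of fewer expensive ones.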
According to a presentation I recently attended, given by Jim Reese, the guy who scaled Google from a couple hundred servers to over 300,000, this is still true. It was a very interesting presentation and included discussion of the problems with cramming 80 PCs into a standard server rack: heat, cable management, machine replacement, etc.
Other interesting tidbits that I remember:
-over 300,000 x86 machines make up the network, with clusters all over the place, which make searches return in under .3 seconds.
-commodity hardware (Maxtor, Western Digital, whatever is available) is used.
-over a thousand machines fail daily. Most are automatically rebooted, and it sounded like admins only come into play when a machine needs to be replaced.
-the longest uptime of a single machine has been 7 years.
-they use a heavily modified Red Hat distro.
-real time stats of the entire network can be seen at any moment
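The "reboot automatically, replace only when needed" handling recalled above can be sketched as a simple triage rule. This is a hypothetical policy for illustration only: the presentation as recalled here gives no details of Google's actual tooling, and the `triage` function and its `max_reboots` threshold are assumptions.

```python
from collections import Counter


def triage(failures: list[str], max_reboots: int = 2) -> tuple[list[str], list[str]]:
    """Given a day's failure events (one machine name per event), reboot a machine
    automatically until it has failed more than `max_reboots` times, then queue
    it once for an admin to replace."""
    seen: Counter[str] = Counter()
    reboot: list[str] = []
    replace: list[str] = []
    for machine in failures:
        seen[machine] += 1
        if seen[machine] <= max_reboots:
            reboot.append(machine)       # handled without a human
        elif machine not in replace:
            replace.append(machine)      # repeated failures: hardware is suspect
    return reboot, replace
```

The point of the design is that humans only see the short `replace` list, not the thousand-plus raw failure events.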
I'm sure there were more interesting facts, but that's all I can regurgitate at the moment.
The agency bought a pair of dual-proc Dells with lots of RAM and a full software stack (Windows Server, SQL Server, and ColdFusion Server). Total cost: ~$57,000.
That's right, nearly 60k.
Now, I've read that Google buys their white boxes at $1k each for their server farm. And I couldn't help but think what they (or I) would do with 57 boxes instead of 2.
But hey, my opinion doesn't matter. I'm not a PHB in a gov't agency. But sure as hell, if I were a business in a competitive environment (and a gov't agency is not), I'd be looking to implement the simple and effective white box solution on the cheap. But that's just me.
Ever seen the actual numbers on Oracle RAC scaling on any decent-sized install, let's say 4+ nodes? Not pretty. Oracle RAC is an interesting technology, but it serves one purpose and one purpose only: to let Larry buy fancier boats.
Why do you think Oracle runs its business (main ERP apps) on 4 nodes of big Sun iron and not 8+ nodes of cheap Linux boxes? They tout the shit out of the cheap Linux boxes to their customers because if an IT department has a $4 million budget for a big project, Oracle wants to get $3.95 million of that. The only way they can do that is to push the crap out of cheap Dell and Linux for the hardware and the OS. But when it is *Oracle's* business on the line... well, the proof is in the pudding, as they say.