Microsoft Janus
nadador writes "Apparently, Microsoft is readying an enterprise-class clustering and failover version of Windows 2000. Techweb, and Microsoft, I'm sure, seem to think this is going to be a "Unix Killer". It also mentions Linux as a driving force in making Windows truly enterprise-class software." It actually sounds quite impressive. I can't wait to see what some of the upcoming HA (high-availability) enhancements for Linux will look like.
Whether NT is stable in a single server non-HA configuration or not does not matter; as long as the system as viewed from outside the cluster is up all the time with acceptable performance, there is no loss. Linux can do HA too, but the apps just aren't there. We can't beat this because we don't have control over it. Stability is really the only thing Linux has over NT at the moment in the data center, but this turns the tables. NT with failover clusters is more reliable than any single Linux machine.
Have Oracle port OPS. Oh, wait, that won't be done until raw devices are in the kernel, and Linus doesn't like them. Same for other cluster-enabled RDBMSs. Linux also has a severe filesystem deficiency right now; as I understand it, this is being worked on, but I don't see much real progress. Other scalability concerns are being addressed in 2.3 right now, which should be out before 2000 as 2.4, if I understand Linus's release schedule correctly.
Another real problem with Linux is the lack of availability of midrange and high-end hardware to key developers. My company (Denarius: http://www.denarius.com) would be more than happy to supply and set up access to high-end hardware for kernel developers as a service to the community. Hardware manufacturers would have an incentive to offer evaluations of their hardware to "sponsor" the project, as well, gaining bonus points with developers and users.
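To put rough numbers on the failover argument above (a cluster of individually less stable machines can still look more reliable from the outside than one stable machine), here is a back-of-the-envelope sketch in Python. It assumes node failures are independent and that failover is instant and always works, which real clusters do not guarantee. The function name and the availability figures are made up for illustration, not measurements of NT or Linux.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one node is up.

    ASSUMPTIONS (hypothetical, for illustration only):
    - node failures are statistically independent
    - failover is instantaneous and never fails itself
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


# Illustrative figures only: one "stable" box vs. two flakier boxes.
single_stable = cluster_availability(0.999, 1)
two_flaky = cluster_availability(0.99, 2)
print(f"single node at 99.9%:  {single_stable:.4%}")
print(f"two nodes at 99% each: {two_flaky:.4%}")
```

Under those assumptions the two-node pair comes out ahead of the single box. Shared failure modes (power, storage, a bad failover script) pull the real figure down, so treat this as an upper bound.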
Here I prove that Microsoft is really really very very good: http://www.freeyellow.com/members7/geraldholmes/index.html
I was one of the principal designers and implementors of both the cluster manager and lock manager for HACMP/6000 version 3.1 (which, BTW, supported 8-way symmetric failover) back in '94, so maybe I'm qualified to comment on some of this.
;-) MS has so far exhibited nothing but the most startling ignorance and incompetence in these areas, and the idea that they'll suddenly leapfrog the established experts like this is just bunk. It's far easier to believe that they're deliberately making false claims to scare off the competition...again.
First, about MS. The consensus opinion among people who really know HA is that Wolfpack was and is the most pathetic piece of junk ever. The prevailing theory is that they quite deliberately announced it knowing that it was junk, to scare off anyone (such as my employer at the time) who might try to produce their own NT HA solutions.
This Janus project is just another step in that direction. 64 or 256 nodes? Yeah right. There are several reasons other HA solutions typically only go up to eight. The main one is that nobody really wants a single cluster that big. It's a total management nightmare. What customers actually want to do is set up multiple independent clusters of a reasonable size, and perhaps manage them all from within a common framework, but that's not the same as a single cluster. There's just no benefit to offset the cost of setting up failover relationships that deep and complex.
Another reason you don't see HA clusters beyond eight is that it's all but impossible to devise protocols (membership, heartbeat, consensus, and so on) that scale that high and yet still handle the simple cases efficiently. Just avoiding all the race conditions in eight nodes booting and trying to join the cluster at once is incredibly difficult. If you don't think it's that hard, try it. Have fun. Come back after you've failed, and we'll talk.
Now that I've bashed MS HA, a few words about Linux HA. It's as pathetic as MS. We have some very basic heartbeat code, and a few other scattered bits and pieces, but that's it. There's practically no fault identification to distinguish different types of failures so that one can respond differently to an adapter or network failure as distinct from a node failure. There's no lock manager. Many of the people working on the designs are only beginning to grasp the basic problems, and they're months if not years from actually implementing industrial-strength solutions. I'm on the mailing list (or I was, before I moved and had to give up my cable-modem account), I see the traffic, and it's Just Not There. I'm sorry, and I wish I could spare more time to contribute more of my own hard-won experience to the project, but that's just the way things are.
jdarcy@emc.com, until I get a new home account
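For anyone wondering what "fault identification" means in the comment above, here is a minimal Python sketch of the basic idea: heartbeat a peer over more than one path, then look at which paths went quiet. This is not how HACMP, Wolfpack, or Linux-HA actually do it. The names (`Fault`, `classify_fault`, the path labels) are hypothetical, and a real implementation has to deal with partitions, clock issues, and the join races mentioned earlier.

```python
from enum import Enum
from typing import Dict


class Fault(Enum):
    NONE = "none"
    ADAPTER_OR_NETWORK = "adapter_or_network"
    NODE = "node"


def classify_fault(last_seen: Dict[str, float], now: float, timeout: float) -> Fault:
    """Classify a peer's state from heartbeats received over several paths.

    last_seen maps a path name (hypothetical labels such as "eth0" or
    "serial") to the timestamp of the last heartbeat from ONE peer on that path.
    """
    if not last_seen:
        raise ValueError("need at least one heartbeat path")
    silent = [path for path, seen in last_seen.items() if now - seen > timeout]
    if not silent:
        return Fault.NONE
    if len(silent) == len(last_seen):
        # Every path is quiet: treat as a node failure. ASSUMPTION: this
        # sketch cannot tell a dead node from a full partition.
        return Fault.NODE
    # The peer is still heard on some path, so the node is alive and the
    # problem is an adapter or network segment.
    return Fault.ADAPTER_OR_NETWORK


print(classify_fault({"eth0": 1.0, "serial": 10.0}, now=11.0, timeout=5.0))
```

The point of the distinction is the response: an adapter fault might only need an address moved to a standby interface, while a node fault means taking over its resources.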
Slashdot - News for Herds. Stuff that Splatters.
Actually, Linux does have failover capability already. There is a Linux HA project currently in progress. Here are a few quick links that I pulled out of freshmeat:
Linux-HA:
http://www.henge.com/~alanr/ha/
failoverd:
http://www.freshmeat.net/appindex/1999/04/08/92
Heart:
http://www.lemuria.org/Heart/