Linux Clustering Cabal project
RayChuang turned us on to this ZDnet story about the Linux Clustering Cabal project, which, Ray says, is "...the one that will allow Linux server clustering of many server machines. Sounds like just the thing to finally get eBay working reliably and also make John C. Dvorak eat his words about the deficiencies of Linux."
In a nutshell: Ninja is dealing with several problems which Jini is not addressing. We care a great deal about security, scalability (millions of simultaneous users), fault-tolerance, and deployment of wide-area services -- Jini is more focused on the local area and "workgroup" issues. We hope that there will be a Ninja-Jini bridge so that the two can talk to each other. I will be at the Jini Community Meeting in Annapolis next month to discuss these issues with the Jini folks and get a better handle on them.
Clustering is very different from what you describe. In most Unix environments, clustering (sometimes referred to as High Availability) means failover: you have one app running on a server, and a backup server that checks whether the app server is still running the app. If the app server dies, the backup starts the app and assumes the app server's IP address. This is an oversimplified example. VMS uses a similar but more dynamic scheme that also enforces load sharing. Systems like seti@home, by contrast, are more like distributed applications for heavy computation, and that model does not do well with database-driven applications. Clusters in general make many hosts into one host.
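For readers who haven't seen one, here is a minimal Python sketch of the failover scheme described above. The class names, the three-missed-checks threshold and the service IP are all assumptions for illustration; a real HA package would also start the app for real, claim the address on the network, and make sure the old primary can't come back and claim it too.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool = True
    running_app: bool = False


class FailoverMonitor:
    """Backup node that polls the primary and takes over after repeated missed checks.

    Hypothetical sketch: the threshold and service IP are illustrative only.
    """

    def __init__(self, primary: Node, backup: Node, service_ip: str, max_missed: int = 3):
        self.primary = primary
        self.backup = backup
        self.service_ip = service_ip
        self.max_missed = max_missed
        self.missed = 0
        self.ip_owner = primary.name  # which node currently answers on service_ip

    def check(self) -> None:
        """One heartbeat round: is the primary up and running the app?"""
        if self.ip_owner == self.backup.name:
            return  # already failed over; this sketch never fails back
        if self.primary.alive and self.primary.running_app:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.take_over()

    def take_over(self) -> None:
        # Stand-in for "start the app and assume the IP": we only record the new state.
        self.backup.running_app = True
        self.ip_owner = self.backup.name
```

Requiring several missed checks in a row, rather than one, is a common way to avoid failing over on a single dropped heartbeat.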
The best places I found were the talk and products pages at www.bitmover.com
There isn't a whole lot there right now.
belswick wrote:
Note that SGI is showing all the signs of entering the death throes stage. Another 30% of the workforce laid off, abandoning major initiatives, CEO bailing (to MS!!), loss of faith by major customers.
Unless you've got inside information (which the SEC would be very interested in hearing about), I think the Slashdot audience would appreciate more evidence than mindless parroting of the popular press. For your information, they are spinning off several of their divisions into separate business entities. Now, while some people may consider this akin to kicking fledglings out of the nest, the rate of turnover in Silicon Valley is such that the difference between working for one company and another is just which branded T-shirt you wear. Think of it as a beehive, with clumps forming and dispersing into interesting new combinations. Abandoning major initiatives? How many announcements have you heard from major companies that then died the silent death of being irrelevant to real needs?
As for the CEO, well, I'm sure there will be some interesting books a few years down the track, but for many hard-core SGI purchasers, the shift into Intel consumerism, where they did not have any competitive advantage, showed some very woolly thinking (for the cognoscenti, there is nothing technically inferior about the MIPS architecture). The loss of customers is not surprising considering that many applications that used to be top-end in the 70s can now run on a single modern processor with a big cache (the refuge of the lazy microarchitect). Getting a free ride from Moore's Law is not the same as coming up with innovative new software applications that can really take advantage of increased CPU capacity (apart from molecular simulations, which will chew up any CPU cycle you throw at them).
Customers will buy SGI equipment if SGI can show they offer a value proposition worth the premium over mainstream machines; whether it is memory latency, quality engineering, coolness factor or whatever, people will buy (oh, and making their manufacturing/distribution process more efficient would help a lot). Computers are becoming so prevalent that the only distinguishing features nowadays for PCs are image and lifestyle (does the color clash with the decor? :-) ).
Reasonable people must expect that SGI goes Chapter 11 RSN (barring a government bailout) and then what happens to people who need supercomputers?
Would you say Apple devotees are unreasonable? Don't you understand that, on a planet of 5 billion odd people, not everyone is interested in the same toys you are? Cries of doom and gloom have always been around in every industry in one form or another, as they give paper pushers a reason to justify their existence instead of getting their hands dirty coding or designing. You have to realise that SGI serves a fairly specialised market (data-intensive work, high-end graphics, scientific back-end grunt machines) in the 50K-50M range. Much as Porsche and BMW cater to a clientele that wants absolute performance rather than cheap consumer junk (admittedly the Japanese have given the US auto industry a shot in the arm since the 80s), there will always be people who appreciate the qualities that SGI offers. Provided SGI can continue to support those companies at an affordable price, and doesn't go around trying to push Porsches to people wanting bicycles (amazing how hype can convince people they need a Pentium III to browse the web), they will survive.
If you work for a company, you'd realise that the first law is survival, which depends on market relevance. SGI will continue so long as there is demand for their expertise at prices that compare well with other market alternatives.
LL
Greg Pfister's book is good -- the details are somewhat dated, though the conceptual portion appears to be aging well.
Distributed net has a page with references for other texts on clustering. 'Course, you can always check out the related book purchases links at Amazon.
What part of "gestalt" don't you understand?
But clustering is very different from the examples you give. It's not running different services on different machines. It is taking a bunch of machines and making them act as one.
Beowulf-style clusters are one way of doing this, but there's a limit to how many nodes you can connect that way and still get performance increases. It scales up, but probably not to thousands of nodes. Now, the LCC people obviously haven't built anything to prove that they can do better, but it sounds like they may have a theoretical improvement.
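One way to see why adding nodes eventually stops helping is a toy cost model. This model is an assumption for illustration, not anything from the LCC article: serial time s, parallelizable work p split across n nodes, and a coordination overhead c that grows linearly with n, so T(n) = s + p/n + c(n-1). Minimising p/n + cn puts the best speedup T(1)/T(n) near n = sqrt(p/c); past that point each extra node costs more in coordination than it saves in computation.

```python
import math


def run_time(n: int, serial: float, parallel: float, overhead: float) -> float:
    """Modelled time on n nodes: serial part + divided parallel part + coordination cost."""
    return serial + parallel / n + overhead * (n - 1)


def speedup(n: int, serial: float, parallel: float, overhead: float) -> float:
    return run_time(1, serial, parallel, overhead) / run_time(n, serial, parallel, overhead)


def best_node_count(serial: float, parallel: float, overhead: float, max_nodes: int = 10_000) -> int:
    """Brute-force search for the node count with the highest modelled speedup."""
    return max(range(1, max_nodes + 1), key=lambda n: speedup(n, serial, parallel, overhead))


def analytic_optimum(parallel: float, overhead: float) -> float:
    """Setting d/dn (p/n + c*n) = 0 gives n* = sqrt(p/c)."""
    return math.sqrt(parallel / overhead)
```

With s = 1, p = 100 and c = 0.01 (arbitrary units), the brute-force search lands on 100 nodes, matching sqrt(p/c). In this model, a design that reduces the per-node coordination cost c is what lets the curve keep rising to larger clusters.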
And, though it's only hinted at in the article ("satisfies both commercial data processing and HPC requirements"), it's also possible that this technology is not only fast but, unlike Beowulf, also provides improved robustness.
This is all vapor now of course. But we'll see. The people working on this have some important projects to their credit.
--
There's a bit of info at his homepage and resume. I think you might have found your sophisticated know-how.
Clustering isn't ground-breaking technology; it's been around for a long time. The concept of parallel processing has been around for a long time too... and it doesn't seem like many manufacturers are rushing to get their products working on Beowulf clusters.
This isn't to say it isn't a great idea; it's just that there isn't any support for it. There are plenty of alternatives, too. For example:
Webservers: Set up several servers, with an SQL backend (or an NFS-mounted partition) to hold the content. For added speed, put Squid in front of that setup. You can even get remote caches to hit your servers round-robin style by publishing multiple 'A' records.
DNS/mail: Heh. Even the IETF got this one right by suggesting primary and secondary DNS.
Filesharing: There is some work being done to build a 'real' Beowulf cluster that acts as something of a decentralized logical file server. For now, use AFS or Coda, which have all kinds of cool performance benefits. As an aside, both are a helluva lot more stable than the Nightmare File System (NFS).
Printing: They have affordable net appliances to do this (HP print server, anyone?), and some printers even support direct network access. Failing that, setting up multiple servers for multiple printers works pretty well; this is decentralized by design anyway...
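The round-robin web server idea can be sketched client-side in Python. Plain round-robin DNS only rotates the order of the A records it returns and knows nothing about server health, so the mark_down/mark_up bookkeeping below is an added assumption. It stands in for client retries or a health-checking load balancer, much as a secondary DNS server covers for a dead primary. The addresses come from the 192.0.2.0/24 documentation range and are hypothetical.

```python
class RoundRobinPool:
    """Rotate through backend addresses, skipping any marked as down."""

    def __init__(self, addresses: list[str]) -> None:
        if not addresses:
            raise ValueError("need at least one address")
        self.addresses = list(addresses)
        self.down: set[str] = set()
        self._next = 0  # index of the next address to try

    def mark_down(self, addr: str) -> None:
        self.down.add(addr)

    def mark_up(self, addr: str) -> None:
        self.down.discard(addr)

    def pick(self) -> str:
        # Try each address at most once per call, advancing the rotation as we go.
        for _ in range(len(self.addresses)):
            addr = self.addresses[self._next]
            self._next = (self._next + 1) % len(self.addresses)
            if addr not in self.down:
                return addr
        raise RuntimeError("no backend available")
```

For example, a pool built from 192.0.2.10, .11 and .12 hands out .10, .11, .12, .10 in turn; after .11 is marked down, it alternates between .12 and .10.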
So there you have it... all the staples of the corporate network - "clusterized". New technology? I don't think so. All the examples I gave you are in wide use (and have been for some time!).
--
For those who want some background on the important issues, I highly recommend Gregory Pfister's book In Search of Clusters. Clustering is a lot harder than most people realize, and people should not ignore the work that's been done before in this area. The important question for LCC is what is fundamentally new in their design. I doubt that the lack of kernel locks is really it.
The thing that remains to be seen is what set of applications they target, and what tradeoffs they make to support those applications. The fundamental issues in clustering have been addressed by a large number of research projects and products, and I'd like to know what's new about LCC.
That being said, I'm happy that some smart people are going after this problem!
As Matt Welsh noted, it is not exactly a trivial problem. If you look very closely at the article, the LCC wants to occupy a happy middle ground between the share-nothing crowd (Microsoft, Tandem) and the share-everything crowd (Oracle). The share-nothing paradigm is rather simplistic in its approach and reflects the fact that throwing together a bunch of machines with a cheap interconnect is a comparatively straightforward re-engineering approach. The share-everything approach comes from the extension of shared-bus architectures (e.g. Sun Starfire), which enforces a multiple-lock strategy. Companies like SGI have thrown millions of R&D dollars into the middle ground, which is why their cc-NUMA architecture and cellular IRIX are quite popular. I wish the LCC luck, but there is a reason a successful working solution is expensive: it requires a savvy combination of hardware, software and smart routing (the SGI solution uses a cache directory). You are effectively paying for some very sophisticated know-how as part of every SGI machine.
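To make "cache directory" a bit more concrete, here is a toy Python model of directory-based coherence. It is a simplified MSI-style sketch assumed for illustration, not SGI's actual protocol. The directory remembers which nodes hold each memory line, so a write sends invalidations only to the recorded sharers instead of broadcasting to every node as a shared-bus (snooping) design would.

```python
from collections import defaultdict


class Directory:
    """Toy cache directory: records which nodes hold a copy of each memory line."""

    def __init__(self) -> None:
        self.sharers: defaultdict[int, set[int]] = defaultdict(set)
        self.owner: dict[int, int] = {}  # line -> node holding a modified copy

    def read(self, node: int, line: int) -> list[tuple[str, int]]:
        """Grant a shared copy; returns the point-to-point messages needed."""
        msgs = []
        owner = self.owner.get(line)
        if owner is not None and owner != node:
            # The dirty copy must be written back; the old owner keeps a clean shared copy.
            msgs.append(("writeback", owner))
            del self.owner[line]
        self.sharers[line].add(node)
        return msgs

    def write(self, node: int, line: int) -> list[tuple[str, int]]:
        """Grant exclusive ownership; invalidates every other recorded copy."""
        msgs = [("invalidate", n) for n in sorted(self.sharers[line]) if n != node]
        self.sharers[line] = {node}
        self.owner[line] = node
        return msgs
```

If nodes 0 and 1 read a line and node 2 then writes it, only nodes 0 and 1 receive invalidations; a later read by node 0 triggers a single writeback request to node 2.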
Given the direction SGI is heading (Linux for entry-level systems and apps, IRIX kernel extensions for the high end), I wonder whether the LCC will produce anything practical in a realistic time-frame. This is not to decry their laudable efforts, and I hope businesses are patient enough to wait for robust and cheap solutions. If nothing else, it will hopefully offer a standardised set of software extensions (a la OpenMP) and coding practices so that a single source tree can support 1 to n processors.
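The "single source tree for 1 to n processors" idea can be illustrated in miniature. OpenMP itself is a C/C++/Fortran API, so this is only a loose Python analogue, assumed for illustration: the same function runs unchanged for any worker count, with workers=1 being the plain serial case. Python threads share the interpreter lock, so this shows the code structure rather than a real CPU speedup. Because sum_of_squares is a top-level function, swapping in ProcessPoolExecutor is the usual route to actual parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def split(seq: list[int], parts: int) -> list[list[int]]:
    """Cut seq into `parts` contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(seq), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(seq[start:end])
        start = end
    return slices


def sum_of_squares(part: list[int]) -> int:
    return sum(v * v for v in part)


def parallel_sum_of_squares(values: list[int], workers: int = 1) -> int:
    """One code path for any worker count; workers=1 is simply the serial case."""
    workers = max(1, min(workers, len(values) or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split(values, workers)))
```

The result is identical whether it is called with 1, 2 or 8 workers, which is the property a shared set of extensions and coding practices would aim to guarantee.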
Who knows, they might be able to come up with a few tricks that the pros have missed.
LL
Hmm. I'm sort of surprised to see that Peter Braam's mentioned as the head of the Coda project. I bet Satya's even more surprised, though. See, he's actually the head of the Coda group. It wouldn't have been hard for ZDnet to figure this out; it says so right on the Coda group's web page.
I've been seeing mentions of Braam as "head of the Coda project" and "the man who created Coda" a lot recently, and it's starting to get annoying. Does nobody do any fact checking anymore?
Of course look at Apple three years ago:
Licensing of clones, the Newton, the eMate, etc. They were losing major money/resources and got rid of people. Their CEO left, and everyone thought they were going to die.
They rehired Steve Jobs, trimmed their products down to their core strengths, and are now worth more than they ever have been before.
So for SGI, spinning off and properly marketing their strengths (without tying them down to SGI), such as MIPS, Cray and their VisualPC stations, while focusing on IRIX for high-end supercomputing and Linux on their low-end desktop workstations, gives them a reasonable future, if they can focus on their core strengths and not waver or get distracted...
It's a perfect chance to buy their stock at 11 and (hopefully) see it go to 40!
-AS
*Pikachu*
I'd also be interested in hearing about any Free Software databases that can do this sort of synchronization. Thanks.
Bruce
Bruce Perens.