Windows Compute Cluster Server 2003 Released
grammar fascist writes "According to an Information Week article, on Friday Microsoft released Windows Compute Cluster Server 2003." From the article: "The software is Microsoft's first to run parallel HPC applications aimed at users working on complex computations... 'High-performance computing technology holds great potential for expanding opportunities... but until now it has been too expensive and too difficult for many people to use effectively,' said Bob Muglia, senior vice president of [Microsoft's] Server and Tools Business unit, in a statement."
And what about the site licence needed for this baby, huh? For us mere basement-cluster builders, there is a cheaper, open source alternative: the OSCAR Project (Open Source Cluster Application Resources). Yes, it runs on Linux, but it is a nearly step-by-step system for setting up HPC-level clusters. It is being used on many 100+ CPU High Performance Clusters around the world, and it is free, without those pesky site licences.
"...but until now it has been too expensive and too difficult for many people to use effectively..." According to their licensing model EVERY machine costs 469 dollars... Meaning a 20 machine cluster would have a 10,000 dollar overhead just on the OS alone. Not to mention the fact that you'd be compelled to buy it again as Longhorn Cluster Ed. in just a couple of years... It seems like a little work setting up a free OS cluster would be a vastly preferable option, is there really any need or reason for this (at this cost anyway)?
And the 86 has nothing to do with transistors on a chip (the 8086 had 29,000 transistors). It was just the part number of the first in a line of processors (8086 then followed by the 80186, 80286, i386, i486, etc...).
You can build an HPC from random PCs but it will be crap because the PC-to-PC interconnects will be too slow. Real HPC needs high-speed, low-latency internal interconnects and these are expensive. But I fail to see how paying a "Windows" tax will make matters cheaper, or easier.
Not to many are using Fedora or Slackware on some white box with parts from Best Buy to do HPC. They have been altered to specifically run on hardware that was made specifically for this, and even then management of it is not exactly simple. Not that I believe that 2003 Server will suddenly change that but just using Linux somewhere does not automatically make it the cheapest way.
And I believe the correct answer to your question is Traditionally it has been done by tuned versions of commercial Unices which added to the base cost of the OS over and above the very expensive custom built hardware. Recently Linux has become able to do many of these tasks by similarly being modified at a significant cost running on the same expensive custom hardware. The recent HPC installation using mostly off the shelf parts (they didn't use Ethernet) was the one at Virginia Tech and that ran OS X, not Linux.
"I use a Mac because I'm just better than you are."
You can still have real HPC with slow interconnects. It all depends on the application for the HPC. If your data has a high scatter rate that requires large amounts of data transfer all the time, then you need fast interconnects. On the other hand, if your data can be sent off to a node to be crunched on for 2 hours, then a bog-standard gigabit Ethernet interconnect will do you just fine.
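A back-of-envelope way to see this is to compare the time a node spends receiving its data with the time it spends crunching it. The sketch below is only an idealized model: it assumes the link delivers its full nominal bandwidth, ignores protocol overhead and contention, and the function names and example sizes are invented for illustration.

```python
def transfer_seconds(data_bytes: float, link_bits_per_s: float,
                     latency_s: float = 0.0) -> float:
    """Idealized time to move data_bytes over a link of the given speed."""
    return latency_s + (data_bytes * 8) / link_bits_per_s


def comm_fraction(data_bytes: float, compute_s: float,
                  link_bits_per_s: float = 1e9) -> float:
    """Fraction of a work unit's wall time spent on the wire (0.0 to 1.0)."""
    t = transfer_seconds(data_bytes, link_bits_per_s)
    return t / (t + compute_s)


if __name__ == "__main__":
    # Ship 1 GB to a node, then crunch for 2 hours: gigabit is plenty.
    print(f"batch job:  {comm_fraction(1e9, 2 * 3600):.4%} of time on the wire")
    # Exchange 1 MB for every millisecond of compute: the link dominates.
    print(f"chatty job: {comm_fraction(1e6, 0.001):.1%} of time on the wire")
```

When the fraction is tiny, a faster interconnect buys almost nothing; when it approaches 1, the interconnect is the machine.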
"Not to many are using Fedora or Slackware on some white box with parts from Best Buy to do HPC. They have been altered to specifically run on hardware that was made specifically for this, and even then management of it is not exactly simple. Not that I believe that 2003 Server will suddenly change that but just using Linux somewhere does not automatically make it the cheapest way."
The "standard" cluster these days is standard rack servers from a reputable vendor, along with a Linux distro tailor-made for cluster usage such as Rocks or OSCAR. Typically the only nonstandard hw, if any, is a high-speed network (InfiniBand, Quadrics etc.).
"And I believe the correct answer to your question is Traditionally it has been done by tuned versions of commercial Unices which added to the base cost of the OS over and above the very expensive custom built hardware."
Perhaps in the mid-1990s, yes.
"Recently Linux has become able to do many of these tasks by similarly being modified at a significant cost running on the same expensive custom hardware."
No. With the exception of the high-speed network card I mentioned above, the rest of the hw and sw are bog-standard. Of course there are exceptions, e.g. SGI, Cray, NEC, IBM etc. but then we're talking "real supercomputers" and not commodity clusters (the market MS is aiming at).
"The recent HPC installation using mostly off the shelf parts (they didn't use Ethernet) was the one at Virginia Tech and that ran OS X, not Linux."
Not to piss on OSX, but Mac clusters are probably outnumbered 100:1 by Linux clusters.
Not that I disagree with you on this topic, but your post is almost word for word what the industry said of Microsoft when it entered the Server market competing against Novell.
Microsoft was considered to be the 'me too' of servers at the time, yet as it turns out the MS servers 'did' offer features that the Novell servers of the time didn't, and application servers progressed to the point that MS kicked Novell's butt.
The push for application servers also opened the door for *nixes to enter back into the mainstream 'server' environments, as Novell was a pretty closed Server technology and applications running on the Novell server were a joke.
So if history repeats, don't be surprised if MS does have an ace up its sleeve, plays it in its approach to the clustered server model, and companies do find real advantages in using the Microsoft concepts.
However, even if MS does have an ace, it would be kind of nice to see the technology envelope challenged, and to see that progress flow back into other OSes and *nixes.
I guess it is the same old story, never underestimate MS...
Ok, this isn't exactly going to be a hit with the scientific HPC community, who already have all the clustering software that they need. But think about MS's best customers: corporations. Imagine a new scheduling module for an ERP. If the model is complex enough, and if it has enough components and rules, it can easily become a major burden for a single server. And no, database clustering isn't necessarily the same -- not everything can be coded as a SQL statement, and even if it can, that isn't necessarily a smart way to apply a particular algo to a set of data. A Microsoft Windows based HPC unit would be perfect for the independent software vendor to use to power their new module -- assuming of course that the ERP itself runs on Windows. Odds are good that at least the client-side application is Windows compatible.
Difficulty, therefore, is NOT a significant factor in all of this. Ok, what about expense? Well, you're right that Linux is free. So are openMosix, Open MPI (and many other MPI implementations), PVM (another messaging library), Lustre (a very high-performance network file system), many scientific and mathematical applications for clusters, etc. There are clustering patches for POV-Ray, and it's always possible to write a script to have multiple machines render parts of images anyway. I'm sure there are other applications out there that I'm not thinking of right now, and it's only a matter of time before more "mundane" applications can take advantage of clustered environments. They already do, on Plan 9, to some degree. Oh, Plan 9 is also free.
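To give an idea of how little such a render-splitting script needs to do, here is a rough Python sketch of the bookkeeping part: carving an image into horizontal bands, one per machine. It only computes the row ranges. How each band is actually handed to a renderer (remote shell, job scheduler, the renderer's own partial-render options) is left out, and the function name is hypothetical.

```python
def row_bands(height: int, machines: int) -> list[tuple[int, int]]:
    """Split rows 1..height into contiguous, near-equal bands.

    Returns one (first_row, last_row) pair per machine that gets work,
    both inclusive. Leftover rows go to the first bands, one extra each.
    """
    if height < 0 or machines < 1:
        raise ValueError("need height >= 0 and machines >= 1")
    base, extra = divmod(height, machines)
    bands = []
    start = 1
    for i in range(machines):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more machines than rows: this one sits idle
        bands.append((start, start + size - 1))
        start += size
    return bands


if __name__ == "__main__":
    # A 480-row image shared between four render boxes.
    for node, (first, last) in enumerate(row_bands(480, 4)):
        print(f"node{node}: rows {first}-{last}")
```

Each machine renders its band independently and the strips are stitched together at the end, so no communication is needed while rendering.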
Cost would appear not to be a major problem either, then. Optimizing is the only thing that is in any way difficult, and a GUI system that doesn't let you get to the really fine detail won't help there. More time, effort and money is spent on optimizing than on anything else, and I simply can't see any possible way that an OS that is designed for ease-of-use by hiding the intricacies can in any way help in that.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Ummm... I know of several clusters on our campus (VT) that are made of white boxes running Fedora, Gentoo, or SUSE. One, named Anantham, is a 200-node (400-CPU) Opteron cluster with a Myrinet interconnect, built by Dr. Varadarajan's graduate students. There are other smaller clusters (16 - 32 nodes) of various designs that are running on GigE. All of them are built of white boxes and other off-the-shelf components ordered from mail-order companies. In the case of Anantham, all parts were ordered separately, i.e., RAM, motherboards, processors, cases, etc., and the system integration was done onsite. So, when you say, "Not to [sic] many are using Fedora or Slackware on some white box with parts from Best Buy to do HPC," I'm guessing you are referring to those in the TOP500 List? If so, yes, not many of the large sites that submit to the TOP500 List use a non-commercial version of Linux; most run a commercial one, e.g., RHEL. Many of the larger sites are going through first-tier vendors (Dell, HP, IBM) for a turnkey cluster solution, but they are paying a premium for those systems for the sake of time to production. They could just as well buy white boxes, but they would be spending a great deal of their own time weeding out problem nodes and components, time that could be better spent on doing science and supporting users. Academia can afford to take the time to do this; DOE labs cannot, although that paradigm is quickly shifting as academic budgets tighten and competition in the Computational Science and Engineering arenas heats up among research institutions.
Clusters (the topic of this original post) are not "traditionally [...] done by tuned versions of commercial Unices [sic]". Clusters are traditionally built with off-the-shelf components with Linux and specialized APIs and drivers for the interconnect being used. If you want to talk about HPC BEFORE 1998, then you are looking at large monolithic systems of a custom built nature.
System X does have a GigE network, but it is primarily used for management and job startup within the cluster. We have had a few users with specific MPICH2 needs that have used the GigE network for message passing, but the GigE network was not designed for that task. Our primary communication fabric is IB. We are currently running Mac OS X (10.3.9) on the system, but are evaluating alternatives.
I think MS has a lot more inroads into technical computing than Slashdotters realize.
While working at NASA Ames, I found a lot of technical work on Windows by people who are mechanical, electrical, and aerospace engineers. Applications included: structural/mechanical CAD for fabrication, programs to drive test and measurement equipment, MATLAB to derive simulation models, VxWorks embedded RTOS development.
The universal applications that I saw every engineer use were MS Word and PowerPoint. (LaTeX and troff are largely a lost art; even FrameMaker if used by engineers was running on Windows.) This meant that virtually every serious engineer had a Windows box, and possibly a UNIX box for specialized work as well. Most of these guys understand Laplace transforms, but not regular expressions. They were far more likely to use MATLAB than Perl.
There are, of course, serious software developers there who don't know Laplace transforms but do understand UNIX tools and open source. But this is a different crowd from engineers doing analytical work. (NASA Ames also has a serious supercomputing operation, e.g., the Columbia cluster of 10,240 processors running Linux, built by SGI. When you have flow models like protrusions between thermal tiles, this is where you go.)
With budgets really tight and reductions in headcount, they stretch dollars as far as they will go; which means, if it will run on Windows, it's hard to justify another box. Furthermore, system support is outsourced to a department whose sole purpose is to keep computing alive, which means a very limited number of Windows and Mac OS configurations.
Having interacted with engineers involved in aerospace in other parts of the United States, the stuff I saw at NASA Ames seems pretty typical. (When I mentioned this to a Hubble astronomer, he was completely stunned.)
Now I don't expect Windows Compute Clusters in NASA anytime soon. But some engineering software vendor is going to decide that they can extend their product line by bringing compute power to the individual engineer through this mechanism. At that point, a hybrid solution of Windows desktop and Linux compute servers is going to be hard to justify, particularly if it requires additional department resources to make it work.
That's a pretty simplistic definition of a grid. A grid is typically a pool of resources that is not under a single administrative domain, which is transparently accessible as a utility. Resources on the grid can be clusters, file systems, single machines, etc. I think you're thinking that this would be a distributed application, where everything is under a single administrative domain.
A cluster is going to be managed as a single machine, true. But you're not necessarily even requiring communication to occur at all for different processes of a job on a cluster. You're basically saying that if you put embarrassingly parallel jobs on the cluster, it's not a cluster anymore.
The term cluster describes an administrative characteristic more than the actual hardware that goes into it. As long as you've got everything in one administrative domain, the computers are able to communicate with each other, and these computers are solely dedicated to doing jobs as given by the job scheduler, it's a cluster. Bonus points if there's a fast interconnect. However, there's nothing stopping anyone from submitting N single-process jobs to the N-node cluster, or an embarrassingly parallel job that only communicates at the beginning and end.
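The "communicates only at the beginning and end" pattern looks roughly like the Python sketch below. It is a toy on a single machine: a thread pool stands in for the job scheduler and its nodes, `simulate` is a made-up placeholder workload, and nothing here is a real cluster API.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed: int) -> int:
    """Placeholder for one independent job; it talks to nobody while running."""
    total = 0
    for i in range(1000):
        total += (seed * i) % 7
    return total


def run_jobs(seeds: list[int], workers: int) -> list[int]:
    """Scatter one job per seed, then gather the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The only "communication": inputs go out here, results come back here.
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    seeds = [1, 2, 3, 4]
    # N single-process jobs on an N-worker pool.
    print(run_jobs(seeds, workers=len(seeds)))
```

Since the jobs never exchange data mid-run, the speed of the interconnect between workers barely matters for this kind of load.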