Slashdot Mirror


How Well Does Windows Cluster?

cascadefx asks: "I work for a mid-sized mid-western university. One of our departments has started up a small Beowulf cluster research project that it hopes to grow over time. At the moment, the thing is incredibly weak... but it is running on old hardware and is basically used for dog and pony shows to get more funding and hopefully donations of higher-end systems. It runs Linux and works, it is just not anything to write home about. Here's the problem: my understanding is that an MS rep asked what it would take to get them to switch to a Microsoft cluster. Is this possible? Are there MS clusters that do what Beowulf clusters are capable of? I thought MS clusters were for load balancing, not computation... which is the hoped-for goal of this project. Can the Slashdot crowd offer some advice? If there are MS clusters, comparisons of the capabilities would be welcome." One has only to go as far as Microsoft's site to see its current attempt at clustering, but what is the real story? Have any of you had a chance to pit a Linux Beowulf cluster against one from Microsoft? How did they compare?

14 of 590 comments

  1. Re:Licensing by CodeMonky · · Score: 5, Informative

    Followup:
    From reading the MS site, it looks like licensing is based on the EULA of the software being used, so if you are using win2kpro you have to have a copy of win2kpro for each machine etc etc.

    --
    --"Karma is justice without the satisfaction"
  2. Re:first post - no way by GreyPoopon · · Score: 4, Informative
    The windows was purely load balancing.

    From Microsoft's site: "The Computational Clustering Technical Preview (CCTP) toolkit is used for creating and evaluating computational clusters built on the Windows® 2000 operating system."

    Obviously, they are now attempting to compete with projects like Beowulf. It's probably all part of the M$ aggressive stance on Linux (and other competitors). The real question is, has anybody downloaded this kit and played with it? It's just a technology preview, so how mature is it in comparison to Beowulf or other clustering technologies?

    --

    GreyPoopon
    --
    Why is it I can write insightful comments but can't come up with a clever signature?

  3. Here's the deal: by Null_Packet · · Score: 5, Informative

    MCS (Microsoft Cluster Services) are designed for load balancing and fault tolerance, whereas Beowulf Clusters (AFAIK) are more for distributed processing load for performance increases (massive threading). MCS works quite well, especially on Fibre Channel and Brand Name Hardware such as Dells and Compaqs.

    Simply put, it works well (but the cost is often an issue due to the cost of hardware in an enterprise) but it is not the same clustering you see with the Unices. E-mail me at my account if you have more specific questions.

    My intent is not to start or participate in a flame war, but the term clustering simply implies different things on different OSes.

  4. MS Cluster is not the same by merlin_jim · · Score: 5, Informative

    Hello,

    We run a MS cluster here. VERY big app... so big, I am loath to name figures, because that would identify to MS just who is talking here...

    But, we use MS clustering for our web app. Our setup is that we have a database server with 4 procs, and a growing array of web servers with 1 proc each, all of which use disk space on a SAN. W2K clustering manages the load balancing as well as allocating disk space out of the SAN to virtual partitions as needed. The original poster is correct; MS clustering is for load balancing, not computation. I have seen many times Microsoft sales reps don't have a clue of what they're trying to sell; they're just told from on high to replace Linux with Microsoft wherever they can. I think this is clearly a case of that.

    My advice? Ask the sales rep to demonstrate how MS clustering will solve a common comp-sci problem with more MIPS than each box alone has. Point out that you're not running a web server or any such service on these boxes, but that they're for raw computation. Even better, see if he'll let you talk to a technician on how W2K clustering can meet your 'unique' (at least to MS) needs.

    Now, for everyone else... Don't get me wrong. W2K clustering is a great technology for building highly performant, highly reliable, highly scalable applications quickly and easily. But it scales in the direction of millions of users, not millions of computations.

    --
    I am disrespectful to dirt! Can you see that I am serious?!
    1. Re:MS Cluster is not the same by crimoid · · Score: 5, Informative

      Apparently you (and most everyone else) didn't take the time to even look at the link provided. Microsoft DOES have computational clustering, not just "traditional" clustering.

      MS Computational Clustering

    2. Re:MS Cluster is not the same by merlin_jim · · Score: 5, Informative

      I must now put on the traditional monkey hat of shame, for the naysayers are quite correct. There are TWO Microsoft products called clustering. One is used by Windows 2000 Advanced Server to do load balancing, and is, in fact, split into two parts, the first called Clustering, the second Network Load Balancing... see this page, which includes the statement "Both [of the Windows 2000 Advanced Server] Clustering technologies are backwards compatible with their Windows NT Server 4.0 predecessors". The other is High Performance Clustering (HPC), in its current form called Computational Clustering Technical Preview (CCTP), which I am certain has nothing to do with the previous Clustering technology... I doubt it was available for Windows NT 4.0, among other things (thus the Technical Preview status).

      Notes for any and all interested in this; it's a technical preview, which any other company would call a pre-Beta or an Alpha release. The only way anyone sane would use this in a production system would be as an Early Adoption Partner...

      --
      I am disrespectful to dirt! Can you see that I am serious?!
  5. Stability issues by The+Panther! · · Score: 5, Informative

    At my last job, we had a COW (Cluster of Workstations) running all sorts of operating systems. Except Windows. Why? Because they won't run in a production environment for more than a few days without freezing or crashing, and the system administrators refused to babysit them. With Windows 2000, I've had my home machine run for upwards of 28 days without a reboot, but only if all the video drivers are stable and the machine is not doing too much at any given point (say, burning cds while watching movies and keeping my net connection above 200k/s). But so help you if a driver freezes. There's no way to reset them. Your hardware will play into your decision as much as the operating system, I believe, due to stable driver support.

    In terms of performance, Windows kernels have pretty good latency compared to 2.2.x linux kernels, so running a full screen dos app might give very good performance, but there's a lot of overhead munching into your RAM, which is likely to be an expensive premium on older hardware.

    Lastly, with Windows, I've never heard of doing channel bonding for ethernet (3 100TX cards ~= 300 Mbit/s aggregate, a cheap step toward gigabit), nor diskless booting that I know of. These can be really necessary for large clusters to keep maintenance down and performance up without buying higher end equipment.

    --
    Any connection between your reality and mine is purely coincidental.
  6. First Hand Info by GeckoX · · Score: 4, Informative

    We researched MS Clustering very extensively. We're already an MS shop and even still it was cost prohibitive.

    Notes from experience:

    1) Clustering with Windows requires one of the following OS setups: Win2K Server WITH MS Application Center, OR Win2k Advanced Server. (Similarly with the XP platform)

    2) OS Licenses therefore will run between $1000-2000 _per-machine_!

    3) If you need Application Center, which you likely will, you're talking (if I remember correctly) about another $1000 per machine.

    4) Of course MS is just getting into this so don't expect it to be easy, well documented or stable.

    Finishing Notes:

    Obviously, Linux would be mucho cheaper

    Easiest, and still cheaper than MS would be the Plug-n-Play Mac solution!

    --
    No Comment.
  7. The OS doesn't matter - tools do by Oestergaard · · Score: 5, Informative


    For a computational cluster, the OS itself shouldn't really matter. What matters is, do you have the tools you need, and does the environment allow you to work with the cluster in a flexible way.

    For a typical computational cluster, what determines the performance will be the quality of your application. Only if you pick an OS with some extremely poor basic functionality (like, horribly slow networking), will the OS have an impact on performance.

    People optimize how their application is parallelized (eg. how well it scales to more nodes). The OS doesn't matter in this regard. They optimize how well the simple computational routines perform (like, optimizing an equation solver for the current CPU architecture) - again, the OS doesn't matter.
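
    To put a number on "how well it scales to more nodes", one common back-of-the-envelope model is Amdahl's law. The poster doesn't name it, so treat this as an illustrative assumption rather than his method; the function name and the sample fractions are made up for the sketch. Note that nothing in the formula depends on the OS, only on how much of the application can run in parallel.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup on `nodes` machines when only `parallel_fraction`
    of the run time can be spread across them (Amdahl's law).
    Ignores network and scheduling overhead, so it is an upper bound."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical application that is 90% parallelizable.
    for n in (1, 4, 16, 64):
        print(n, "nodes ->", round(amdahl_speedup(0.9, n), 2), "x")
```

    With 90% of the work parallelizable, 16 nodes give roughly a 6.4x speedup rather than 16x, which is why tuning the application tends to matter more than the choice of OS.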

    So, in this light, you might as well run your cluster on Windows instead of Linux, or MacOS, or even DOS with a TCP/IP stack (if you don't need more than 640K ;)

    However, there's a lot more to cluster computing than just pressing "start". You need to look at how your software performs. You need to debug software on multiple nodes concurrently. You need to do all kinds of things that require that your environment and your tools allow you to work on any node of the cluster, flexibly, as if that node was the box under your desk.

    And this is why people don't run MS clusters. Windows does not have proper tools for software development (*real* software development, like Fortran and C - VBScript hasn't really made its way into anything resembling high performance (and god forbid it ever will)).

    Furthermore, you cannot work with 10 windows boxes concurrently, like they were all sitting under your desk. Yes, I know terminal services exist, and they're nice if you're a system administrator, but they are *far* from being usable to run debuggers and tracing tools on a larger number of nodes, interactively and concurrently.

    Last but not least, there are no proper debugging and tracing tools for windows. Yes, they have a debugger, and third party vendors have debuggers too. But anyone who's been thru the drill on Linux (using strace, wc -l /proc/[pid]/maps, ...), and needed the same flexibility on windows, knows that there is a world of difference between what vendors can put in a GUI and what you can do when you have a system that was built for developers, by developers.

    So sure - for a dog&pony show, windows will perform similar to any other networked OS with regards to computational clusters. But for real-world use ? No, you need tools to work.

  8. use Application Server, not Clustering by spongman · · Score: 5, Informative
    Microsoft has a few types of clustering:
    1. Failover clustering. This is an OS service that servers like SQL Server and Exchange plug into that allows Active/Passive or Active/Active clustering over a shared SCSI/Fibre bus. In theory you could write your app to use this service but I think it would be overkill.
    2. Network Load Balancing. This is just a software version of the standard kinds of NLB found in cisco boxes.
    3. Component Load Balancing. This is the most suitable. It's provided by Application Center and it allows you to deploy COM+ objects on a cluster of machines and have the calls distributed according to the load on those machines. You can control the threading and lifetime of the objects and view the status of the machines pretty easily using the Application Center MMC plugin (or SNMP, I believe). You'd have to wrap the computational part of your application into one or more COM objects. Once you've done that then you can create and call those objects in the cluster as if it were one machine - the clustering is transparent to the client application. I played around with AC a bit when it was in beta for a project that I was working on. We didn't go with it in the end because the design of our application ended up not requiring it (we just went with hardware load balancing), but it seemed like pretty cool technology - if you're into the whole COM thing. It has a really cool rolling deployment feature where you can redeploy your components (and/or IIS application if you have one) to your cluster incrementally while it's still running.
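
    The "calls distributed according to the load" idea in item 3 can be sketched in a few lines. This is not the Application Center API, just a toy model of least-loaded dispatch; Node, pick_node and dispatch are hypothetical names, and "load" here is simply a count of in-flight calls.

```python
class Node:
    """Toy stand-in for one machine in the cluster."""

    def __init__(self, name):
        self.name = name
        self.active_calls = 0  # crude load metric: calls currently running


def pick_node(nodes):
    """Return the node with the fewest in-flight calls (first one wins ties)."""
    return min(nodes, key=lambda node: node.active_calls)


def dispatch(nodes, func, *args):
    """Run func(*args) 'on' the least-loaded node.

    The caller never chooses a machine, which is the sense in which
    the clustering is transparent to the client.
    """
    node = pick_node(nodes)
    node.active_calls += 1
    try:
        return node.name, func(*args)
    finally:
        node.active_calls -= 1  # call finished, release the slot


if __name__ == "__main__":
    cluster = [Node("web1"), Node("web2"), Node("web3")]
    cluster[0].active_calls = 2  # pretend web1 is already busy
    print(dispatch(cluster, pow, 2, 10))
```

    A real balancer would measure load over the network and run the call remotely; the sketch only shows the selection rule.
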
    Here's some links to docs on MS's site:

    Introducing Windows 2000 Clustering Technologies
    Application Center home page
    Component Load Balancing

  9. About that Mac solution..... by jspaleta · · Score: 4, Informative

    about that mac solution....
    Yellow dog linux sells a cute little piece of hardware designed for clustering around PPC. very cute...maybe the best balance of cost effective and easy in terms of clustering that I've seen.

    http://www.terrasoftsolutions.com/products/briQ/hpc.shtml

    -jef

  10. Re:Licensing by maitas · · Score: 4, Informative

    For raw MPP numeric processing, W2k is too damn slow. You can boot Linux in 4MB of RAM and less than 64MB of disk, then just load the libraries you need and nothing else, and you will have a pretty decent system. Try thinning W2K down and you will have a huge problem there. You can use Sun's GridEngine for Linux (http://www.sun.com/software/gridware/gridengine_project.html) and best of all, it's open source!
    At the end, it all comes down to your software: if you develop a highly scalable, almost shared-nothing algorithm, Linux Clustering is the way to go. For fail-over Linux you have the HA Linux project, once more, Open Source!

  11. MS clustering = bad mmkay? by JamesGreenhalgh · · Score: 4, Informative

    Having seen first hand how poorly the following setup ran, I'd say steer clear of Microsoft until they admit that reboots are not normal:

    2 x HP Netservers, both dual p2 Xeon, 1gb ram, and a small raid shelf with 8x 9gb disks. Both NT4 installs with the correct patchlevels.

    One machine ran oracle, the other IIS, these were clustered so that one would take over the task of the other, should there be a problem.

    Problems:
    1) Crashing (daily at least)
    2) Slow (astonishingly poor, disk defrags once a week helped this)
    3) Sometimes one host would freeze, and the other wouldn't actually notice
    4) Often a shutdown of one node would move the services across, but upon rejoining the cluster - the node with both services would refuse to give one back.
    5) Often, IIS would stop talking, and neither node would actually realise.

    The attempted solutions:

    1) Replaced CPUs, memory, disks, eventually nodes
    2) Reinstalled clustering software, eventually total clean installs of operating system and applications
    3) Support from Microsoft, and Oracle, and HP who made the (certified) kit. Oracle+HP both pointed the finger at the OS, Microsoft simply failed to help, when we got any response from them at all.
    4) (this helped) I used one of the spare HP9000 servers to monitor them remotely by trying test transactions - it alerted people when they fucked up.

    I think the above says it all really. Standard software on correct hardware - it just didn't work properly. Microsoft can stick their clustering "technologies" where the sun don't shine.

    --
    ALL YOUR BASE ARE BELONG TO US!
  12. MS parallel tools by ajv · · Score: 5, Informative

    Getting past what are the wrong tools first: Beowulf is an architecture to do massively parallel computation, so we can eliminate two of the best known HA tools. Microsoft Cluster Service is two or four node high availability, similar to HA Linux's efforts. NLBS is a software form of a hardware load balancer, similar to Cisco Local Directors and only really good for web farms. So what does MS provide to do similar stuff as Beowulf?

    COM+ and Queueing Components. AppCenter.

    The way it works is this. You write a COM+ component that is transactionally queuing aware. Each component takes a work unit in, processes it, and then sends the result of the transaction to the queueing components for reassembly or re-issue (if a node fails to submit a result, for example, good for checkpointing).
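
    The take-a-work-unit, return-a-result, re-issue-on-failure loop described above can be sketched without any COM+ at all. This is only a single-process model of the idea, not the Queued Components API; run_jobs, max_attempts and FlakyWorker are hypothetical names invented for the example.

```python
from collections import deque


def run_jobs(work_units, workers, max_attempts=3):
    """Hand work units to workers in rotation and collect the results.

    A unit whose worker raises is put back on the queue and re-issued
    to the next worker, up to max_attempts tries per unit.
    Work units must be hashable because they key the results dict.
    """
    pending = deque((unit, 1) for unit in work_units)  # (unit, attempt number)
    results = {}
    turn = 0
    while pending:
        unit, attempt = pending.popleft()
        worker = workers[turn % len(workers)]
        turn += 1
        try:
            results[unit] = worker(unit)
        except Exception:
            if attempt >= max_attempts:
                raise  # give up on this unit
            pending.append((unit, attempt + 1))  # re-issue later
    return results


class FlakyWorker:
    """Worker that fails its first call, imitating a node that drops a job."""

    def __init__(self):
        self.calls = 0

    def __call__(self, n):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("node did not report a result")
        return n * n


def square(n):
    return n * n


if __name__ == "__main__":
    print(run_jobs([1, 2, 3], [FlakyWorker(), square]))
```

    In the real setup the queue would be durable and transactional, so a coordinator crash would not lose units; the sketch keeps everything in memory to show only the control flow.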

    You can use normal Windows 2000 Professional boxes for the worker bees, and use a few Windows 2000 Server boxes to co-ordinate the issuing of jobs and control, and munging the result sets coming back in.

    If you need to submit a wide variety of jobs, obviously the COM+ components will be changing regularly, so it'd be a good idea to go to AppCenter so that you can treat a bunch of machines as a single whole. This allows you to upgrade or deploy an app in a few mouse clicks to literally thousands of machines in a few seconds. AppCenter also has pretty good resource management, something that might be necessary if multiple jobs are running at the same time.

    The cool thing is the development environment is really friendly and you can make COM+ components pretty easily and test them locally (for the n=1 case) before deploying to the farm.

    There are also specialist MP libraries for the Win32 platform, such as PVM or MPI (WMPI). These have the benefit of re-using the knowledge and APIs that users might already be familiar with - one of the biggest tasks when a place converts from one supercomputer to another is rejigging and reoptimizing the code for the new architecture.

    --
    Andrew van der Stock