Slashdot Mirror


Where to Spend $1M on a Cluster?

Natchswing asks: "My university has been given a $757,825 NSF grant to build, 'A 256 node (128 pair) Beowulf parallel computing cluster ... to improve the realism of gravity-wave modeling by permitting treatment of the three dimensional problem and multiple wave interactions.' They want to pay a company to just show up and drop off a functional cluster rather than build it themselves. Since word has leaked out regarding the purchase intent, every computer manufacturer under the sun (including Apollo himself) has called up trying to sell their cluster. Since I'm no cluster expert, I'm writing Slashdot. If you had $0.7 mil to buy a pre-built cluster who would you go with and why?"

10 of 104 comments

  1. Penguin Computing by retostamm · · Score: 3, Informative

    Penguin Computing does this kind of stuff for a living. I think they are an all open source shop, too... There may be others, too.

    1. Re:Penguin Computing by DA-MAN · · Score: 3, Informative

      Penguin Computing does this kind of stuff for a living. I think they are an all open source shop, too... There may be others, too.

      As a Systems Engineer who has worked with a number of vendors, I would say that Penguin is the bottom of the barrel in service and quality control.

      We have five clusters at our facility, the slowest of which is on the top500 in the 150 range. We've tried big and small vendors.

      Penguin is the absolute worst. No two SCSI hard disks had the same firmware version, the RAID controller was DOA, etc. We buy/borrow a node from each vendor and evaluate them before buying clusters, and out of all the vendors the Penguin is the one that would crash or hang all the time. After months of trying, they were never able to get this going properly, even though we shipped it back twice and were told each time that we'd get back a whole new machine (we didn't).

      I would personally recommend Appro, IBM or Western Scientific in that order. Service and quality hardware are their game.

      --
      Can I get an eye poke?
      Dog House Forum
  2. Re:Do it with an apple by coldcup · · Score: 5, Informative

    A cluster of storage? Perhaps you mean the Xserve itself.

    They even have a page on clusters.

  3. Microway by brsmith4 · · Score: 5, Informative

    I run a 48-node Microway Beowulf and I must say that it is the most stable system available. Everything came assembled and ready to go (of course, I built the enclosure and did the networking, but they will do that for you if you'd like). If you're not very knowledgeable about Beowulfs, how do you know you'll need so much power? Do you know how well the software you will be using will scale? Is it close to embarrassingly parallel or does it lose efficiency over X number of nodes? What type of resources and consumption does the program use? Is it extremely processor hungry, or does it deal with dense matrices and require low memory latency, high bandwidth, or both? Do you know if you will need the power of Myrinet or will you be able to get by on GigE?

    These are important questions you must ask your researchers and yourself before you purchase this cluster. But, to answer your question, I believe Microway is the best choice and I plan on having them build our next cluster in the next fiscal year.
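
    The "how well will it scale" question can be roughed out before talking to any vendor. Below is a minimal sketch using Amdahl's law, which is a standard textbook model and not something taken from this thread. The parallel fractions in the loop are made-up placeholders; the real number has to come from profiling the researchers' code.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` processors under Amdahl's law.

    parallel_fraction is the share of single-node runtime that can be
    spread across nodes (0.0 = fully serial, 1.0 = embarrassingly parallel).
    Communication overhead is ignored, so real results will be worse.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def parallel_efficiency(parallel_fraction: float, nodes: int) -> float:
    """Speedup divided by node count: 1.0 means no node is wasted."""
    return amdahl_speedup(parallel_fraction, nodes) / nodes


if __name__ == "__main__":
    # Hypothetical parallel fractions, for illustration only.
    for fraction in (0.90, 0.99, 0.999):
        for nodes in (48, 128, 256):
            print(
                f"parallel={fraction:.3f} nodes={nodes:3d} "
                f"speedup={amdahl_speedup(fraction, nodes):7.1f} "
                f"efficiency={parallel_efficiency(fraction, nodes):.2f}"
            )
```

    If the efficiency column drops off well before 256 nodes, a smaller cluster with a faster interconnect may be the better buy.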

    -brian

  4. Not Angstrom by Anonymous Coward · · Score: 4, Informative

    I currently maintain some Opteron based Angstrom Microsystems Linux clusters. We've had them for less than a year, and already 30% of our nodes have had to be replaced. Support has been a nightmare.

    Sadly, I was not around when the proposal was made, otherwise I would have rejected this cluster outright. There is no way to hook external storage up to this beast. There are no USB, Firewire, SCSI, external SATA, or fibre channel options. You can't even run an ATA cable out of the thing without drilling holes into the blade walls.

    Personally? I'm looking at an XServe or an IBM Bladecenter.. but maybe it's just because I'd like some real support.

  5. Re:cluster experience by Robbat2 · · Score: 2, Informative

    Furthermore, make SURE you have sufficient physical space and air conditioning capacity for your new cluster.

    --
    ICQ# : 30269588
    "I used to be an idealist, but I got mugged by reality."
  6. cluster problem set by Raleel · · Score: 2, Informative

    I see you mentioned the problem set, which is good. To me and my only somewhat novice mind (I work with scientists all day, hear all kinds of stuff), this sounds suspiciously like a fine-grained problem. That is to say, there will be a lot of interprocess communication, so don't skimp on the network. I'm not talking "get gigE". I'm talking "look at Myrinet, or Quadrics, or InfiniBand".

    Most people can do you up a 256 node cluster for under half a million, but doing up one with high speed and low latency network is another story. that net costs bucks, around $1500 per machine (for a card, a cable, and a port on the switch).
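
    To put that interconnect figure in context, here is a back-of-the-envelope sketch. The $1500 per machine is the estimate above and the grant total is from the original question; treating the whole grant as available for hardware is an assumption made only for illustration.

```python
GRANT_USD = 757_825                # NSF grant amount quoted in the question
NODES = 256                        # cluster size quoted in the question
INTERCONNECT_USD_PER_NODE = 1_500  # card + cable + switch port, per the estimate above


def interconnect_cost(nodes: int, usd_per_node: int) -> int:
    """Total cost of the fast interconnect across all nodes."""
    return nodes * usd_per_node


def remaining_budget(grant_usd: int, nodes: int, usd_per_node: int) -> int:
    """What is left for everything else once the interconnect is paid for.

    Assumes the full grant can be spent on hardware, which may not be true
    (installation, support contracts, and overhead are ignored).
    """
    return grant_usd - interconnect_cost(nodes, usd_per_node)


if __name__ == "__main__":
    network = interconnect_cost(NODES, INTERCONNECT_USD_PER_NODE)
    left = remaining_budget(GRANT_USD, NODES, INTERCONNECT_USD_PER_NODE)
    print(f"interconnect total:    ${network:,}")
    print(f"left for nodes etc.:   ${left:,}")
    print(f"left per node:         ${left / NODES:,.2f}")
```

    On these numbers the network alone takes roughly half the grant, which is why it matters whether the problem really is fine-grained.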

    Make sure you know your problem. If you understand how it works, then you can buy a cluster that meets the need much better. Make sure the nodes are not being starved for RAM if the problem is a RAM-hungry one (your researchers should be able to tell you, even from data off a single machine). Find out if it's heavily integer based or floating point based (my guess is that it's a floating point problem). Find out if it's a lot of vector and matrix manipulations.

    every machine type is a little better at something than the others. for instance, on integer based problems, x86 will generally scorch everything else. on floating point, there is lots of good competition (apple, intel's itanium, opterons). Don't be afraid to say to the vendor "we want to run on it for a week before we buy".

    as others have said, do a RFP, but get their specs. get your tech guys to look it over hard. ask them "what sucks".

    All that having been said, i'm a fan of apples and of verari systems. Dell is also quite good.

    --
    -- Who is the bigger fool? The fool or the fool who follows him? --
  7. Re:You got a grant with NO PLAN? by Natchswing · · Score: 2, Informative

    Actually, yes. On top of that they plan on paying some company to fly out hard drives for obnoxious prices rather than pay grad students with far more experience doing such things.

  8. Re:RFP is the answer by AndyRobinson · · Score: 2, Informative

    I think that's pretty much right. Two suggestions though...

    Firstly, the more you put into the process the more you'll get out of it so be prepared to come up with a good RFP. If you're not an expert in clusters then you might well not know the answers to some of these questions so be prepared to take advice from suppliers. Sure, some of them may try and rip you off but most will be honest and helpful which will make the dodgy ones pretty easy to spot. Alternatively, look for some external, independent help to work with you on both writing the RFP and selecting a supplier.

    Secondly, once you've got an RFP don't send to every company you can find. Pick a few - say 5 or 6 - good ones, send it to them, and then be prepared to spend some considerable time talking to them and answering their questions. You'll get much better responses that way. Alternatively, have short, very initial discussions with a larger number and then reduce that down to a short list as early on as you can.

    Part of my job involves responding to RFPs. We're usually pretty busy so we have to prioritise which RFPs we respond to and how much time we put into the response.

    The ones we put most effort into are, quite frankly, the ones which we think we stand a good chance of winning. Those are usually the ones which the client has done their homework on and come up with a good spec of what they want to achieve (but not necessarily how they want to achieve it), and done a reasonable amount of pre-selection of suppliers before expecting them to invest lots of time responding.

    The ones we tend to politely decline are those that have been sent to everyone and his dog as, from experience, it suggests that the client doesn't really know what they're after, doesn't really know how to judge between suppliers, and/or isn't really bothered about who they choose and will just go with the lowest bidder.

    Having said all that, though, I work in a different field so some of this might not apply. On the whole, though, it's worked when I've been commissioning work and finding suppliers so I think the basic principles work!

  9. Re:RFP is the answer by Stinking+Pig · · Score: 2, Informative

    I've been privileged to answer a lot of RFPs in my career, so here's some tips from the other side to make the process go a little smoother:

    Corporate background questions are fine, but please stick to general stuff that can be answered with boilerplate. No one at the vendor knows or cares where our executive team went to college, and it's going to be a huge PITA to track that sort of BS down.

    Ask what you want to know, but please re-read the RFP when you're done writing it. If you've asked the same question 50 times in different wording, I'm going to answer it once and paste the same answer 49 times. That's not helping anyone.

    Do not use forms, whether document or web based. It makes it very difficult for us to check our work and makes it impossible for us to provide supplemental information.

    Do give a schedule with a reasonable amount of time. I'd release to vendors, wait a week, do a bidder's conference call, wait a week, and then collect responses. If it's taking them longer to respond, then they're either too strapped for resources or too far from their core competency.

    And for your own protection, here's some stuff to look out for:

    If they can't reveal processes for "security reasons" it very very probably means that they don't have any process. Run screaming.

    If a vendor is grossly more or less expensive, find out why. They might have good reasons, or they might not know what they're doing.

    --
    "Nothing was broken, and it's been fixed." -- Jon Carroll