Slashdot Mirror


Ask Slashdot: Parallel Cluster In a Box?

QuantumMist writes "I'm helping someone with accelerating an embarrassingly parallel application. What's the best way to spend $10K to $15K to receive the maximum number of simultaneous threads of execution? The focus is on threads of execution as memory requirements are decently low e.g. ~512MB in memory at any given time (maybe up to 2 to 3X that at the very high end). I've looked at the latest Tesla card, as well as the four Teslas in a box solutions, and am having trouble justifying the markup for what's essentially 'double precision FP being enabled, some heat improvements, and ECC which actually decreases available memory (I recognize ECC's advantages though).' Spending close to $11K for the four Teslas in a 1U setup seems to be the only solution at this time. I was thinking that GTX cards can be replaced for a fraction of the cost, so should I just stuff four or more of them in a box? Note, they don't have to pay the power/cooling bill. Amazon is too expensive for this level of performance, so can't go cloud via EC2. Any parallel architectures out there at this price point, even for $5K more? Any good manycore offerings that I've missed? e.g. somebody who can stuff a ton of ARM or other CPUs/GPUs in a server (cluster in a box)? It would be great if this could be easily addressed via a PCI or other standard interface. Should I just stuff four GTX cards in a server and replace them as they die from heat? Any creative solutions out there? Thanks for any thoughts!"

4 of 205 comments

  1. can you write GPU code? by zeldor · · Score: 5, Insightful

    Do you or they know how to program a GPU?
    If it's really embarrassingly parallel, EC2 spot instances and the GNU program 'parallel' will work quite nicely.
    But if coding changes are required, then the hardware is the least of your expenses.
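
    To make "embarrassingly parallel" concrete, here is a minimal CPU-only sketch using Python's standard multiprocessing module. It is not the poster's code: work_unit, run_serial and run_parallel are hypothetical names, and the arithmetic is a placeholder for whatever the real application computes per job.

```python
from multiprocessing import Pool


def work_unit(seed: int) -> int:
    """Hypothetical stand-in for one independent job (no shared state)."""
    x = seed
    for _ in range(10_000):
        # Toy arithmetic; a placeholder for the real per-job computation.
        x = (1103515245 * x + 12345) % 2**31
    return x


def run_serial(seeds):
    """Reference implementation: one job after another."""
    return [work_unit(s) for s in seeds]


def run_parallel(seeds, workers=None):
    """Fan the same jobs out over worker processes.

    workers=None lets Pool pick os.cpu_count() processes.
    pool.map returns results in the same order as the inputs.
    """
    with Pool(processes=workers) as pool:
        return pool.map(work_unit, seeds)


if __name__ == "__main__":
    seeds = list(range(32))
    assert run_parallel(seeds) == run_serial(seeds)
    print("parallel and serial results match")
```

    If the real jobs are as independent as these, the same split works across processes, machines or spot instances without touching the per-job code, which is the point being made about coding changes costing more than hardware.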
     

    --
    If I could walk that way I wouldn't need cologne.
  2. commodity HPC depends on your code by Haven · · Score: 5, Informative

    In HPC we call it "pleasantly parallel," nothing is embarrassing about it! =]

    If your code:
    - scales to OpenCL/CUDA easily
    - does not require high concurrent memory transfers
    - is fault tolerant (i.e. a failed card doesn't hose a whole day/week of runs)
    - can use single-precision FLOPS

    Then you can use commodity hardware like the GTX-series cards. I'd go with the GTX 560 Ti (GF114 GPU).

    Make nodes with:
    quad-core processors (AMD or Intel)
    whatever RAM is needed (8GB minimum)
    2 x GTX 560 Ti (448) run in SLI (or the dual 560 Ti from EVGA)

    Basically a scaled down Cray XK6 node. http://www.cray.com/Assets/PDF/products/xk/CrayXK6Brochure.pdf

    It all depends on your code.
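
    One way to get the fault tolerance from the checklist above is to checkpoint finished work units, so a dead card only costs the unit in flight. The following is a rough sketch under assumptions, not anything from the comment: run_with_checkpoint, load_done and the JSON checkpoint file are hypothetical, and compute stands in for the real GPU job.

```python
import json
import os
import tempfile


def load_done(path):
    """Return results saved by an earlier run, or an empty dict."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def run_with_checkpoint(task_ids, compute, path):
    """Run compute(task_id) for each task, skipping ones already saved.

    Results are written after every task, so a crash loses at most
    the task that was in flight.
    """
    done = load_done(path)
    for tid in task_ids:
        key = str(tid)  # JSON object keys are always strings
        if key in done:
            continue
        done[key] = compute(tid)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, path)  # swap in the new checkpoint in one step
    return done


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    run_with_checkpoint(range(5), lambda t: t * t, ckpt)
    # A second run finds all five results on disk and recomputes nothing.
    print(run_with_checkpoint(range(5), lambda t: t * t, ckpt))
```

    Rewriting the whole file after every task is only sensible when results are small; for large outputs, one file per task would be the more usual choice.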

  3. Re:PS3 by Anonymous Coward · · Score: 5, Informative

    PlayStation 3s have proved a cost-efficient way of setting up large-scale parallel processing systems. Of course you'll have to find your way around Sony's blocks on the OtherOS system, and you'll need to keep it off the internet or firewalled in some way, but you essentially get cheap processing subsidised by the games that you don't need to buy.

    Back-of-the-envelope comparison of PS3 and GTX:

    A cluster of three PS3s: 920 GFLOPS. Price: about $800.

    A PC with 3 GTX 460 cards: 2200 GFLOPS. Price: about $800.

    Each of those GTX cards also has significantly more memory than the PS3, and is cheaper to develop for.
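
    The comparison above reduces to GFLOPS per dollar. A small sketch of that arithmetic, using only the figures quoted in this comment (gflops_per_dollar and the options table are illustrative names, and the figures themselves are the poster's estimates, not measurements):

```python
def gflops_per_dollar(gflops: float, price_usd: float) -> float:
    """Peak throughput bought per dollar of hardware."""
    return gflops / price_usd


# (GFLOPS, price in USD), as quoted in the comment above.
options = {
    "cluster of three PS3s": (920, 800),
    "PC with 3 GTX 460 cards": (2200, 800),
}

if __name__ == "__main__":
    for name, (gflops, price) in options.items():
        print(f"{name}: {gflops_per_dollar(gflops, price):.2f} GFLOPS/$")

    ps3 = gflops_per_dollar(*options["cluster of three PS3s"])
    gtx = gflops_per_dollar(*options["PC with 3 GTX 460 cards"])
    print(f"GTX advantage: {gtx / ps3:.2f}x")
```

    At equal price the ratio is just 2200 / 920, so on these numbers the GTX box delivers roughly 2.4 times the peak throughput of the PS3 cluster.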

  4. Beowulf cluster! by TWX · · Score: 5, Funny

    Why not a beowulf clust---

    I'm sorry, I just can't. I searched the ~35 posts, browsing at -1, and no reference to a Beowulf cluster anywhere, let alone Natalie Portman or Grits.

    Slashdot! You're slipping! I lament the days when even our trolls were amusing and somewhat topical to the discussion at hand! We've fallen so far!

    --
    Do not look into laser with remaining eye.