Slashdot Mirror


Apple Wins VT in Cost. vs. Performance

danigiri writes "Detailed notes about a presentation at Virginia Tech are posted by an attending student. He copied most of the slides of the presentation and wrote down their comments. He wrote some insightful notes and info snippets, like the fact that Apple gave the cheapest deal of machines with chassis, beating Dell, IBM, HP. They are definitely going to use some in-house fault-tolerance software to prevent the odd memory-bit error on such a bunch of non-error-tolerant RAM and any other hard or soft glitches. The G5 cluster will be accepting first apps around November." mfago adds, "Apple beat Dell, IBM and others based on Cost vs. Performance alone, and it will run Mac OS X because 'there is not enough support for Linux.'"

16 of 105 comments

  1. whew! by Anonymous Coward · · Score: 4, Informative

    3 MW power, double redundant with backups - UPS and diesel

    1.5 MW reserved for the TCF

    2+ million BTUs of cooling capacity using Liebert's extreme density cooling (rack mounted cooling via liquid refrigerant)

    traditional methods [fans] would have produced windspeeds of 60+ MPH

  2. Infiniband insured latency? by dhall · · Score: 5, Interesting

    One of the primary concerns for a multi-node cluster is assured latency among all components within the cluster. It doesn't have to be the fastest, it just needs to ensure exact timing for latency across all nodes. IBM can do this with their "wormhole" switch routing on SP and has done this with Myrinet on their Intel X-series clusters.

    From most of my reading on Infiniband, it was designed from the ground up as a NAS-style solution rather than for large multi-node cluster computing. I'm curious whether they have any issues with cluster latency.

    http://www.nwfusion.com/news/2002/1211sandia.html

    The primary timings and white papers I've seen published for Infiniband have been for small clustered filesystem access. Although its burst rate is much higher than Myrinet's, it's hard to find any raw details for their multiple-node latency normalization.

    I hope it scales, since Intel's solution appears to be less cost prohibitive than some of the other solutions offered on the market, and would really open up the market even for smaller clusters (16-36 node) for business use.

  3. Re:Power concerns by ni4882 · · Score: 5, Informative
    Actually, from the article:

    # 3 MW power, double redundant with backups - UPS and diesel
    * 1.5 MW reserved for the TCF

    # 2+ million BTUs of cooling capacity using Liebert's extreme density cooling (rack mounted cooling via liquid refrigerant)
    * traditional methods [fans] would have produced windspeeds of 60+ MPH

    Seems that they did talk about both.

  4. ECC FUD by J0ey4 · · Score: 5, Informative

    Okay, before we get going with the same discussion about ECC vs. non-ECC, and all the flames start from people perusing Slashdot who think they are more in the know than the PhDs at VT who have been working on this for months, I want to point a few things out.

    1. The majority if not all of the bit errors that ECC corrects are caused by thermal noise. Thermal noise is an issue in a cluster of rack mounted 1U units due to the difficulty of cooling such tightly spaced units generating so much heat in so small a space. It is not an issue in a cluster of DESKTOP machines utilizing a Liebert system with way more cooling capacity than is needed.

    2. Even if somehow a non-thermal bit error occurs, each node has 4GB RAM. The probability of it being in an OS- or application-critical (especially given the converging nature of many long running calculations) piece of RAM as opposed to an empty piece of RAM is small.

    How many of you are reading this from a desktop without ECC RAM that has an obnoxiously huge uptime? ECC is a non-issue in a well-cooled cluster of desktop cased machines.

    1. Re:ECC FUD by Anonymous Coward · · Score: 4, Interesting

      2. Even if somehow a non-thermal bit error occurs, each node has 4GB RAM. The probability of it being in an OS- or application-critical (especially given the converging nature of many long running calculations) piece of RAM as opposed to an empty piece of RAM is small.

      Think before you post. The failure rate is constant in each memory chip (actually it goes up a bit with higher capacity due to higher density). Unless you set up the memory to be redundant (which the G5 can't do either...) you will experience MORE errors since a good OS tries to use the empty memory for things like file buffers.

      How many of you are reading this from a desktop without ECC RAM that has an obnoxiously huge uptime? ECC is a non-issue in a well-cooled cluster of desktop cased machines.

      Sigh... this is a 2200-CPU *cluster*. Here's a primer on statistics. Assume the probability of a memory error is 0.01% for some time interval (say a week or month). The likelihood of a perfect run is then 99.99% on your single CPU, which is just fine. Running on 2200 CPUs, the probability of not having any errors is 0.9999^2200=0.8, or 20% probability of getting memory-related errors somewhere in the cluster.

      The actual numbers aren't important - it might very well be 0.01% probability for an error per year, but the point is that when you run things in parallel the chance of getting a memory error *somewhere* is suddenly far from negligible.

      ECC is a cheap and effective solution that almost eliminates the problem. Incidentally, one of the challenges for IBM with "Blue Gene" is that with their super-high memory density even normal single-bit ECC might not be enough.

      But, what do I know - I've only got a PhD from Stanford and not VT....
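      The arithmetic in the reply above can be checked with a short sketch. It assumes, as the commenter does, that errors on each CPU are independent and equally likely over the chosen interval; the function name `p_any_error` is made up for illustration, and the 0.01% figure is the commenter's hypothetical, not a measured rate.

      ```python
      def p_any_error(p_single: float, n_units: int) -> float:
          """Probability that at least one of n_units sees an error.

          Assumes errors are independent across units and that each unit
          has the same per-interval error probability p_single.
          """
          p_all_clean = (1.0 - p_single) ** n_units  # every unit error-free
          return 1.0 - p_all_clean


      if __name__ == "__main__":
          # Hypothetical numbers from the comment: 0.01% per CPU, 2200 CPUs.
          p = p_any_error(0.0001, 2200)
          print(f"chance of at least one error: {p:.1%}")
      ```

      With those inputs the result comes out close to the 20% quoted, and a smaller per-unit probability still grows quickly once it is multiplied across thousands of units.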

  5. Dude... by yoshi1013 · · Score: 5, Funny
    At this point all I really want to know is what the hell does 1100 G5s look like???

    Certain things are easy to imagine in large quantities, but dude.

    Just....dude....

  6. An interesting tidbit by BortQ · · Score: 4, Interesting
    The very last slide states that
    Current facility will be followed with a second in 2006
    It will be very interesting to see if they also use macs for any followup cluster. If it works out well this could be the start of a macintosh push into clustered supercomputers.
    --

    A Multiplayer Strategy Game for Mac OS X, Windows, and Linux
    1. Re:An interesting tidbit by eweu · · Score: 4, Funny

      I caught that too. Use of Macs in 2006 no doubt depends on 2 factors: 1) how well the 2003 cluster works out, and 2) how the Mac compares to competitors in 2006. Could be a nice win for Apple, again, if they manage to keep both 1 and 2 competitive. Which remains to be seen, and I'm holding my breath.

      I don't know. Holding your breath until 2006 sounds... dangerous.

  7. Re:Clueless Sysadmins... by selderrr · · Score: 4, Insightful

    For any representation, you need only 1 graphics card: the one the monitor is attached to. Parallelizing realtime display-only stuff is not much good since you'd lose too much time in data transmission.

    So they could equip one G5 with a radeon9800 and let that one display the results. No need to buy another 1099 Radeons.

  8. Re:water cooled laptops as blades by Johnny+Mnemonic · · Score: 4, Informative


    The Virginia folks must have one huge room with some massive air handlers to circulate the air that will be trapped behind the towering walls of 1000 4U boxes.

    I don't know any more than what's publicly available, but the VT folks in the know have said that they've designed a specialized, liquid-based cooling system precisely because of the issues wrt cooling this many units. The FA makes reference to this many units generating windspeeds of 60mph from fans alone.

    I am gonna guess that behind each G5 rack will be a radiator type arrangement, with cooled pipes flowing with a liquid that will carry the heat away from the internal airspace, much like a large car radiator. I don't know if that would be cost-effective, or what it would take to move that much liquid, or if the radiator could be made to transfer enough heat fast enough. Maybe the liquid cooling units actually replace the internal fans directly. Who knows--I think we'll get some more details on this this week as the G5s start to come out of their boxes. They've apparently received about 10% of them already.

    --
    $tar -xvf .sig.tar
  9. Free printer and Ipod case by goombah99 · · Score: 4, Funny

    Man, they really blew it. They should have ordered it from MacMall. It would have come with 1000 free printers and 1000 iPod cases.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  10. G5's cheaper than VTs? by dpbsmith · · Score: 4, Funny

    I believe you can get a VT for well under $1000, and I've even heard that some of them now support advanced "sixel" graphics.

    And they scroll MUCH more smoothly than OS X.

  11. Re:For those in the know by confused+one · · Score: 4, Insightful

    Why go to the trouble of porting Linux to the G5 when you could port the clustering code to OS X and be done with it? Seems like a much simpler task and more cost-effective use of labor.

  12. Re:neat. by confused+one · · Score: 4, Informative

    Read on. They're putting 8GB of RAM in each machine.

  13. Apple Outshines Dell on Ethics by reporter · · Score: 4, Interesting
    Even if Apple computers were to cost slightly more than Dell computers, we should consistently buy the former instead of the latter. Price is only 1 aspect of any product. There are also ethical considerations. They do not matter much outside of Western society, but they matter a great deal in Western society.

    As an American company, Dell is a huge disgrace. Please read the "Environmental Report Card" produced by the Silicon Valley Toxics Coalition. Dell received a failing grade and is little better than Taiwanese companies, which are notorious for destroying the environment and the health of workers. Dell even resorted to prison labor to implement its pathetic recycling program.

    ... from the desk of the reporter

  14. Re:Cost Analysis by 2nd+Post! · · Score: 4, Insightful

    I bet at the time of initial consideration of vendors, there were no competitive Opteron or Itanium solutions (none with chassis, the slides say), and I am also willing to bet that Apple had at least a hardware prototype they could demonstrate, at least a motherboard + dual CPU setup, even if the chassis was incomplete and not all the major subsystems were 100%.

    Just enough to demonstrate that Apple *would* have a solution, and enough that VT could narrow down the decision to a possible, pending the actual production and purchase of a single machine... then, the contract being 99% complete, they just had to sign a couple papers and purchase, overnight, 1,100 dual G5s.

    On the flip side I bet they had a similar contract in the wings with other vendors, all pending on 'simple' bottlenecks.