Slashdot Mirror


Factual 'Big Mac' Results

danigiri writes "Finally Varadarajan has put some hard facts on the speed of the VT 'Big Mac' G5 cluster. Undoubtedly after some weeks of tuning and optimization, the home-brewn supercluster is happily rolling around at 9.555 TFlops in LINPACK. The revelations were made by the parallel computing voodoo master himself at the O'Reilly Mac OS X conference. It seems they are expecting an additional 10% speed boost after some more tweaking. Srinidhi received a standing ovation from the audience. Wired News is also running a cool news piece on it. Lots of juicy technical and cost details not revealed before. Myth dispelling redux: yes, VT paid full price, yes, it's running Mac OS X Jaguar (soon Panther), yes, errors in RAM are accounted for, Varadarajan was not an Apple fanboy in the least... read the articles for more booze."

79 of 566 comments

  1. FACT: by Anonymous Coward · · Score: 2, Funny

    Big Macs are bad for your health.

    1. Re:FACT: by beautiful_idiot · · Score: 4, Funny

      better than WOPRs.

  2. Take 12492342... by devphaeton · · Score: 4, Funny

    ....ok, we've really got real numbers THIS time!!

    --


    do() || do_not(); // try();
  3. Quite an accomplishment. by illuminata · · Score: 2, Funny

    I haven't seen a cluster of Macs this big and powerful since the last annual pimp convention!

    Now, where did all the tricks go?

    --


    Until Slashdot fixes the funny modifier, use insightful or interesting. The poster knows your intentions.
  4. Brewn? by FatAlb3rt · · Score: 2, Interesting

    Is that a word? How about brewed? Hate to nit, but .... aw... nevermind.

    1. Re:Brewn? by dipipanone · · Score: 3, Funny

      I strive daily to better master Shakespeare's and Snoop Doggy Dogg's language

      Ah. In that case, the word you were looking for was 'brizzled', MizzutherFizzucker.

      Hope this helps.

  5. Re:Full price by McAddress · · Score: 2, Insightful
    RTFA

    The x86 cluster would have been twice as expensive. And this outperforms the highest-ranking x86 cluster, which has more processors.

  6. Super computer? by ludky132 · · Score: 3, Insightful

    I've always been sort of intrigued by Top500. Has there ever been a good comparison written about the similarities/differences between a 'supercomputer' and the regular PC sitting on my desk running Linux/2k? At what point does the computer in question earn the title "Super"?

    1. Re:Super computer? by Carnildo · · Score: 2, Interesting

      A "supercomputer" is usually one that is optimized for vector operations: operations that take a data set, and perform the same operation on each element of that data set -- sort of a "Super SIMD/SSE/AltiVec/whatever". Your desktop computer is designed around performing a series of different operations on a single data element at a time. The graphics card of your computer could be considered a very specialized supercomputer.

      In terms of raw processing power, the computer on your desk is more powerful than an early Cray. But if you tried to do weather modelling or finite element analysis with both, the Cray would win.
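To make "the same operation on each element of a data set" concrete, here is a minimal Python sketch of SAXPY (y = a*x + y). It is written once as a one-element-at-a-time computation and once processed in groups of four lanes, which mimics an AltiVec register holding four 32-bit floats. Pure Python does not actually issue SIMD instructions. The lane grouping only illustrates how vector hardware sees the work, and the function names are illustrative.

```python
LANES = 4  # an AltiVec register holds four 32-bit floats

def saxpy_scalar(a, x, y):
    """One element at a time: every result is computed independently."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_lanes(a, x, y, lanes=LANES):
    """Same operation, applied to a whole group of elements per step.

    Each pass of the outer loop stands in for one vector instruction;
    real vector hardware would process the whole group at once.
    """
    out = []
    for start in range(0, len(x), lanes):
        xs = x[start:start + lanes]
        ys = y[start:start + lanes]
        out.extend(a * xv + yv for xv, yv in zip(xs, ys))
    return out

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(saxpy_lanes(2.0, x, y))  # [3.0, 5.0, 7.0, 9.0, 11.0]
```

Both functions compute the same values. The difference a vector machine exploits is that the lane version exposes a group of identical, independent operations at every step.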

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    2. Re:Super computer? by isoga · · Score: 5, Funny

      When you get on the list. Then you have a supercomputer

  7. Re:Full price by MORTAR_COMBAT! · · Score: 4, Funny

    The power usage (think cooling the room) for a similarly-performing Athlon cluster would likely more than make up for what phantom price difference you are talking about.

    --
    MORTAR COMBAT!
  8. Full Price? WHY?!? by JonTurner · · Score: 5, Insightful

    >>yes, VT paid full price

    This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have to pay the bills, sooner or later.
    Shameful.

    1. Re:Full Price? WHY?!? by sammy+baby · · Score: 3, Insightful

      I agree. As an employee of a state-run university, I can attest that I'm eligible for a 10% discount off the purchase price of one of the dual 2GHz G5s. (Originally $2999, discounted to $2699.)

      That VT wasn't able - or didn't think - to do the same is pretty shocking. A savings of $330,000 isn't anything to sneeze at.
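The $330,000 figure follows from the numbers in this comment. The arithmetic below assumes every one of the 1,100 machines received the same 10% discount on the base $2,999 configuration. It ignores the RAM upgrades and the education quantity pricing mentioned in other replies.

```python
LIST_PRICE = 2999        # dual 2GHz G5, base price quoted above
DISCOUNT_PERCENT = 10    # employee discount quoted above
UNITS = 1100             # machines in the VT cluster

def discounted_price(price, percent):
    # Whole dollars, dropping the cents: 2999 * 0.90 = 2699.10 -> 2699
    return price * (100 - percent) // 100

per_unit_savings = LIST_PRICE - discounted_price(LIST_PRICE, DISCOUNT_PERCENT)
total_savings = per_unit_savings * UNITS

if __name__ == "__main__":
    print(per_unit_savings, total_savings)  # 300 330000
```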

    2. Re:Full Price? WHY?!? by zeno_2 · · Score: 4, Informative

      Derek Bastille of the Arctic Region Supercomputing Center in Fairbanks said that they just built a supercomputer but spent about $30 million using Cray and IBM equipment. He got quotes from other companies (Dell) and the price was going to be about $10 million. They only ended up spending $5.2 million on the Apples. I'd say if I lived in Virginia and paid taxes, I would be happy.

    3. Re:Full Price? WHY?!? by Patrick+Lewis · · Score: 3, Informative

      I imagine that it was because the G5s were very scarce at launch. These aren't loaves of bread we are talking about. Apple could ship and sell at full price as many of these as they could make, so VT really had no leverage to try and get lower prices. 3-6 months after the launch, then sure, they might be able to get it cheaper. But first in line? I don't think that it is surprising at all to hear they paid full price.

      --
      "If I am such a genius, how come that I am drunk and lost in the desert with a bullet in my ass?" --Otto (Malcom ITM)
    4. Re:Full Price? WHY?!? by norkakn · · Score: 2, Informative

      Add in 8 gigs of RAM and it isn't $2699; all of their machines had either 4 or 8.

    5. Re:Full Price? WHY?!? by david614 · · Score: 2, Interesting

      Well, I *do* live in Virginia, and this is one of the greatest things to happen at a publicly funded university in years! Great science, ingenuity, huge potential. Now *that* is why public funding is an essential part of R&D. D

      --
      ELITISM: It's always lonely at the top. Uninvited company is rarely welcome.
    6. Re:Full Price? WHY?!? by jcr · · Score: 5, Informative

      Yes, that would be shocking if it had happened, but it didn't. VPI paid the normal educational institution quantity pricing for 1100 units. They did NOT pay the single-unit price.

      Can we put this canard to rest now?

      -jcr

      --
      The only title of honor that a tyrant can grant is "Enemy of the State."
  9. interesting points by kaan · · Score: 5, Interesting

    I think it's interesting that he wasn't a Mac fan at all before this project. He says he chose it because it had better performance than everything else out there ("Ironically, they lost the gigahertz game," he said of Intel. "(The G5) is extremely faster than the Itanium II, hands down."), and was cheaper too (Dell and other manufacturers quoted prices between $10 and $12 million, vs. the $5.2 million for G5s).

    What more do you need? Faster systems, cheaper total cost, and slick looking cases.

    1. Re:interesting points by davidstrauss · · Score: 4, Insightful

      Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor (a.k.a. TRIPS) that IBM will hopefully be fabbing soon. Seeing a cost comparison with the Athlon64/Opteron would be more enlightening. Also consider that it would be almost impossible to buy Itanium or any other "enterprise" system without all the redundant hardware (ECC RAM, etc.) for which the G5 cluster compensates in software.

    2. Re:interesting points by RzUpAnmsCwrds · · Score: 4, Insightful

      "Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor"

      Your professor's opinion is... well... flawed.

      Itanium is an excellent architecture. Its flaws come from politics:

      1: Itanium requires good compilers. For now, that means compilers from Intel. GCC will be fine for running Mozilla on an Itanium, but technical apps compiled with GCC simply won't perform anywhere near what the machine is capable of.

      2: Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB of cache on the high-end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.

      3: Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process. Before Madison (about a year ago), it was on a 180nm process.

      Some misconceptions:

      1: Itanium is "inefficient". This couldn't be further from the truth. At 1.5GHz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2GHz P4 or 2.2GHz Opteron in SPECint.

      2: Itanium is "slow". Wrong again, see above.

      3: Itanium doesn't scale. Wrong again. Itanium scales better than any other current architecture, getting nearly 100% of clock in both int and fp. Opteron gets around 99% int and 95% fp. Pentium 4 gets around 85% int and 80% fp. I don't have data for PPC970.

      4: Itanium is expensive. This is true, but it has to do with politics rather than architecture. Itanium uses *fewer* transistors and executes *more* instructions per clock than a RISC architecture. Itanium takes much of the logic out of the CPU and puts it into the compiler (this is why you need good compilers). Itanium's architecture is called EPIC, or explicitly parallel instruction computing, because each instruction is "tagged" by the compiler to tell the CPU which instructions can and cannot be executed in parallel.
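The hypothetical Python sketch below gives a rough feel for moving the "what can run in parallel" decision into the compiler. It scans a straight-line instruction list in program order and closes an "issue group" whenever the next instruction would read or overwrite a register already written in the current group. This toy rule is an assumption for illustration. It is not the real IA-64 bundle/stop encoding, and a real compiler would also reorder instructions and model latencies.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    text: str               # human-readable form, for printing only
    dest: str               # register this instruction writes
    srcs: Tuple[str, ...]   # registers or memory names it reads

def group_instructions(program: List[Instr]) -> List[List[Instr]]:
    """Greedily pack instructions, in program order, into issue groups."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()  # registers written so far in the current group
    for ins in program:
        # Toy rule: reading (RAW) or rewriting (WAW) a register that this
        # group already writes forces a "stop" and starts a new group.
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if conflict and current:
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups

EXAMPLE_PROGRAM = [
    Instr("r1 = load x", "r1", ("x",)),
    Instr("r2 = load y", "r2", ("y",)),
    Instr("r3 = r1 + r2", "r3", ("r1", "r2")),
    Instr("r4 = r1 * 2", "r4", ("r1",)),
    Instr("r5 = r3 + r4", "r5", ("r3", "r4")),
]

if __name__ == "__main__":
    for n, group in enumerate(group_instructions(EXAMPLE_PROGRAM)):
        print(n, [ins.text for ins in group])
```

For this five-instruction example the sketch produces three groups. The two loads form the first group. The add and the multiply form the second, since both need only values from the first group. The final add forms the third. A hardware-scheduled RISC core would have to discover the same independence at run time.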

      EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else. Intel's politics prevents this from happening.

      So, please don't say that Itanium is a poor architecture. Itanium is a proven architecture. It uses fewer transistors and lower clock speeds than comparable RISC CPUs. Yes, it has problems, but most of them have to do with Itanium the CPU (too much cache, too expensive, not latest process) instead of EPIC the architecture.

    3. Re:interesting points by timeOday · · Score: 2, Insightful
      EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else.
      Doing more per clock isn't necessarily good if it pushes your clock speed too low. Itanium2 is only available up to about 1.3 GHz. As the article says, it's ironic that Intel should now lose the MHz race.

      Using fewer transistors is good for reducing heat and manufacturing costs, but the Itanium is neither cheap nor cool (130W!). In the performance arena, Moore's law is useless unless chip designers figure out how to use MORE transistors to compute more quickly. Otherwise there's nothing to do with all those transistors except... more cache?

    4. Re:interesting points by stevesliva · · Score: 2, Interesting
      Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB on the high end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.
      I don't see your point here. More cache does not make it a better processor architecture.
      Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process.
      The PPC970 and Power4+ are both fabricated in 130nm technologies. Better silicon does not make it a better processor architecture.

      Speaking of cache, somewhat under-reported in the technical press was IBM's revelations of its upcoming Power5 server architecture. Yup, that's four dual-core processors each with 2MB of L2 cache, and four 36MB L3 cache chips all in the same package. IBM is leveraging its packaging advantages against Intel's process advantages. Well, that, and making each processor die dual-core multithreaded.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    5. Re:interesting points by RzUpAnmsCwrds · · Score: 3, Interesting

      "Itanium2 is only availble up to about 1.3 Ghz."

      If by "about 1.3Ghz" you mean 1.5GHz, then yes, Itanium only goes up to 1.5GHz. But at 1.5GHz it is faster than the fastest 3.2GHz Pentium 4. With a decent process and less cache, it could easily scale to 2+ GHz.

      " but the Itanium is neither cheap nor cool (130W!)"

      This has to do with the fact that the CPU has 3MB of cache on it. That makes the die huge which makes the CPU expensive. It also makes it heat up like a toaster. As a comparison, the latest Pentium 4s are ~90W, and they only have 512K of cache.

      "In the performance arena, Moore's law is useless unless chip designers figure out how to use MORE transistors to compute more quickly."

      My statement was that, for a given performance level, Itanium uses fewer transistors than RISC. Itanium was *designed* to use more transistors. That's why the instruction set is designed to produce code that runs well in parallel. RISC CPUs have to figure out what can be run in parallel in hardware; Itanium does it in the compiler.

    6. Re:interesting points by mczak · · Score: 4, Insightful
      Itanium is an excellent architecture.
      Can't agree there. It's certainly not as bad as the first Itanics made it look, and it has lots of interesting ideas, but overall it seems the architecture didn't reach the goals Intel probably had.
      Itanium requires good compilers. For now, that means compilers from Intel.
      Certainly. However, it looks like it is very, very hard (if not impossible) to write a good compiler for it. Intel certainly invested a LOT of time and money and increased performance quite a bit (quite a bit of the performance difference in published SPEC scores between Itanium 1 and 2 is just because of a newer compiler), but if rumours are true the compiler still isn't quite that good, after what, 5 years?
      Lots of cache means lots of transistors means lots of heat.
      Not quite true. Cache transistors aren't very power hungry - look at P4 vs. P4EE with an additional 2MB L3 cache, the power consumption hardly changed (5W or so isn't much compared to the total of 90W).
      Intel is not fabbing Itanium with a state of the art process.
      Well, their 130nm process sounds quite good to me. Nobody really uses much better process technologies yet - AMD might have a slight edge with their 130nm SOI process, which should help a bit with power consumption.
      Itanium is "inefficent". This couldn't be further from the truth. At 1.5Ghz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2Ghz P4 or 2.2Ghz Opteron in SPECint.
      The Itanium makes up for its inefficiency with large caches (compared to P4 / Opteron). Compare the Dell PowerEdge 3250 SPEC results with 1.4GHz/1.5MB cache and 1.5GHz/6MB cache, otherwise configured the same (unfortunately using slightly older compilers, so don't take the absolute values too seriously). After factoring in the 6% clock speed disadvantage, the smaller cache (which is still more than Opteron/Pentium 4 have) costs it about 20% in SPECint. That makes it definitely slower than the Opteron 146 and Pentium 4, even considering that the results would be higher with the newer compiler. In SPECfp it's about the same 20% difference, which means it still beats the P4 and Opterons, but no longer by such impressive margins.
      Itanium doesn't scale. Wrong again.
      I'm too lazy to check the numbers, but the Itanium has a shared bus - granted, with quite a bit of bandwidth, but still shared (similar to the P4). 2 CPUs should scale well, 4 shouldn't be that bad, and after that you can forget it (meaning your 64 cpu boxes will be built with 4-node boards). The Opteron will scale much better beyond 4 nodes - its point-to-point communication is probably overkill for 2 nodes, should show some advantages with 4 nodes, and scale very well to 8 nodes - too bad nobody builds 8-node Opteron systems...
      Or do you mean scaling with clockspeed? In that case, the bigger the cache and the faster the system bus and ram, the better will it scale, but the cpu architecture itself is hardly a factor.
      Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture.
      Unfortunately I haven't seen any transistor numbers for an Itanium 2 core, but I don't think it's true. The Itanium saves some logic on the instruction decoder, but has more execution units in parallel. That should lead to better performance, but ONLY IF it's actually possible to build a well-optimizing compiler that manages to keep the execution units busy, and it's entirely possible that this just can't be done in the general case.
      EPIC scales better than RISC architectures.
      I really don't think this is true. Scaling is independent of the CPU core architecture.

      I will agree that EPIC (which, btw, isn't quite Intel's invention; it shares most of its ideas with VLIW) is a nice concept, but for some reason it just doesn't work as well in practice as it should.
  10. Re:Full price by Zelet · · Score: 2, Interesting

    They costed the G5 against Dell and IBM offerings and the Apple solution was cheaper. Where did you get your numbers? Why don't you go out and price out a supercomputer for me, will ya? Of course, you know that it isn't feasible to BUILD 1100 units.

    --
    ...And when they came for me, there was no one left to speak out for me." - Martin Niemoeller (1892-1984)
  11. Dumb Question... by devphaeton · · Score: 4, Interesting

    ....maybe I'm obtuse, but I keep hearing about this thing as "..and we're only seeing X% of its real potential right now!"....

    1) Why can't they just shout "Let 'er rip!!" and crank the thing wide open?

    2) Why all the media buzz concerning this as a `surprise' when they've already got its performance figured out, apparently?

    Sorry.

    1. Re:Dumb Question... by SquareOfS · · Score: 5, Informative
      Because performance in a supercomputing cluster is not just the sum of the nodes.

      It's highly dependent on the interconnects, the topology of the network, the software that does the clustering (i.e., that actually makes the nodes available for parallelized word), etc.

      So minor tweaks can have major effects, and getting it tweaked properly is quite an accomplishment.
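A deliberately simplified model shows why tuning the interconnect and clustering software matters. As nodes are added, each node's share of the work shrinks, but the time spent waiting on messages does not, so efficiency falls. Cutting per-message latency wins it back. Everything here is hypothetical and chosen only for illustration: the function, the runtime, the message count and the latencies. Real LINPACK runs overlap communication with computation and behave far less simply.

```python
def parallel_efficiency(total_work_s, nodes, messages_per_node, latency_s):
    """Fraction of each node's time spent computing rather than waiting.

    Toy model: work divides perfectly across nodes, every message costs a
    fixed latency, and communication never overlaps with computation.
    """
    compute_s = total_work_s / nodes
    comm_s = messages_per_node * latency_s
    return compute_s / (compute_s + comm_s)

if __name__ == "__main__":
    work = 3600.0   # hypothetical single-node runtime, in seconds
    msgs = 10_000   # hypothetical number of messages each node waits on
    for latency_us in (50.0, 10.0):
        eff = parallel_efficiency(work, 1100, msgs, latency_us * 1e-6)
        print(f"latency {latency_us:>4} us -> efficiency {eff:.2f}")
```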

    2. Re:Dumb Question... by valkraider · · Score: 2, Funny

      the nodes available for parallelized word

      Does it make Word's performance acceptable?

    3. Re:Dumb Question... by Blimey85 · · Score: 2, Interesting
      They did have specs beforehand. They said OK, we take this many and the max theoretical performance is X. We scale that back to Y percent and that's what we will likely achieve. We need to get to Z performance level, and Y percent of X is above the Z threshold, so we're good to go. Now let's talk price. It's the cheapest available and they can get it to us to meet our deadline? Great. Let's order.

      They knew in advance what they could likely achieve with this cluster and they have surpassed what they were expecting. Now with some more tweaking they may take it a bit further. It's like a race car engine: you know the specs, but once you get it and tune it you can often surpass the specs by a wide margin.
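The X/Y/Z reasoning above can be made concrete with figures from the story: 1,100 dual-processor 2 GHz G5s and a measured 9.555 TFlops. The sketch computes theoretical peak as nodes × CPUs per node × clock × floating-point operations per cycle. It assumes four double-precision flops per cycle for each PowerPC 970 (two FPUs, each completing one fused multiply-add per cycle). Dividing the measured result by that peak gives the efficiency the extra tuning is chasing.

```python
NODES = 1100              # dual-processor Power Mac G5s
CPUS_PER_NODE = 2
CLOCK_HZ = 2.0e9          # 2 GHz PowerPC 970
FLOPS_PER_CYCLE = 4       # assumption: two FPUs x fused multiply-add (2 flops each)
LINPACK_FLOPS = 9.555e12  # measured LINPACK result reported in the story

def peak_flops(nodes, cpus_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak: every FPU busy on every cycle."""
    return nodes * cpus_per_node * clock_hz * flops_per_cycle

rpeak = peak_flops(NODES, CPUS_PER_NODE, CLOCK_HZ, FLOPS_PER_CYCLE)
efficiency = LINPACK_FLOPS / rpeak
hoped_for = LINPACK_FLOPS * 1.10  # the additional 10% mentioned in the story

if __name__ == "__main__":
    print(f"peak       = {rpeak / 1e12:.1f} TFlops")    # 17.6 TFlops
    print(f"efficiency = {efficiency:.1%}")              # 54.3%
    print(f"+10%       = {hoped_for / 1e12:.2f} TFlops") # 10.51 TFlops
```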

      --
      How is it that one careless match can start a forest fire, but it takes a whole box to start a campfire?
  12. Favorite Quote by Anonymous Coward · · Score: 5, Funny

    An audience member asked if he'd made the purchase through the Apple store. Varadarajan smiled and said that actually, yes, he had.

    1. Re:Favorite Quote by Glonoinha · · Score: 5, Funny

      I can see that one now ... Varadarajan surfs to www.apple.com/purchase

      Ok, max all the options. Cool.
      Now put 1100 in the quantity. Cool.
      Ok (chugga chugga chugga) $3.3 million. Who has the credit card? (silence, *crickets*, the rude sound of nobody reaching for their wallet...)

      Ok, maybe it is just me. Of course, I have play-configured a few systems in the online order systems of IBM and Dell (didn't actually hit 'Submit', however), and it is possible to configure a single $100k machine from Dell. I haven't found the limit at IBM yet, as they seem to have more imagination than I do (although it is easy just to get the SOFTWARE on one of their systems to exceed $100k).

      --
      Glonoinha the MebiByte Slayer
  13. No, make that.. by SuperBanana · · Score: 4, Funny
    from the whole-lotta-clock-cycles dept.
    [snip]
    yes, it's running MacOSX Jaguar ( soon Panther)

    More like whole-lotta-CD-jockeying. Perhaps the bio department can lend a hand by donating the services of their chimps to handle the CD swapping.

    (Yes, I'm aware there are smarter ways of doing it, but isn't it a fun mental picture, 100 chimps running around a cluster of G5's and throwing bananas and CDs at each other?) Talk about your fun install-fests.

  14. Re:technical details? like this one... by geoffspear · · Score: 5, Funny
    Tell you what... you build a cluster of 32 bit machines connected with 100 Base T ethernet and come back and tell us how many more nodes you needed and what it cost you when you have one of the 5 fastest computers in the world.

    Until then, quit your trolling.

    --
    Don't blame me; I'm never given mod points.
  15. Simply amazing by laird · · Score: 3, Insightful

    This is simply an amazing achievement. Plenty of people have built supercomputers from huge piles of x86's, but this team managed to not only pull the trick off in less time, for less money, but on a new hardware platform. I certainly follow their logic (PPC's have always been far better than x86's for real scientific-level precision FLOPs) but it's a really gutsy move betting your entire supercomputing program on a new CPU, new hardware platform, etc., and on your ability to get everything ported to the PPC -- that's a lot of risks to take, and a small school like that can't afford to fail, even building a relatively cheap supercomputer. But it clearly paid off! Not only did they get great PR for the university, they got a great computing resource for the students and faculty, and by doing it themselves rather than buying a complete system from a vendor, I am sure that those students all learned far more. And those 700 pizza and coke consuming students that cranked the code will all be able to say that they were part of this amazing thing.

    Damn!

    1. Re:Simply amazing by mslinux · · Score: 2, Insightful

      a small school like that can't afford to fail, even building a relatively cheap supercomputer.

      Dude, get your facts straight... it's the largest university in Virginia. 25,000 undergrads alone. I did my undergrad their... Phi Beta Kappa class of 2000.

    2. Re:Simply amazing by Monkey+Angst · · Score: 4, Funny
      I did my undergrad their... Phi Beta Kappa class of 2000.

      <reads sentences again carefully and whimpers for America>

      --
      stripShow - Where WordPress meets webcomics
  16. Jumbled numbers by SDMX · · Score: 3, Funny

    'yes, errors in RAM are accounted for,' And no malloc library benchmark jumbling bullshit this time? T minus 10 minutes before some PC nut looks at all this, sees that the Mac relies on something a PC can't do, and 'blows the whistle'. T minus 15 minutes before they realize it's the OS.

  17. Too slow/expensive by burgburgburg · · Score: 3, Insightful
    If you'd read the article, you'd see that Varadarajan considered the Intel Itanium II but found "(The G5) is extremely faster than the Itanium II, hands down." The AMD Opteron was too expensive. And for boxes, Dell and a bunch of other PC manufacturers quoted prices in the $10 million to $12 million range.

    So he went full price with the G5 ($3000 apiece) and for only $5.2 million has the number 3 slot and is shooting for a 10% boost.

  18. Too bad some software patents will be filed by Colonel+Panic · · Score: 3, Interesting

    Varadarajan told the audience he would publish full documentation and release most of the code written for the machine. However, some of the software is subject to patent applications, he said, and he wasn't yet sure if it would be released under an open-source license.

    What's up with that?
    Used to be that work like this done at a university was considered 'open', as in available to anyone to help advance the state of the art. Not anymore...

    1. Re:Too bad some software patents will be filed by norkakn · · Score: 5, Insightful

      It isn't their fault. I heard a long story on NPR about it a while ago. Universities tried to stay out of the patent game, but companies would take their research, patent it, and then charge the university to use it: researchers having to pay to use their own findings.

      The patent system needs to be overhauled, then maybe we can start opening up the Universities again (and give them some more funding too!)

    2. Re:Too bad some software patents will be filed by Wesley+Felter · · Score: 2, Informative

      ...companies would take their research and patent it and then charge the university to use it.. researchers having to pay to use their own findings.

      Besides the prior art issue that others mentioned, academic research is not subject to patents. So university researchers never have to pay to license patents.

  19. Re:price by Anonymous Coward · · Score: 2, Insightful

    This is the type of statement that makes people question the accuracy of ANY price/performance claim made by a Mac fan(atic). Simply stating that "macs give more power and are a better value" is tremendously misleading. The obvious questions to ask are "When was this true?", "In what application?", "Against what competition?", and "By what objective standard?".

  20. Open source the code by BWJones · · Score: 2, Insightful

    So, the other really cool thing they are doing is open sourcing the code for error checking and connectivity.

    This is in addition to consulting where they are helping others build similar clusters.

    --
    Visit Jonesblog and say hello.
  21. Full price? by Aqua+OS+X · · Score: 4, Insightful

    Wow.. I can't believe Apple didn't cut them a break for buying 1100 Dual G5s.

    You'd think Apple would at least sell G5s to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.

    But, hey, even with all that pointless extra hardware, this cluster is still less than half the price of a comparable Intel system from Dell or IBM. Weird.

    --
    "Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
    1. Re:Full price? by OECD · · Score: 4, Interesting

      You'd think apple would at least sell G5's to VT without SuperDrives

      OTOH, five years from now, when they have the world's 65,000th fastest supercomputer, they could just pull the thing apart and give/sell complete computers to their students. Then it's back to the Apple Store to order up a whole lot of G7's.

      --
      One man's -1 Flamebait is another man's +5 Funny.
  22. nerds by mooface · · Score: 2, Funny


    From the wired article:

    "After his presentation, a group of nerds followed him to the hotel's bar for drinks, hanging on his every word."

    How dorky did these guys have to be to have a reporter for "Wired" categorize them as nerds...damn....

  23. Not a mac fan either... by Epistax · · Score: 5, Funny

    ... but that doesn't matter. An accomplishment is an accomplishment. Besides if an AI manifests itself it'd be less likely to destroy the world and more likely to tell you that your white socks do not match your purple tie.

  24. Executive Summary by cosmo7 · · Score: 5, Funny
    For your convenience I've collected the main arguments people have made against the cluster:
    • They got some special deal from Apple
    • It's running Linux, not OS X
    • Opterons would be faster and cheaper
    • The guy in charge is some Mac zealot
    • It isn't as fast as everyone expected
    • Rockets would not work in outer space as there is no atmosphere to push against
    1. Re:Executive Summary by sean23007 · · Score: 3, Informative

      Varadarajan revealed that in addition to the G5, he'd also considered using Advanced Micro Devices' Opteron and Intel's Itanium II processors. But the Opteron was too expensive and the Itanium too slow, he said.

      Maybe you didn't read close enough, because the articles specifically state that he didn't compare only to Intel and that he found the Opterons to be too expensive. I'm just saying, because I think a lot of people did see a quote in the article mentioning Opterons, and you seem to have missed it. Thought you'd like to know.

      And if you decide to disbelieve whatever you don't find convenient in a new story, you should rethink your statement about keeping religion and technical discussion separate, because you're really not.

      No offense, but I think it should be pointed out that not only Mac fans are zealous.

      --

      Lack of eloquence does not denote lack of intelligence, though they often coincide.
  25. A Little Perspective Here by EmCeeHawking · · Score: 5, Informative

    To those who are wondering why the G5 is a serious contender for supercomputing applications (and why VT decided the way they did), you may want to follow this link: http://www.chaosmint.com/mac/vt-supercomputer/

    Here's a quick rundown:

    Dell - too expensive [one of the reasons for the project being so "hush hush" was that Dell was exploring pricing options during bidding]

    Sun (SPARC) - required too many processors, also too expensive

    IBM/AMD (Opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available

    HP (Itanium) - same

    Apple (IBM PPC970) - system available with chassis for lowest price

  26. Power PC 970 and G5 by mojowantshappy · · Score: 3, Interesting
    From the O'Reilly article:

    "The IBM with a PowerPC 970 was a first choice but the earliest delivery date would have been January 2004."

    "On June 23 Apple announced the G5."

    I was under the impression that the G5 was a Power PC 970. Is it just some derivative of the Power PC 970... or what?

    --

    This page was generated by a Barrel of Circus Midgets, and that is the way I like it!!!

    1. Re:Power PC 970 and G5 by OS24Ever · · Score: 2, Informative

      Yes, it is. But the first IBM PowerPC 970-based systems will be out sometime in 1Q 04; the G5 is available now.

      --

      As a rock-and-roll Physicist once said, No matter where you go, there you are.

  27. Just a thought on "home-brewed" means by insanecarbonbasedlif · · Score: 4, Informative

    From the summary: "the home-brewn supercluster is happily rolling around at 9.555 TFlops"

    Ignoring the "brewn" part of things, since when does "home-brewed" mean "designed and funded by a major university"?

    I usually think of "home brewed" as something that someone put together at home. With their own money. In their spare time.

    This is *not* a home-brew supercomputer; it is a supercomputer designed and built by an institution.

    That is all.

    --
    Just because I doubt myself does not mean I find your position compelling.
  28. Re:Why didn't they use Darwin or Gentoo? by Abcd1234 · · Score: 2, Informative

    Okay, first, I will guarantee you, the LINPACK they were running was properly optimized for the architecture. If not, they shouldn't be building a cluster in the first place, because they're morons.

    Second, the difference caused by increased optimization in the kernel, for an application like this, is relatively insignificant, simply because most of the work is done in user-space. In fact, any decent super-computing application will do its best to minimize system calls (allocating memory pools, chunking I/O, etc). About the only place the kernel is really involved is in sending/receiving data, and my bet is that optimization here would make relatively little difference, in light of the delays introduced by the network itself, interfacing with the card, etc, etc.
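    To make the "chunking I/O" point concrete, here is a minimal Python sketch (not anything taken from the VT cluster) that buffers many small records and hands them to the sink in large chunks, so the number of write calls, and therefore system calls when the sink is a real file or socket, stays small. `ChunkedWriter` and the 4 KiB chunk size are illustrative choices; `io.BytesIO` stands in for a file.

```python
import io


class ChunkedWriter:
    """Accumulate small records in memory and pass them on in large chunks.

    Illustrative sketch only: chunk_size is an arbitrary choice, not a tuned value.
    """

    def __init__(self, sink, chunk_size=1 << 20):
        self.sink = sink              # any binary file-like object
        self.chunk_size = chunk_size
        self.buf = bytearray()
        self.flushes = 0              # how many write() calls reached the sink

    def write(self, record: bytes) -> None:
        self.buf += record
        if len(self.buf) >= self.chunk_size:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.buf.clear()
            self.flushes += 1


sink = io.BytesIO()
w = ChunkedWriter(sink, chunk_size=4096)
for i in range(10_000):
    w.write(i.to_bytes(8, "little"))  # 10,000 tiny 8-byte records
w.flush()                              # push out the final partial chunk
print(w.flushes, len(sink.getvalue()))
```

    Here 10,000 records reach the sink in 20 writes instead of 10,000. Python's own `io.BufferedWriter` (what `open(path, "wb")` returns) already buffers this way; the explicit class just makes the idea visible.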

    Third, I highly doubt they're running any other software on the cluster nodes that would impact performance. Again, if they were doing that, they'd benefit more from hiring a new system architect.

    So, basically, what I'm saying is, comparing your little KDE desktop to a supercomputing application is laughable at best.

  29. More info on the G5 Cluster by mojowantshappy · · Score: 5, Informative
    Here is a slideshow (in PDF format) with a bunch of details on the supercomputer, including decisions on what to buy, etc.

    Here is da slide-show

    --

    This page was generated by a Barrel of Circus Midgets, and that is the way I like it!!!

  30. Memory errors? by Hoser+McMoose · · Score: 2, Interesting

    I keep seeing reference to some sort of software that will defeat hardware memory errors.

    How, pray tell, are they planning on detecting these errors? I can understand how you could reduce the frequency of errors with only a slight loss in performance, i.e., take some sort of checksum of your data every x cycles, but that doesn't eliminate the errors, it only reduces their frequency. Maybe it reduces the frequency enough that you don't need to worry about it, especially if 'x' is a sufficiently small number, but it still seems like a pretty risky prospect to me.

    Has anyone seen any actual TECHNICAL details on this point, i.e., not just some Mac fan yelling "Deja Vu, DEJA VU!!!"?
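    The "checksum every x cycles" idea can be sketched in a few lines of Python. This is a hypothetical illustration of the approach the comment describes, not a description of VT's software: take a CRC-32 of a data block at a checkpoint, then recompute it later and compare.

```python
import zlib


def checksum(block: bytearray) -> int:
    # CRC-32 over the whole block: detects changed bits, but cannot correct them
    return zlib.crc32(block)


data = bytearray(range(256)) * 4   # 1 KiB of stand-in "application data"
saved = checksum(data)             # taken at a checkpoint

# ... later: simulate a single flipped bit somewhere in memory ...
data[100] ^= 0b0000_0100

if checksum(data) != saved:
    print("corruption detected since last checkpoint; roll back and recompute")
```

    As the poster says, this only catches corruption when a check actually runs and cannot tell you which bit changed; anything corrupted before the first checksum is taken, or used in a computation before the next check, slips through.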

    1. Re:Memory errors? by Anonymous Coward · · Score: 2, Informative

      How, pray tell, are they planning on detecting these errors?


      Anyone seen any actual TECHNICAL details on this point, ie not just some Mac fan yelling "Deja Vu, DEJA VU!!!"?


      The Slashdot quote is grossly misleading. At the O'Reilly conference they were very clear that the machine does NOT CORRECT MEMORY ERRORS, but they would like to use ECC in the future. The current tools are just intended for restarting failed/crashed jobs, not for detecting silent memory errors.

    2. Re:Memory errors? by stacko · · Score: 2, Interesting

      I'm just guessing, but you'd probably implement the same ECC mechanism in software that ECC memory does in hardware.

      A quick google shows that ECC memory typically uses Hamming codes (or similar variations), which is pretty much what you'd expect. Skimming a few of the links, it would appear that most ECC memory is designed to correct a 1-bit error on a word. It is entirely possible that you can have the right combination of bit-errors that will slip past the ECC, regardless of whether it was implemented in hardware or software.

      It does seem a bit tedious to implement it in software, though. Each read and write to memory would have to be wrapped in the code that reads/detects or generates/writes the ECC bits to another location in memory.

      For the curious, you can learn more about Hamming codes here.
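      The software-ECC idea above can be sketched with the textbook Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can locate and fix any single flipped bit. This is a minimal Python illustration assuming bits are stored as lists of 0/1 ints; real ECC DIMMs typically use a wider SECDED code in hardware (8 check bits per 64-bit word), which this does not reproduce.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out as positions 1..7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (data bits, syndrome); a nonzero syndrome is the 1-based bad position."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
codeword = hamming74_encode(word)
codeword[5] ^= 1                      # one bit flips "in memory"
recovered, syndrome = hamming74_decode(codeword)
print(recovered, syndrome)            # [1, 0, 1, 1] 6
```

      As stacko notes, two flipped bits in one codeword give a nonzero syndrome that points at the wrong position, so the decoder "corrects" the wrong bit; adding an overall parity bit (SECDED) lets you at least detect that case.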

  31. Supercomputer article by Hoser+McMoose · · Score: 4, Informative

    For anyone interested in learning a bit more about what some of the issues are when creating a supercomputer, you might want to have a look at the following:

    Red Storm PDF

    The article is talking about Cray/Sandia's new Red Storm machine, a supercomputer using over 10,000 AMD Opteron processors that is expected to be competitive with the Earth Simulator for the #1 spot on the Top500 list. It does, however, talk about a lot more than just the specifics of this cluster, describing what some of the bottlenecks in supercomputers are and how to avoid/work around them.

  32. The Truth Revealed by merryworks4u · · Score: 2, Insightful

    Maybe IT management will read this and finally take note. TCO for backend management is lower on the Mac platform.

    --
    Michael Merry
    Merryworks
  33. system from IBM? by John+Harrison · · Score: 2, Insightful
    Um, in a roundabout way some of this is from IBM. The two CPUs in each box are from IBM.

    When IBM comes out with the $3,500 4-way 970 (G5 in Apple-speak) workstation it will be interesting to see what people do with it. Imagine a cluster that is 17% more expensive but with twice as many processors...

  34. Re:Anyone find the efficiency of this thing? by Hoser+McMoose · · Score: 5, Interesting

    The efficiency is quite poor for this machine, at least as efficiency is usually defined for supercomputers. The cluster has a theoretical peak of 17.6 TFlops if I did my math right (8 GFlops per processor), but they are only turning in an actual score of 9.56 TFlops, for an efficiency of only 54%. Even if they boost performance by 10%, they'll still only be ~60% efficient.

    For comparison, ASCI Q (#2 on Top500) reaches 68% efficiency, MCR Linux Cluster (currently #3, but about to be pushed down by this new Mac cluster) reaches 69% efficiency, and the #1 spot, Earth Simulator, reaches a quite impressive 88% efficiency.
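    The efficiency arithmetic above, as a small Python sketch. It uses the 9.555 TFlops LINPACK figure from the story and the poster's 8 GFlops-per-processor peak; the +10% case is the tuning gain VT said it hoped for.

```python
PROCESSORS = 2200           # 1100 dual-processor G5 nodes
PEAK_PER_CPU_GFLOPS = 8.0   # 2 GHz x 4 flops/cycle, counting an FMA as 2 flops
RMAX_TFLOPS = 9.555         # reported LINPACK result


def efficiency(rmax: float, rpeak: float) -> float:
    """LINPACK efficiency: measured Rmax as a fraction of theoretical Rpeak."""
    return rmax / rpeak


peak_tflops = PROCESSORS * PEAK_PER_CPU_GFLOPS / 1000
now = efficiency(RMAX_TFLOPS, peak_tflops)
boosted = efficiency(RMAX_TFLOPS * 1.10, peak_tflops)   # hoped-for +10%

print(f"Rpeak {peak_tflops:.1f} TFlops, efficiency {now:.1%}, after +10% {boosted:.1%}")
```

    This prints a 17.6 TFlops peak, 54.3% efficiency today, and 59.7% after a 10% gain, matching the "~60%" estimate.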

    Of course, there are other ways to measure efficiency. When it comes to performance/price, this Mac cluster does very well, even if you do take into account the real costs (i.e., MUCH more than just the $5.2 million up-front cost). For performance per unit of power it seems reasonable, but not outstanding. 10 TFlops/1.5MW of power is OK, and not too far off the Earth Simulator's 35 TFlops/3.5MW, but it's certainly nothing to write home about. Cray's next big cluster, Red Storm, is likely to get over 30 TFlops when it's released, but will consume only 2.0MW of power.
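    The power comparison reduces to one division per machine. A Python sketch using exactly the round numbers in the comment (sustained TFlops, megawatts):

```python
# (system, TFlops, megawatts) using the round numbers quoted in the comment
SYSTEMS = [
    ("VT G5 cluster", 10.0, 1.5),
    ("Earth Simulator", 35.0, 3.5),
    ("Red Storm (projected)", 30.0, 2.0),
]


def tflops_per_mw(tflops: float, megawatts: float) -> float:
    """Performance per unit of power; TFlops/MW is numerically equal to GFlops/kW."""
    return tflops / megawatts


ratios = {name: tflops_per_mw(t, mw) for name, t, mw in SYSTEMS}
for name, r in ratios.items():
    print(f"{name:22s} {r:5.2f} TFlops/MW")
```

    On these inputs the Mac cluster gets about 6.7 TFlops/MW, against 10 for the Earth Simulator and 15 for Red Storm. A later comment argues that 1.5 MW is the whole facility's substation feed rather than the cluster's own draw, in which case the Mac figure would be higher.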

  35. Optimize This, Optimize That by Uosdwis · · Score: 4, Insightful

    Okay, for everyone asking about optimizations: why do it?

    Look at what they built: a complete COTS supercomputer, minuscule price, functionality in six months, public data in a year. They have >9 TFlops right outta the box.

    Yes they have written their own software, but name a company that doesn't? They modded them (cooling I think, but I couldn't find data only pics.) They bribed students with pizza and soda, they didn't have to buy, make or gut a building. What is amazing is they showed that any simple slashdot pundit could build one if given these resources.

  36. The REAL power usage numbers by green+pizza · · Score: 5, Informative

    Just FWIW, they are claiming power usage of 1.5MW for this cluster of 2200 processors. Cray just released the numbers for their upcoming Red Storm cluster with over 10,000 AMD Opteron processors, just slightly less than 2.0MW.

    Ugh, this is getting old.

    Red Storm, the machine by itself, uses 2.0MW.

    Big Mac and all of its networking gear uses less than 0.75MW. The supercomputing center itself (building, air conditioning, UPS battery charging equipment, and the 1100 G5s) is fed by a 1.5MW substation feed. They're still not even maxing out the substation.

    The latest, fastest Opterons (not the scaled down low-power Opteron for blade servers) consume 53 watts at full clock. PowerPC 970 @ 2 GHz consumes 48 watts. The U2 and K3 motherboard chipset on the dual G5s uses just as much power as the PowerPC 970 "G5" processors. Hell, the power supply in a dual processor G5 system is 550 watts. 550 x 1100 machines = 0.61MW.
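    The 550 x 1100 estimate as a Python sketch. Like the poster, it treats each power supply's rated output as a per-node ceiling; it ignores conversion losses (wall draw at full load would be somewhat above the rated output) and the fact that nodes rarely run their supplies at the limit.

```python
PSU_RATED_WATTS = 550   # rated output of the dual G5 power supply
NODES = 1100


def psu_ceiling_mw(watts_per_node: float, nodes: int) -> float:
    """Sum of PSU ratings in megawatts (the rating treated as a per-node ceiling)."""
    return watts_per_node * nodes / 1_000_000


ceiling = psu_ceiling_mw(PSU_RATED_WATTS, NODES)
print(f"{ceiling:.3f} MW")   # 0.605 MW, the poster's ~0.61 MW
```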

  37. Re:building supercomputer with desktops sucks by Ffakr · · Score: 2, Interesting

    I'm sure VT would have gone rack if possible, and I've heard a side benefit of the current setup is that, as new nodes become available, they will be able to 'retire' the old nodes to desktop duty for the staff around campus. A dual G5 should be able to run Office pretty well, even in a few years. ;-)

    Also, I've heard that the system controller supports 16GB of RAM but that Apple has only certified 1GB DIMMs so far. That seems plausible, since a lot of Macs can accept more memory than initially advertised, simply because larger memory modules became common later. (I put 1GB of RAM in an old Wallstreet G3 PowerBook for someone and got it running even though it's officially rated at 512MB; I've got a Sony from the same period here that absolutely won't take more than 256MB in two slots.)

    --

    I'm not feeling witty so bite me

  38. Re:Anyone find the efficiency of this thing? by bnenning · · Score: 2, Insightful
    The cluster has a theoretical peak of 17.6TFlops/s if I did my math right (8GFlops/s per processor)


    Yes and no. The only way the G5 can do 4 FP operations per cycle is if each of its 2 FP units executes a fused multiply-add instruction. Obviously no code is going to consist entirely of these, so the actual theoretical peak is less than the theoretical theoretical peak. Or something like that.
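    The "theoretical theoretical peak" point can be put in numbers: if only a fraction f of floating-point instructions are FMAs, each FPU averages 2f + (1 - f) = 1 + f flops per cycle instead of 2. A Python sketch, assuming every FPU still issues one instruction per cycle (itself optimistic) and that any non-FMA FP instruction is worth 1 flop:

```python
def effective_peak_gflops(fma_fraction: float, fpus: int = 2, ghz: float = 2.0) -> float:
    """Peak GFlops per CPU when only fma_fraction of FP instructions are FMAs.

    Assumes each FPU issues one FP instruction every cycle; an FMA counts as
    2 flops and any other FP instruction as 1.
    """
    if not 0.0 <= fma_fraction <= 1.0:
        raise ValueError("fma_fraction must be between 0 and 1")
    flops_per_instruction = 1 + fma_fraction   # 2*f + 1*(1 - f)
    return fpus * flops_per_instruction * ghz


for f in (1.0, 0.5, 0.0):
    print(f, effective_peak_gflops(f))
```

    All-FMA code gives the advertised 8 GFlops per CPU, half FMAs gives 6, and no FMAs at all gives 4.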

    --
    How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
  39. Re:Favorite Quote - Correction About Apple by jeholman2003 · · Score: 5, Informative

    I usually never reply to these things, but I think it is funny that people are arguing about how he ordered on the Apple Store. I find it even funnier that people would actually go to the Apple Store and try. It was a joke! There were a lot of dedicated people at Apple, including myself, who helped make this dream a reality. The "myth" that I would like to clear up is that Apple DID have a clue, and a lot of great people at Apple have been working really hard for the last few months, making a lot of personal sacrifices to make sure that all the awesome work from Dr. Varadarajan and the rest of the cluster team could be possible and successful. That's my 2 cents.


    Jerome Holman
    Apple Campus Representative @ VT
    http://filebox.vt.edu/users/jeholman
  40. Re:When They Switch It to Linux by ArcCoyote · · Score: 2, Informative

    Exactly. Under plain-vanilla PPC Linux that cluster would be literally smokin'. The G5's thermal management must be software controlled.

  41. Re:REAL men count in binary :) by BorgCopyeditor · · Score: 2, Funny
    2^1/2

    Now, that's a real number.

    --
    Shop as usual. And avoid panic buying.
  42. Re:Favorite Quote - Correction About Apple by mduell · · Score: 2, Interesting

    Ah, finally someone who is actually involved with the project. Can you tell me what the total cost of the supercomputer was?
    The $5.2M figure seems to be just the towers (Dual 2GHz + 4GB RAM is $4814 with the standard educational discount; multiply by 1100 and you get $5,295,400). What was the additional cost of the InfiniBand cards and switches, the Cisco switches, the racks, and the cooling equipment? Were any modifications necessary for the building (more power, etc.)?
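    The retail-price arithmetic above, as a Python sketch. It only reproduces the poster's numbers (list educational price per configured tower, times 1100) and compares them with the widely reported $5.2M figure; it says nothing about what VT actually paid per unit.

```python
PRICE_PER_TOWER = 4814      # dual 2 GHz G5 + 4 GB RAM, educational price quoted above
TOWERS = 1100
QUOTED_TOTAL = 5_200_000    # the widely reported $5.2M figure

towers_total = PRICE_PER_TOWER * TOWERS
print(f"towers at retail: ${towers_total:,}")                       # $5,295,400
print(f"difference from $5.2M: ${towers_total - QUOTED_TOTAL:,}")   # $95,400
```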

  43. yes and no... (technical arguing) by green+pizza · · Score: 2, Insightful

    I am speaking from experience when I tell you that building a large cluster from desktops is just not a good way to go. They take up a hell of a lot more room, they put out a lot more heat, and the remote management capabilities are degraded.

    Desktops take up more room, correct. And yes, the desktop G5 does not have a console serial port like the Xserve does. But seriously, how many modern clusters do you see with a terminal server connected to each node's serial port? These days it's all install-and-run. OS X is UNIX... you can do a lot with a remote shell. These folks will never need to sit down at a GUI for each node. If you look at their setup photos, you'll see that they even removed the gfx card from each node.

    And... desktops DO NOT put out more heat than a similar rackmount unit. The hard drives are the same, the processors are the same. A larger case does not create more heat. More heat may be expelled due to better fans, but that is a GOOD THING; you don't want your board, RAM, and processors to cook. The only difference between the two is the power supply. Slim rackmount machines generally have smaller power supplies. But with modern switching power supplies, there is nearly no difference in power consumption (and, by the laws of thermodynamics, heat output).

    Once you go rack, you never go back. I much prefer a rack of 1U units that are built to be used in cluster situations.

    Yes and no. A rack of 1U servers is small, compact, snazzy looking, and neat. But, you also increase the number of processors per square foot, which can be a cooling issue. With a concentration of heat in that area, more cool air will need to be directed to the rack.

    I guess VT also has the luxury of running CPU-intensive tasks. Those machines can only hold 8 GB RAM while other offerings can hold 16 GB, and if they start to swap... ouch, not having SCSI drives will hurt.

    4 GB per processor is pretty good for the current HPC world. A lot of monster supercomputers are still sold with 2 - 4 GB per processor. The G5 can unofficially support 16 GB via 2 GB DIMMs, but Apple has not certified this. SCSI drives are great for a big RAID, and Fibre Channel is even better. But for the drive in each node, IDE is fine. Even Google uses IDE drives in their nodes (which they use as a distributed filesystem too!).

    All in all this setup is very impressive when just considering CPU performance. I wonder what is going to happen when a professor needs to run a few hundred jobs that use 10 or so GB of RAM each.

    The prof will have to rewrite his code to use less RAM per processor. This is a cluster after all, and code for clusters has to work with a fixed amount of RAM per node. This is not a Cray X1, Sun Fire 15K, or SGI Origin with high-throughput, low-latency global shared memory. Very, very few supercomputers, and even fewer clusters, have 10 GB of RAM per processor. Even 8 GB per proc is pretty rare today.

    If the job did need that much RAM, it would be possible to pool memory between several nodes; it wouldn't be too fast, though (but still WAY faster than swapping to any hard drive). I believe they're currently getting a little over 800 MBytes/sec real-world throughput via the 20 Gbit full-duplex InfiniBand interconnects.
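    Rewriting code to live within a fixed amount of RAM per node usually means domain decomposition: split the data into blocks and give each node (or MPI rank) one block. A minimal Python sketch; the 10 GB job size comes from the comment, while the 3.5 GB of usable memory per 4 GB node is a hypothetical allowance for the OS, and `nodes_needed` and `block_ranges` are made-up helper names. A real code would also have to exchange boundary data between neighboring blocks over the interconnect, which this leaves out.

```python
import math


def nodes_needed(dataset_gb: float, usable_gb_per_node: float) -> int:
    """Minimum number of nodes whose combined memory can hold the dataset."""
    return math.ceil(dataset_gb / usable_gb_per_node)


def block_ranges(n_rows: int, n_nodes: int) -> list[range]:
    """Split rows 0..n_rows-1 into contiguous, nearly equal blocks, one per node."""
    base, extra = divmod(n_rows, n_nodes)
    ranges, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)   # first `extra` nodes get one more row
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Hypothetical job: 10 GB of data, ~3.5 GB usable on each 4 GB node
n = nodes_needed(10, 3.5)
print(n, block_ranges(10, n))   # toy example: 10 rows spread over n nodes
```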

  44. Wrong by daveschroeder · · Score: 2, Insightful

    See http://www.netlib.org/benchmark/performance.pdf page 53.

    1. Earth simulator
    2. ASCI Q
    3. Virginia Tech G5 cluster (9.555 Tflops and rising, $5.2M HARDWARE ONLY)
    4. PNL Itanium2 cluster (8.633 Tflops, $24.5M HARDWARE ONLY)

    So nope, not only will the PNL Itanium2 cluster not be #2, it will also be 1Tflop behind the Virginia Tech cluster, and it will have done it at almost 5 times the cost. Bravo!
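    The price/performance gap in the list can be expressed as dollars per sustained GFlop. A Python sketch using only the LINPACK numbers and hardware-only costs listed above:

```python
# (LINPACK TFlops, hardware-only cost in US$ millions) from the list above
CLUSTERS = {
    "Virginia Tech G5": (9.555, 5.2),
    "PNL Itanium2": (8.633, 24.5),
}


def dollars_per_gflop(tflops: float, cost_musd: float) -> float:
    """Hardware cost divided by sustained LINPACK GFlops."""
    return cost_musd * 1e6 / (tflops * 1e3)


for name, (tflops, cost) in CLUSTERS.items():
    print(f"{name:18s} ${dollars_per_gflop(tflops, cost):,.0f} per sustained GFlop")

ratio = CLUSTERS["PNL Itanium2"][1] / CLUSTERS["Virginia Tech G5"][1]
print(f"cost ratio: {ratio:.2f}x")
```

    That works out to roughly $544 versus $2,838 per GFlop, and a cost ratio of about 4.71x, the "almost 5 times" above.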

  45. Re:Anyone find the efficiency of this thing? by tap · · Score: 4, Insightful
    The cluster has a theoretical peak of 17.6TFlops/s if I did my math right (8GFlops/s per processor), but they are only turning in an actual score of 9.56TFlops/s, for an efficiency of only 54%.
    The reason the efficiency is low isn't so much that 9.56 TFLOPS is a low number, but rather that the theoretical peak of 17.6 is unrealistically high. The only way you could get 17.6 is if you did nothing but paired multiply-add sequences entirely out of cache. No real code does this, and so the 17.6 number is really nothing more than marketing bullshit. When an It2 or Xeon cluster or NEC's Earth Simulator gets better efficiencies, it's because their made-up "peak" numbers are more realistic than the one the marketing people used for the G5.

    You could calculate a new marketing BS peak number where a multiply-add only counted as a single flop, or where you took into account some realistic cache miss rate. The new, lower theoretical peak would give you a much higher efficiency.

  46. Re:a lot of people get the math wrong! by Halliday · · Score: 2, Informative

    You made precisely the mistake to which he was referring: The G5 FPUs can perform a "fused, multipl[y]-add operation per cycle, so you get 2 flops per cycle" per processor. Therefore:

    2 FPUs/CPU * _2_ floating point operations per cycle per FPU = _4_ flops per CPU per cycle
    _4_ flop per CPU per cycle * 2 Gcycles per second = _8_ Gflops per CPU
    _8_ Gflop/s per CPU * 2 CPU per machine = _16_ Gflop/s per machine

  47. Re:a lot of people get the math wrong! by Greg+Titus · · Score: 2, Informative
    2 FPUs/ CPU * 1 floating point operation per cycle per FPU = 2 flop per CPU per cycle
    2 flops per CPU per cycle * 2 Gcycles per second = 4 Gflops per second per CPU
    4 Gflops/s per CPU * 2 CPU per machine = 8 Gflops/s per machine
    where does the extra 2x come from?


    A fused multiply-add is f0 = f1 * f2 + f3, which is two floating point operations in a single instruction. Each FPU on a G5 can execute an FMADD each cycle. So:

    1 FMADD per cycle = 2 flop/cycle * 2 FPUs = 4 flop/cycle * 2 CPUs = 8 flop/cycle * 2 GHz = 16 Gflop/s
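    The two calculations in this subthread differ only in how many flops an FMADD is worth. A small Python sketch of the same product, run both ways and then scaled to the 1100 nodes:

```python
def peak_gflops(fpus_per_cpu: int, flops_per_fpu_cycle: int, ghz: float, cpus: int) -> float:
    """Theoretical peak = FPUs x flops per FPU per cycle x clock (GHz) x CPUs."""
    return fpus_per_cpu * flops_per_fpu_cycle * ghz * cpus


# Counting an FMADD as 1 flop reproduces the question's 8 Gflop/s per machine...
per_machine_no_fma = peak_gflops(2, 1, 2.0, 2)
# ...counting it as 2 flops (f0 = f1 * f2 + f3) gives 16 Gflop/s per machine.
per_machine_fma = peak_gflops(2, 2, 2.0, 2)

cluster_tflops = per_machine_fma * 1100 / 1000   # all 1100 dual-CPU nodes
print(per_machine_no_fma, per_machine_fma, cluster_tflops)
```

    The FMA-counting figure, scaled to the whole cluster, is the 17.6 TFlops theoretical peak discussed earlier in the thread.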
  48. Answers by daveschroeder · · Score: 2, Interesting

    From http://macslash.org/article.pl?sid=03/10/28/2357235&mode=thread: "The total cost of the asset, including systems, memory, storage, primary and secondary communications fabrics and cables is $5.2mil. Facilities upgrade was $2mil. 1mil for the upgrades, 1mil for the UPS and generators." Total: $7.2M, plus essentially "volunteer" assembly. So it's still a LOT cheaper than anything even close to comparable.

    1. Re:Answers by daveschroeder · · Score: 2, Insightful

      A dual 2GHz G5 costs $2699 at the academic discount. They probably added RAM from a 3rd party. "Cooling equipment", I would imagine, was part of the $1M "facilities upgrade". From the article, again: "The total cost of the asset, including systems, memory, storage, primary and secondary communications fabrics and cables is $5.2mil. Facilities upgrade was $2mil. 1mil for the upgrades, 1mil for the UPS and generators." So I'd say cooling, racks, etc. were included in that $1M for facilities "upgrades". If you need it broken down any further, I'd imagine you'll have to contact VT.

  49. Re:if they paid full price, it's not a great deal by dbrutus · · Score: 2, Informative

    This was supposed to be a 64bit cluster so P4s were out. Itanium was too expensive and Opterons weren't out except as parts that would have to be assembled and that wasn't going to fly for their requirements. Can you imagine the risk of having AMD declare your assembly methods out of spec and refuse to replace any downed processors? This is a multi-million dollar cluster. They needed a chip and a chassis and they wanted it right then.