Slashdot Mirror


Factual 'Big Mac' Results

danigiri writes "Finally Varadarajan has put some hard facts on the speed of the VT 'Big Mac' G5 cluster. Undoubtedly after some weeks of tuning and optimization, the home-brewn supercluster is happily rolling around at 9.555 TFlops in LINPACK. The revelations were made by the parallel computing voodoo master himself at the O'Reilly Mac OS X conference. It seems they are expecting and additional 10% speed boost after some more tweaking. Srinidhi received standing ovations from the audience. Wired news is also running a cool news piece on it. Lots of juicy technical and cost details not revealed before. Myth dispelling redux: yes, VT paid full price, yes, it's running Mac OS X Jaguar (soon Panther), yes, errors in RAM are accounted for, Varadarajan was not an Apple fanboy in the least... read the articles for more booze."

8 of 566 comments (clear)

  1. Full Price? WHY?!? by JonTurner · · Score: 5, Insightful

    >>yes, VT paid full price

    This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have pay the bills, sooner or later.
    Shameful.

  2. Re:Too bad some software patents will be filed by norkakn · · Score: 5, Insightful

    It isn't their fault.. I hear a long story on NPR about it a while ago. Universities tried to stay out of the patent game, but companies would take their research and patent it and then charge the university to use it.. researchers having to pay to use their own findings.

    The patent system needs to be overhauled, then maybe we can start opening up the Universities again (and give them some more funding too!)

  3. Full price? by Aqua+OS+X · · Score: 4, Insightful

    Wow.. I can't believe Apple didn't cut them a break for buying 1100 Dual G5s.

    You'd think apple would at least sell G5's to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.

    But, hey, even with all that pointless extra hardware, this cluster is still less then half the price of a comparable intel system from Dell or IBM. Weird.

    --
    "Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
  4. Re:interesting points by davidstrauss · · Score: 4, Insightful

    Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor (a.k.a. TRIPS) that IBM will hopefully be fabbing soon. Seeing a cost comparison with the Athlon64/Opteron would be more enlightening. Also consider that it would be almost impossible to buy Itanium or any other "enterprise" system without all the redundant hardware (ECC RAM, etc.) for which the G5 cluster compensates in software.

  5. Optimize Thit Optimize That by Uosdwis · · Score: 4, Insightful

    Okay for everyone asking about optimizations, why do it?

    Look at what they built: a complete COTS supercomputer, miniscule price, functionality in six months, public data in a year. They have >9Tf right outta the box.

    Yes they have written their own software, but name a company that doesn't? They modded them (cooling I think, but I couldn't find data only pics.) They bribed students with pizza and soda, they didn't have to buy, make or gut a building. What is amazing is they showed that any simple slashdot pundit could build one if given these resources.

  6. Re:interesting points by RzUpAnmsCwrds · · Score: 4, Insightful

    "Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor"

    Your professor's opinion is... well... flawed.

    Itanium is an excellent architecture. Its flaws come from politics:

    1: Itanium requires good compilers. For now, that means compilers from Intel. GCC will be fine for running Mozilla on an Itanium, but technical apps simply won't perform anywhere near the performance of the machine when compied with GCC.

    2: Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB on the high end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.

    3: Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process. Before Madison (about a year ago), it was on a 180nm process.

    Some misconceptions:

    1: Itanium is "inefficent". This couldn't be further from the truth. At 1.5Ghz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2Ghz P4 or 2.2Ghz Opteron in SPECint.

    2: Itanium is "slow". Wrong again, see above.

    3: Itanium doesn't scale. Wrong again. Itanium scales better than any other current architecture, getting nearly 100% of clock in both int and fp. Opteron gets around 99% int and 95% fp. Pentium 4 gets around 85% int and 80% fp. I don't have data for PPC970.

    4: Itanium is expensive. This is true, but it has to do with politics rather than architecture. Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture. Itanium takes much of the logic out of the CPU and puts it into the compiler (this is why you need good compilers). Itanium's architecture is called EPIC, or explicitly paralell instruction computing, because each instruction is "tagged" by the compiler to tell the CPU what instructions can and cannot be executed in paralell.

    EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else. Intel's politics prevents this from happening.

    So, please don't say that Itanium is a poor architecture. Itanium is a proven architecture. It uses fewer transistors and lower clock speeds than comparable RISC CPUs. Yes, it has problems, but most of them have to do with Itanium the CPU (too much cache, too expensive, not latest process) instead of EPIC the architecture.

  7. Re:Anyone find the efficiency of this thing? by tap · · Score: 4, Insightful
    The cluster has a theoretical peak of 17.6TFlops/s if I did my math right (8GFlops/s per processor), but they are only turning in an actual score of 9.56TFlops/s, for an efficiency of only 54%.
    The reason the efficiency is low isn't so much because the the 9.56 TFLOPS is a low number, but rather that the theoretical peak of 17.6 is unrealistically high. The only way you could get 17.6 is if you did nothing but paired multiply-add sequences entirely out of cache. No real code does this and so the 17.6 number is really nothing more than marketing bullshit. When an It2 or Xeon clsuter or NEC's Earth Simulator get better efficiencies it's because their made up "peak" numbers are more realistic than the one the marketing people used for the G5.

    You could calculate a new marketing BS peak number where multiply-add only counted as a single flop, or you took into account some realistic cache miss rate. The new lower theoretical peak would give you a much higher efficency.

  8. Re:interesting points by mczak · · Score: 4, Insightful
    Itanium is an excellent architecture.
    Can't agree there. It's certainly not as bad as the first Itanics made it look, it has lots of interesting ideas, but overall it seems the architecture didn't reach the goals intel probably had.
    Itanium requires good compilers. For now, that means compilers from Intel.
    Certainly. However, it looks like it is very, very hard (if not impossible) to write a good compiler for it - intel certainly invested a LOT of time and money, and increased performance quite a bit (quite a bit of the performance difference in published spec scores between itanium 1 and 2 is just because of a newer compiler), but if rumours are true the compiler still isn't quite that good - after what, 5 years?
    Lots of cache means lots of transistors means lots of heat.
    Not quite true. Cache transistors aren't very power hungry - look at P4 vs. P4EE with an additional 2MB L3 cache, the power consumption hardly changed (5W or so isn't much compared to the total of 90W).
    Intel is not fabbing Itanium with a state of the art process.
    Well, their 130nm process sounds quite good to me. Nobody really uses much better process technologies yet - AMD might have a slight edge with their 130nm SOI process, which should help a bit with power consumption.
    Itanium is "inefficent". This couldn't be further from the truth. At 1.5Ghz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2Ghz P4 or 2.2Ghz Opteron in SPECint.
    The itanium makes up for its inefficiency with large caches (compared to P4 / Opteron). Compare the dell poweredge 3250 spec results with 1.4Ghz/1.5MB cache and 1.5Ghz/6MB cache, otherwise configured the same (unfortunately using slightly older compilers, so don't take the absolute values too seriously). The smaller cache (which is still more than Opteron/Pentium4 have) costs it (factored in the 6% clock speed disadvantage) about 20% in SpecInt (making it definitely slower than Opteron 146 and Pentium4, even considered the results would be higher with the newer compiler). In SpecFp it's about the same 20% difference, which means it still beats P4 and Opterons, but no longer by such impressive margins.
    Itanium doesn't scale. Wrong again.
    I'm too lazy to check the numbers, but the Itanium has a shared bus - granted, with quite a bit of bandwidth, but still shared (similar to the P4). 2 CPUs should scale well, 4 shouldn't be that bad, and after that you can forget it (meaning your 64 cpu boxes will be built with 4-node boards). The Opteron will scale much better beyond 4 nodes - its point-to-point communication is probably overkill for 2 nodes, should show some advantages with 4 nodes, and scale very well to 8 nodes - too bad nobody builds 8-node Opteron systems...
    Or do you mean scaling with clockspeed? In that case, the bigger the cache and the faster the system bus and ram, the better will it scale, but the cpu architecture itself is hardly a factor.
    Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture.
    Unfortunately I haven't seen any transistor numbers of a Itanium2 core. But I think it's not true. The Itanium saves some logic on instruction decoder, but has more execution units in parallel (which should lead to better performance, but ONLY IF it's actually possible to build a well optimizing compiler which manages to keep the execution units busy, and it's completely feasible that this is just not possible in the general case).
    EPIC scales better than RISC architectures.
    I really don't think this is true. Scaling is independant from the cpu core architecture.

    I will agree that EPIC (which, btw, isn't quite intels invention, it shares most of the ideas with VLIW) is a nice concept, but for some reason it just doesn't work in practice as well as it should.