Factual 'Big Mac' Results

Re:Full price by McAddress · 2003-10-30 09:15 · Score: 2, Insightful

RTFA

The x86 cluster would have been twice as expensive. And this outperforms the highest ranking x86 cluster, which has more processors.

Super computer? by ludky132 · 2003-10-30 09:16 · Score: 3, Insightful

I've always been sort of intrigued by Top500. Has there ever been a good comparison written about the similarities/differences between a 'supercomputer' and the regular PC sitting on my desk running Linux/2k? At what point does the computer in question earn the title "Super"?

Full Price? WHY?!? by JonTurner · 2003-10-30 09:16 · Score: 5, Insightful

>>yes, VT paid full price

This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have to pay the bills, sooner or later.
Shameful.

Re:Full Price? WHY?!? by sammy+baby · 2003-10-30 09:23 · Score: 3, Insightful

I agree. As an employee of a state-run university, I can attest that I'm eligible for a 10% discount off the purchase price of one of the dual 2GHz G5s. (Originally $2999, discounted to $2699).

That VT wasn't able - or didn't think - to do the same is pretty shocking. A savings of $330,000 isn't anything to sneeze at.
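For anyone checking the $330,000 figure: it falls out of simple arithmetic if you assume roughly 1,100 machines (the count quoted elsewhere in this thread) and the $300 per-unit academic discount above. A quick sketch in Python; the variable names are only illustrative:

```python
# Assumption: 1,100 dual G5 nodes, the count quoted elsewhere in this thread.
NODES = 1100
LIST_PRICE = 2999       # dollars, from the post above
DISCOUNT_PRICE = 2699   # dollars, from the post above

# Per-node saving and the total across the whole order.
savings_per_node = LIST_PRICE - DISCOUNT_PRICE
total_savings = NODES * savings_per_node

print(f"${savings_per_node} per node, ${total_savings:,} total")
```

If the real node count differs from 1,100, the total scales linearly with it.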

Simply amazing by laird · 2003-10-30 09:19 · Score: 3, Insightful

This is simply an amazing achievement. Plenty of people have built supercomputers from huge piles of x86's, but this team managed to not only pull the trick off in less time, for less money, but on a new hardware platform. I certainly follow their logic (PPC's have always been far better than x86's for real scientific-level precision FLOPs) but it's a really gutsy move betting your entire supercomputing program on a new CPU, new hardware platform, etc., and on your ability to get everything ported to the PPC -- that's a lot of risks to take, and a small school like that can't afford to fail, even building a relatively cheap supercomputer. But it clearly paid off! Not only did they get great PR for the university, they got a great computing resource for the students and faculty, and by doing it themselves rather than buying a complete system from a vendor, I am sure that those students all learned far more. And those 700 pizza and coke consuming students that cranked the code will all be able to say that they were part of this amazing thing.

Damn!

--
Enable 3D printed prosthetics!

Re:Simply amazing by mslinux · 2003-10-30 11:27 · Score: 2, Insightful

a small school like that can't afford to fail, even building a relatively cheap supercomputer.

Dude, get your facts straight... it's the largest university in Virginia. 25,000 undergrads alone. I did my undergrad there... Phi Beta Kappa class of 2000.

Too slow/expensive by burgburgburg · 2003-10-30 09:22 · Score: 3, Insightful

If you'd read the article, you'd see that Varadarajan considered the Intel Itanium II but found "(The G5) is extremely faster than the Itanium II, hands down.". The AMD Opteron was too expensive. And for boxes, Dell and a bunch of other PC manufacturers quoted prices in the $10 million to $12 million range.

So he went full price with the G5 ($3000 apiece) and for only $5.2 million has the number 3 slot and is shooting for a 10% boost.

Re:Too slow/expensive by Anonymous Coward · 2003-10-30 11:07 · Score: 1, Insightful

could have bought an AMD system WITHOUT any of the extra "goodies"

No, he couldn't have. That's kinda the point. There were no Opteron systems available at the time. They would have had to buy component parts straight from AMD and then contract somebody to assemble the machines for them, which would have cost considerably more than just buying them already designed, built, and tested from Apple.

I pity you.

Pull the other one.

Re:price by Anonymous Coward · 2003-10-30 09:22 · Score: 2, Insightful

This is the type of statement that makes people question the accuracy of ANY price/performance claim made by a Mac fan(atic). Simply stating that "macs give more power and are a better value" is tremendously misleading. The obvious questions to ask are "When was this true?", "In what application?", "Against what competition?", and "By what objective standard?".

Open source the code by BWJones · 2003-10-30 09:22 · Score: 2, Insightful

So, the other really cool thing they are doing is open sourcing the code for error checking and connectivity.

This is in addition to consulting where they are helping others build similar clusters.

--
Visit Jonesblog and say hello.

Re:Too bad some software patents will be filed by norkakn · 2003-10-30 09:27 · Score: 5, Insightful

It isn't their fault.. I heard a long story on NPR about it a while ago. Universities tried to stay out of the patent game, but companies would take their research and patent it and then charge the university to use it.. researchers having to pay to use their own findings.

The patent system needs to be overhauled, then maybe we can start opening up the Universities again (and give them some more funding too!)

Full price? by Aqua+OS+X · 2003-10-30 09:29 · Score: 4, Insightful

Wow.. I can't believe Apple didn't cut them a break for buying 1100 Dual G5s.

You'd think Apple would at least sell G5's to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.

But, hey, even with all that pointless extra hardware, this cluster is still less than half the price of a comparable Intel system from Dell or IBM. Weird.

--
"Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"

Re:interesting points by davidstrauss · 2003-10-30 09:29 · Score: 4, Insightful

Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor (a.k.a. TRIPS) that IBM will hopefully be fabbing soon. Seeing a cost comparison with the Athlon64/Opteron would be more enlightening. Also consider that it would be almost impossible to buy Itanium or any other "enterprise" system without all the redundant hardware (ECC RAM, etc.) for which the G5 cluster compensates in software.

Re:Full price by Anonymous Coward · 2003-10-30 09:59 · Score: 1, Insightful

And this outperforms the highest ranking x86 cluster, which has more processors.

The x86 cluster was built a year and a half ago.
OF COURSE this thing will be faster.

Why don't YOU RTFA with some perspective.

Re:Why didn't they use Darwin or Gentoo? by Anonymous Coward · 2003-10-30 10:01 · Score: 1, Insightful

How much faster than Aqua does KDE add and multiply?

The Truth Revealed by merryworks4u · 2003-10-30 10:15 · Score: 2, Insightful

Maybe IT management will read this and finally take note. TCO for backend management is cheaper on the Mac platform.

--
Michael Merry
Merryworks

system from IBM? by John+Harrison · 2003-10-30 10:23 · Score: 2, Insightful

Um, in a roundabout way some of this is from IBM. The two CPUs in each box are from IBM.

When IBM comes out with the $3,500 4-way 970 (G5 in Apple-speak) workstation it will be interesting to see what people do with it. Imagine a cluster that is 17% more expensive but with twice as many processors...

--
Lasers Controlled Games!
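The "17% more expensive" remark above can be checked against the $2,999 list price quoted in this thread. The sketch below is only back-of-the-envelope: the $3,500 4-way workstation is the poster's hypothetical, not an announced product, and the names are illustrative.

```python
G5_DUAL_PRICE = 2999    # dollars, list price of the dual G5 quoted in this thread
IBM_4WAY_PRICE = 3500   # dollars, hypothetical 4-way 970 workstation from the post

# Relative price increase per box, and the resulting cost per processor.
price_increase = IBM_4WAY_PRICE / G5_DUAL_PRICE - 1
cost_per_cpu_dual = G5_DUAL_PRICE / 2
cost_per_cpu_quad = IBM_4WAY_PRICE / 4

print(f"price increase per box: {price_increase:.1%}")
print(f"cost per CPU: ${cost_per_cpu_dual:.2f} (dual) vs ${cost_per_cpu_quad:.2f} (4-way)")
```

This ignores memory, interconnect, and facilities costs, which would not necessarily scale the same way.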

Optimize This, Optimize That by Uosdwis · 2003-10-30 10:42 · Score: 4, Insightful

Okay for everyone asking about optimizations, why do it?

Look at what they built: a complete COTS supercomputer, minuscule price, functionality in six months, public data in a year. They have >9Tf right outta the box.

Yes they have written their own software, but name a company that doesn't? They modded them (cooling I think, but I couldn't find data, only pics). They bribed students with pizza and soda, they didn't have to buy, make or gut a building. What is amazing is they showed that any simple slashdot pundit could build one if given these resources.

Re:interesting points by RzUpAnmsCwrds · 2003-10-30 11:19 · Score: 4, Insightful

"Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor"

Your professor's opinion is... well... flawed.

Itanium is an excellent architecture. Its flaws come from politics:

1: Itanium requires good compilers. For now, that means compilers from Intel. GCC will be fine for running Mozilla on an Itanium, but technical apps simply won't perform anywhere near the performance of the machine when compiled with GCC.

2: Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB of cache on the high end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.

3: Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process. Before Madison (about a year ago), it was on a 180nm process.

Some misconceptions:

1: Itanium is "inefficient". This couldn't be further from the truth. At 1.5GHz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2GHz P4 or 2.2GHz Opteron in SPECint.

2: Itanium is "slow". Wrong again, see above.

3: Itanium doesn't scale. Wrong again. Itanium scales better than any other current architecture, getting nearly 100% of clock in both int and fp. Opteron gets around 99% int and 95% fp. Pentium 4 gets around 85% int and 80% fp. I don't have data for PPC970.

4: Itanium is expensive. This is true, but it has to do with politics rather than architecture. Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture. Itanium takes much of the logic out of the CPU and puts it into the compiler (this is why you need good compilers). Itanium's architecture is called EPIC, or explicitly parallel instruction computing, because each instruction is "tagged" by the compiler to tell the CPU what instructions can and cannot be executed in parallel.

EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else. Intel's politics prevents this from happening.

So, please don't say that Itanium is a poor architecture. Itanium is a proven architecture. It uses fewer transistors and lower clock speeds than comparable RISC CPUs. Yes, it has problems, but most of them have to do with Itanium the CPU (too much cache, too expensive, not latest process) instead of EPIC the architecture.

Re:Anyone find the efficiency of this thing? by bnenning · 2003-10-30 11:19 · Score: 2, Insightful

The cluster has a theoretical peak of 17.6TFlops/s if I did my math right (8GFlops/s per processor)

Yes and no. The only way the G5 can do 4 FP operations per cycle is if each of its 2 FP units executes a fused multiply-add instruction. Obviously no code is going to consist entirely of these, so the actual theoretical peak is less than the theoretical theoretical peak. Or something like that.

--
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.

Re:Memory errors? by Anonymous Coward · 2003-10-30 11:36 · Score: 1, Insightful

There is no way to completely prevent errors in any machine, the best you can do is to reduce the odds of an error occurring. It is possible to reduce the odds substantially, but there are no absolute guarantees.

One poster mentioned a redundancy scheme, doing a calculation multiple times in different memory locations. This does not prevent errors; it only reduces the chances of an error occurring. Yes, it does reduce them substantially, but it is still possible to get the same error in 2 different calculations.
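To put a rough number on "reduces but does not eliminate": if each run of a calculation is corrupted with some small probability p, and the runs fail independently (a simplifying assumption, not something the article states), then all of them being corrupted has probability p raised to the number of runs. Agreeing on the same wrong answer is rarer still, so this is an upper bound. A minimal sketch with a made-up value of p:

```python
def undetected_error_bound(p: float, runs: int = 2) -> float:
    """Upper bound on the chance that every one of `runs` independent
    repetitions is corrupted, which is a necessary condition for them
    all to agree on the same wrong answer."""
    return p ** runs

# Hypothetical per-calculation error probability, chosen only for illustration.
p = 1e-6
for runs in (1, 2, 3):
    print(f"{runs} run(s): at most {undetected_error_bound(p, runs):.0e}")
```

The bound shrinks quickly with each extra repetition but never reaches zero, which is the point made above.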

Would still be cheaper by Anonymous Coward · 2003-10-30 11:52 · Score: 1, Insightful

Would still be faster and cheaper than even today's fastest Xeons. So STFU.

yes and no... (technical arguing) by green+pizza · 2003-10-30 11:54 · Score: 2, Insightful

I am speaking from experience when I tell you that building a large cluster from desktops is just not a good way to go. They take up a hell of a lot more room, they put out a lot more heat, and the remote management capabilities are degraded.

Desktops take up more room, correct. And yes, the desktop G5 does not have a console serial port like the Xserve does. But seriously, how many modern clusters do you see with a terminal server connecting to each node's serial port? These days it's all install-and-run. OS X is UNIX... you can do a lot with a remote shell. These folks will never need to sit down at a GUI for each node. If you look at their setup photos, you'll see that they even removed the gfx card from each node.

And... desktops DO NOT put out more heat than a similar rackmount unit. The hard drives are the same, the processors are the same. A larger case does not create more heat. More heat may be expelled due to better fans, but that is a GOOD THING, you don't want your board, ram, and processors to cook. The only difference between the two is the power supply. Slim rackmount machines generally have smaller power supplies. But, with modern switching power supplies, there is nearly no difference in power consumption (and, by the laws of thermodynamics, heat output).

Once you go rack, you never go back. I much prefer a rack of 1U units that are built to be used in cluster situations.

Yes and no. A rack of 1U servers is small, compact, snazzy looking, and neat. But, you also increase the number of processors per square foot, which can be a cooling issue. With a concentration of heat in that area, more cool air will need to be directed to the rack.

I guess VT also has the luxury of running CPU intensive tasks. Those machines can only hold 8 GB of RAM while other offerings can hold 16 GB and if they start to swap....ouch, not having SCSI drives will hurt.

4 GB per processor is pretty good for the current HPC world. A lot of monster supercomputers are still sold with 2 - 4 GB per processor. The G5 can unofficially support 16 GB via 2 GB DIMMs, but Apple has not certified this. SCSI drives are great for a big RAID, fibrechannel is even better. But for the drive in each node, IDE is fine. Even Google uses IDE drives in their nodes (which they use as a distributed filesystem too!).

All in all this setup is very impressive when just considering CPU performance. Wonder what is going to happen when a professor needs to run a few hundred jobs that use 10 or so GB of RAM each.

The prof will have to re-write his code to use less ram per processor. This is a cluster after all, and code for clusters has to work with a fixed amount of ram per node. This is not a Cray X1, SunFire15K, or SGI Origin with high thruput, low latency global shared memory. Very very few supercomputers, and even fewer clusters, have 10 GB of ram per processor. Even 8 GB per proc is pretty rare today.

If the thread did need that much ram, it would be possible to pool memory between several nodes, it wouldn't be too fast, though (but still WAY faster than swapping to any harddrive). I believe they're currently getting a little over 800 MBytes/sec real-world thruput via the 20 Gbit full duplex InfiniBand interconnects.

Yeah one of my professors was pointing it out by Anonymous Coward · 2003-10-30 11:55 · Score: 1, Insightful

My prof at the U of Michigan was pointing out the architectural oddity of every product Intel makes. First off, the Itanium, for all its VLIWness, fetches fewer instructions per fetch than the G5 (6 vs. 8 for the G5). It cannot do out-of-order execution, and because it has to process the 128 bits of instructions, they cannot ramp up the clock. About the only very interesting thing is how it has so many registers that add very little to its overhead and essentially allow it to avoid costly memory accesses for putting variables on the stack. Overall the Itanium2 is not that bad, just a little too complicated and over-engineered. I do not argue with their decision to make a processor that does not do OOE or that has such large caches and such a big register file, but it is the VLIWness that gets them: it adds too much complexity and too many gate delays. But yeah, basically my comp arch professor was saying that, benchmarks aside, the G5 was arguably the fastest desktop processor out there. Also, the Apple PI architecture does not hurt either; the Power Mac G5 has a very well designed system architecture too, and it can keep those processors pinned. Apple and IBM developed one hell of a system considering its cost.

Wrong by daveschroeder · 2003-10-30 12:00 · Score: 2, Insightful

See http://www.netlib.org/benchmark/performance.pdf page 53.

1. Earth simulator
2. ASCI Q
3. Virginia Tech G5 cluster (9.555 Tflops and rising, $5.2M HARDWARE ONLY)
4. PNL Itanium2 cluster (8.633 Tflops, $24.5M HARDWARE ONLY)

So nope, not only will the PNL Itanium2 cluster not be #2, it will also be 1Tflop behind the Virginia Tech cluster, and it will have done it at almost 5 times the cost. Bravo!
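The "almost 5 times the cost" and "1Tflop behind" remarks can be checked directly from the four numbers listed above. The sketch below uses only those hardware-only figures; the dictionary layout and names are illustrative.

```python
# (Linpack Tflops, hardware cost in millions of dollars), as listed in the post above.
systems = {
    "Virginia Tech G5 cluster": (9.555, 5.2),
    "PNL Itanium2 cluster": (8.633, 24.5),
}

# Hardware dollars (in millions) per sustained Tflop for each system.
cost_per_tflop = {name: cost / tflops for name, (tflops, cost) in systems.items()}

vt_tflops, vt_cost = systems["Virginia Tech G5 cluster"]
pnl_tflops, pnl_cost = systems["PNL Itanium2 cluster"]
cost_ratio = pnl_cost / vt_cost          # how many times more the PNL hardware cost
tflops_gap = vt_tflops - pnl_tflops      # how far ahead the VT cluster is

for name, value in cost_per_tflop.items():
    print(f"{name}: ${value:.2f}M per Tflop")
print(f"cost ratio: {cost_ratio:.2f}x, gap: {tflops_gap:.3f} Tflops")
```

On these figures the PNL hardware cost is about 4.7 times higher, and the gap is a little under 1 Tflop.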

Re:Wrong by Anonymous Coward · 2003-10-30 19:36 · Score: 1, Insightful

Check IBM's performance figures for the PPC970. They're higher than Apple's figures.

CPU manufacturers estimate the performance to some extent, and where it's not estimated it depends on hardware that is simply not going to fly in the mass market.

So you end up with pie-in-the-sky numbers that really don't mean anything except when being compared to other pie-in-the-sky numbers.

When someone compares those fanciful numbers against a system builder's numbers, guess which one is going to be lower?

Gotta love platform religion backed up by absurd figures, only to conveniently "forget" to use IBM's absurd figures in the comparison.

Re:Anyone find the efficiency of this thing? by tap · 2003-10-30 12:07 · Score: 4, Insightful

The cluster has a theoretical peak of 17.6TFlops/s if I did my math right (8GFlops/s per processor), but they are only turning in an actual score of 9.56TFlops/s, for an efficiency of only 54%.

The reason the efficiency is low isn't so much because the 9.56 TFLOPS is a low number, but rather that the theoretical peak of 17.6 is unrealistically high. The only way you could get 17.6 is if you did nothing but paired multiply-add sequences entirely out of cache. No real code does this and so the 17.6 number is really nothing more than marketing bullshit. When an It2 or Xeon cluster or NEC's Earth Simulator get better efficiencies it's because their made up "peak" numbers are more realistic than the one the marketing people used for the G5.

You could calculate a new marketing BS peak number where multiply-add only counted as a single flop, or you took into account some realistic cache miss rate. The new lower theoretical peak would give you a much higher efficiency.
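For reference, here is how the 17.6 and 54% figures in this subthread come about. The sketch assumes 1,100 nodes with two 2.0 GHz processors each (counts quoted elsewhere in this thread) and 4 flops per cycle per processor, i.e. two FP units each retiring a fused multiply-add. The function name and parameters are illustrative.

```python
def theoretical_peak_tflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Peak in TFLOPS: every processor retires `flops_per_cycle` on every cycle."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle / 1000.0

# Assumptions taken from figures quoted in this thread.
NODES = 1100
CPUS_PER_NODE = 2
CLOCK_GHZ = 2.0
FLOPS_PER_CYCLE = 4      # 2 FP units x 1 fused multiply-add (counted as 2 flops)
MEASURED_TFLOPS = 9.56   # the benchmark score quoted above

peak = theoretical_peak_tflops(NODES, CPUS_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = MEASURED_TFLOPS / peak

print(f"peak: {peak:.1f} TFLOPS, efficiency: {efficiency:.0%}")
```

Changing `FLOPS_PER_CYCLE` shows how sensitive the efficiency figure is to the way "peak" is defined, which is the argument being made above.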

Re:interesting points by neurosis101 · 2003-10-30 12:20 · Score: 1, Insightful

In real world applications the Itanium won't scale as well. It's based on how the parallelism works, which is in the compiler. The compiler bundles instructions together based on how they can be run in parallel. These bundles can only be composed in certain ways; if I recall it's 2 int ops, 2 fp ops, and 1 branch op. Regardless, I know it's 5 instructions, one of which HAS to be a branch per bundle; this is obviously hugely inefficient because 20% of your code base will not be branches.

Re:interesting points by timeOday · 2003-10-30 12:23 · Score: 2, Insightful

EPIC scales better than RISC architectures. It does more work with a lower clock and fewer transistors. That means that it will ultimately result in a cooler, cheaper, smaller, faster CPU than anything else.

Doing more per clock isn't necessarily good if it pushes your clock speed too low. Itanium2 is only available up to about 1.3 GHz. As the article says, it's ironic that Intel should now lose the MHz race.

Using fewer transistors is good for reducing heat and manufacturing costs, but the Itanium is neither cheap nor cool (130W!). In the performance arena, Moore's law is useless unless chip designers figure out how to use MORE transistors to compute more quickly. Otherwise there's nothing to do with all those transistors except... more cache?

Re:interesting points by mczak · 2003-10-30 13:43 · Score: 4, Insightful

Itanium is an excellent architecture.

Can't agree there. It's certainly not as bad as the first Itanics made it look, it has lots of interesting ideas, but overall it seems the architecture didn't reach the goals Intel probably had.

Itanium requires good compilers. For now, that means compilers from Intel.

Certainly. However, it looks like it is very, very hard (if not impossible) to write a good compiler for it - Intel certainly invested a LOT of time and money, and increased performance quite a bit (quite a bit of the performance difference in published spec scores between Itanium 1 and 2 is just because of a newer compiler), but if rumours are true the compiler still isn't quite that good - after what, 5 years?

Lots of cache means lots of transistors means lots of heat.

Not quite true. Cache transistors aren't very power hungry - look at P4 vs. P4EE with an additional 2MB L3 cache, the power consumption hardly changed (5W or so isn't much compared to the total of 90W).

Intel is not fabbing Itanium with a state of the art process.

Well, their 130nm process sounds quite good to me. Nobody really uses much better process technologies yet - AMD might have a slight edge with their 130nm SOI process, which should help a bit with power consumption.

Itanium is "inefficient". This couldn't be further from the truth. At 1.5GHz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2GHz P4 or 2.2GHz Opteron in SPECint.

The Itanium makes up for its inefficiency with large caches (compared to P4 / Opteron). Compare the Dell PowerEdge 3250 spec results with 1.4GHz/1.5MB cache and 1.5GHz/6MB cache, otherwise configured the same (unfortunately using slightly older compilers, so don't take the absolute values too seriously). The smaller cache (which is still more than Opteron/Pentium4 have) costs it (factoring in the 6% clock speed disadvantage) about 20% in SpecInt (making it definitely slower than Opteron 146 and Pentium4, even considering that the results would be higher with the newer compiler). In SpecFp it's about the same 20% difference, which means it still beats P4 and Opterons, but no longer by such impressive margins.

Itanium doesn't scale. Wrong again.

I'm too lazy to check the numbers, but the Itanium has a shared bus - granted, with quite a bit of bandwidth, but still shared (similar to the P4). 2 CPUs should scale well, 4 shouldn't be that bad, and after that you can forget it (meaning your 64 cpu boxes will be built with 4-node boards). The Opteron will scale much better beyond 4 nodes - its point-to-point communication is probably overkill for 2 nodes, should show some advantages with 4 nodes, and scale very well to 8 nodes - too bad nobody builds 8-node Opteron systems...
Or do you mean scaling with clockspeed? In that case, the bigger the cache and the faster the system bus and ram, the better it will scale, but the cpu architecture itself is hardly a factor.

Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture.

Unfortunately I haven't seen any transistor numbers for an Itanium2 core. But I think it's not true. The Itanium saves some logic on the instruction decoder, but has more execution units in parallel (which should lead to better performance, but ONLY IF it's actually possible to build a well optimizing compiler which manages to keep the execution units busy, and it's completely feasible that this is just not possible in the general case).

EPIC scales better than RISC architectures.

I really don't think this is true. Scaling is independent of the cpu core architecture.

I will agree that EPIC (which, btw, isn't quite Intel's invention, it shares most of the ideas with VLIW) is a nice concept, but for some reason it just doesn't work in practice as well as it should.

Re:Answers by daveschroeder · 2003-10-31 12:03 · Score: 2, Insightful

A dual 2GHz G5 costs $2699 at the academic discount. They probably added RAM from a 3rd party. "Cooling equipment" I would imagine was part of the $1M "facilities upgrade". From the article, again: "The total cost of the asset, including systems, memory, storage, primary and secondary communications fabrics and cables is $5.2mil. Facilities upgrade was $2mil. 1mil for the upgrades, 1mil for the UPS and generators." So out of that $1M, for facilities "upgrades", I'd say cooling/racks/etc was included in that. If you need it any more broken down, I'd imagine you'll have to contact VT.
