Factual 'Big Mac' Results
danigiri writes "Finally Varadarajan has put some hard facts on the speed of the VT 'Big Mac' G5 cluster. Undoubtedly after some weeks of tuning and optimization, the home-brewn supercluster is happily rolling around at 9.555 TFlops in LINPACK.
The revelations were made by the parallel computing voodoo master himself at the O'Reilly Mac OS X conference. It seems they are expecting an additional 10% speed boost after some more tweaking. Srinidhi received a standing ovation from the audience.
Wired News is also running a cool news piece on it. Lots of juicy technical and cost details not revealed before. Myth dispelling redux: yes, VT paid full price, yes, it's running Mac OS X Jaguar (soon Panther), yes, errors in RAM are accounted for, Varadarajan was not an Apple fanboy in the least... read the articles for more booze."
Big Macs are bad for your health.
....ok, we've really got real numbers THIS time!!
do() || do_not();
I haven't seen a cluster of Macs this big and powerful since the last annual pimp convention!
Now, where did all the tricks go?
Until Slashdot fixes the funny modifier, use insightful or interesting. The poster knows your intentions.
Is that a word? How about brewed? Hate to nit, but .... aw... nevermind.
The x86 cluster would have been twice as expensive. And this outperforms the highest-ranking x86 cluster, which has more processors.
I've always been sort of intrigued by Top500. Has there ever been a good comparison written about the similarities/differences between a 'supercomputer' and the regular PC sitting on my desk running Linux/2k? At what point does the computer in question earn the title "Super"?
The power usage (think cooling the room) for a similarly-performing Athlon cluster would likely more than make up for what phantom price difference you are talking about.
MORTAR COMBAT!
>>yes, VT paid full price
This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have to pay the bills, sooner or later.
Shameful.
I think it's interesting that he wasn't a Mac fan at all before this project. He says he chose it because it had better performance than everything else out there ("Ironically, they lost the gigahertz game," he said of Intel. "(The G5) is extremely faster than the Itanium II, hands down."), and was cheaper too (Dell and other manufacturers quoted prices between $10 and $12 million, vs. the $5.2 million for the G5s).
What more do you need? Faster systems, cheaper total cost, and slick looking cases.
They costed the G5 against Dell and IBM offerings and the Apple solution was cheaper. Where did you get your numbers? Why don't you go out and price out a Supercomputer for me will ya? Of course you know that it isn't feasible to BUILD 1100 units.
...And when they came for me, there was no one left to speak out for me." - Martin Niemoeller (1892-1984)
....maybe i'm obtuse, but i keep hearing about this thing as "..and we're only seeing X% of its real potential right now!"....
1) Why can't they just shout "Let 'er rip!!" and crank the thing wide open?
2) Why all the media buzz concerning this as a `surprise' when they've already got its performance figured out, apparently?
Sorry.
An audience member asked if he'd made the purchase through the Apple store. Varadarajan smiled and said that actually, yes, he had.
[snip]
yes, it's running MacOSX Jaguar ( soon Panther)
More like whole-lotta-CD-jockeying. Perhaps the bio department can lend a hand by donating the services of their chimps to handle the CD swapping.
(Yes, I'm aware there are smarter ways of doing it, but isn't it a fun mental picture, 100 chimps running around a cluster of G5's and throwing bananas and CDs at each other?) Talk about your fun install-fests.
Please help metamoderate.
Until then, quit your trolling.
Don't blame me; I'm never given mod points.
This is simply an amazing achievement. Plenty of people have built supercomputers from huge piles of x86's, but this team managed to not only pull the trick off in less time, for less money, but on a new hardware platform. I certainly follow their logic (PPC's have always been far better than x86's for real scientific-level precision FLOPs) but it's a really gutsy move betting your entire supercomputing program on a new CPU, new hardware platform, etc., and on your ability to get everything ported to the PPC -- that's a lot of risks to take, and a small school like that can't afford to fail, even building a relatively cheap supercomputer. But it clearly paid off! Not only did they get great PR for the university, they got a great computing resource for the students and faculty, and by doing it themselves rather than buying a complete system from a vendor, I am sure that those students all learned far more. And those 700 pizza and coke consuming students that cranked the code will all be able to say that they were part of this amazing thing.
Damn!
Enable 3D printed prosthetics!
'yes, errors in RAM are accounted for,' And no malloc library benchmark jumbling bullshit this time? T minus 10 minutes before some PC nut looks at all this, sees that the Mac relies on something a PC can't do, and 'blows the whistle'. T minus 15 minutes before they realize it's the OS.
So he went full price with the G5 ($3000 apiece) and for only $5.2 million has the number 3 slot and is shooting for a 10% boost.
Varadarajan told the audience he would publish full documentation and release most of the code written for the machine. However, some of the software is subject to patent applications, he said, and he wasn't yet sure if it would be released under an open-source license.
What's up with that?
Used to be that work like this done at a University was considered 'open' as in available to anyone to help advance the state of the art. Not anymore...
This is the type of statement that makes people question the accuracy of ANY price/performance claim made by a Mac fan(atic). Simply stating that "macs give more power and are a better value" is tremendously misleading. The obvious questions to ask are "When was this true?", "In what application?", "Against what competition?", and "By what objective standard?".
So, the other really cool thing they are doing is open sourcing the code for error checking and connectivity.
This is in addition to consulting where they are helping others build similar clusters.
Visit Jonesblog and say hello.
Wow.. I can't believe Apple didn't cut them a break for buying 1100 Dual G5s.
You'd think apple would at least sell G5's to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.
But, hey, even with all that pointless extra hardware, this cluster is still less than half the price of a comparable Intel system from Dell or IBM. Weird.
"Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
From the wired article:
"After his presentation, a group of nerds followed him to the hotel's bar for drinks, hanging on his every word."
How dorky did these guys have to be to have a reporter for "Wired" categorize them as nerds...damn....
... but that doesn't matter. An accomplishment is an accomplishment. Besides if an AI manifests itself it'd be less likely to destroy the world and more likely to tell you that your white socks do not match your purple tie.
To those who are wondering why the G5 is a serious contender for supercomputing applications( and why VT decided the way they did ), you may want to follow this link: http://www.chaosmint.com/mac/vt-supercomputer/
Here's a quick rundown:
Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]
Sun (sparc) - required too many processors, also too expensive
IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
HP (itanium) - same
Apple (IBM PPC970) - system available with chassis for lowest price
"The IBM with a PowerPC 970 was a first choice but the earliest delivery date would have been January 2004."
"On June 23 Apple announced the G5."
I was under the impression that the G5 was a Power PC 970. Is it just some derivative of the Power PC 970... or what?
This page was generated by a Barrel of Circus Midgets, and that is the way I like it!!!
From the summary: "the home-brewn supercluster is happily rolling around at 9.555 TFlops"
Ignoring the "brewn" part of things, since when does "home-brewed" mean "designed and funded by a major university"?
I usually think of "home brewed" as something that someone put together at home. With their own money. In their spare time.
This is *not* a home-brew supercomputer, it is an institute designed and created super computer.
That is all.
Just because I doubt myself does not mean I find your position compelling.
Okay, first, I will guarantee you, the linpack they were running was properly optimized for the architecture. If not, they shouldn't be building a cluster in the first place, because they're morons.
Second, the difference caused by increased optimization in the kernel, for an application like this, is relatively insignificant, simply because most of the work is done in user-space. In fact, any decent super-computing application will do its best to minimize system calls (allocating memory pools, chunking I/O, etc). About the only place the kernel is really involved is in sending/receiving data, and my bet is that optimization here would make relatively little difference, in light of the delays introduced by the network itself, interfacing with the card, etc, etc.
Third, I highly doubt they're running any other software on the cluster nodes that would impact performance. Again, if they were doing that, they'd benefit more from hiring a new system architect.
So, basically, what I'm saying is, comparing your little KDE desktop to a supercomputing application is laughable at best.
Here is da slide-show
I keep seeing reference to some sort of software that will defeat hardware memory errors.
How, pray tell, are they planning on detecting these errors? I can understand how you could reduce the frequency of errors with only a slight loss in performance, ie take some sort of checksum of your data after every x number of cycles, but that doesn't eliminate the errors, only reduces their frequency. Maybe it reduces the frequency by enough that you don't need to worry about it, especially if 'x' is a sufficiently small number, but it still seems like a pretty risky prospect to me.
Anyone seen any actual TECHNICAL details on this point, ie not just some Mac fan yelling "Deja Vu, DEJA VU!!!"?
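For readers wondering what "take some sort of checksum of your data after every x number of cycles" might look like, here is a minimal Python sketch. It is a hypothetical illustration of checksum-based detection only, not the mechanism VT actually used; `ChecksummedBuffer` is an invented name. As the comment points out, this detects corruption that happened between checks rather than preventing it.

```python
import zlib


class ChecksummedBuffer:
    """Hypothetical helper: keeps a CRC-32 of a data block so that silent
    corruption can be noticed at the next periodic check."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.crc = zlib.crc32(self.data)

    def update(self, offset: int, new_bytes: bytes) -> None:
        # Legitimate writes go through here, so the stored checksum is refreshed.
        self.data[offset:offset + len(new_bytes)] = new_bytes
        self.crc = zlib.crc32(self.data)

    def verify(self) -> bool:
        # Meant to be called every x iterations; False means the data changed
        # without going through update(), e.g. a flipped bit in RAM.
        return zlib.crc32(self.data) == self.crc


buf = ChecksummedBuffer(b"matrix block 0042")
buf.update(0, b"MATRIX")      # a legitimate write keeps the checksum valid
ok_before = buf.verify()
buf.data[3] ^= 0x01           # simulate a single-bit memory error
ok_after = buf.verify()       # CRC-32 catches every single-bit error
print(ok_before, ok_after)
```

On detecting a mismatch, a real code would recompute or roll back to a checkpoint; the interval x trades detection latency against the cost of recomputing the checksum.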
For anyone interested in learning a bit more about what some of the issues are when creating a supercomputer, you might want to have a look at the following:
Red Storm PDF
The article is talking about Cray/Sandia's new Red Storm machine, a supercomputer using over 10,000 AMD Opteron processors that is expected to be competitive with the Earth Simulator for the #1 spot on the Top500 list. It does, however, talk about a lot more than just the specifics of this cluster, describing what some of the bottlenecks in supercomputers are and how to avoid/work around them.
Maybe IT management will read this and finally take note. TCO for backend management is cheaper on the Mac platform.
Michael Merry
Merryworks
When IBM comes out with the $3,500 4-way 970 (G5 in Apple-speak) workstation it will be interesting to see what people do with it. Imagine a cluster that is 17% more expensive but with twice as many processors...
Lasers Controlled Games!
The efficiency is quite poor for this machine, at least as efficiency is defined for supercomputers. The cluster has a theoretical peak of 17.6 TFlop/s if I did my math right (8 GFlop/s per processor), but they are only turning in an actual score of 9.56 TFlop/s, for an efficiency of only 54%. Even if they boost performance by 10%, they'll still only be ~60% efficient.
For comparison, ASCI Q (#2 on Top500) reaches 68% efficiency, MCR Linux Cluster (currently #3, but about to be pushed down by this new Mac cluster) reaches 69% efficiency, and the #1 spot, Earth Simulator, reaches a quite impressive 88% efficiency.
Of course, there are other ways to measure efficiency. When it comes to performance/price, this Mac cluster does very well, even if you do take into account the real costs (ie MUCH more than just the $5.2 million up front cost). For cost/power consumption it seems reasonable, but not outstanding. 10TFlops/1.5MW of power is ok, and not too far off the Earth Simulator's 35TFlops/3.5MW of power, but it's certainly nothing to write home about. Cray's next big cluster, Red Storm, is likely to get over 30TFlops when it's released, but will consume only 2.0MW of power.
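The efficiency arithmetic above can be checked with a short Python sketch. The 2,200-processor count, the 8 GFlop/s per-processor peak, the 9.555 TFlop/s LINPACK score and the hoped-for 10% gain are the figures quoted in this thread; `efficiency` is just an illustrative helper name.

```python
def efficiency(rmax_tflops: float, processors: int, gflops_per_proc: float) -> float:
    """LINPACK efficiency: measured Rmax divided by theoretical Rpeak."""
    rpeak_tflops = processors * gflops_per_proc / 1000.0
    return rmax_tflops / rpeak_tflops


rpeak = 2200 * 8.0 / 1000.0                       # theoretical peak in TFlop/s
now = efficiency(9.555, 2200, 8.0)                # current LINPACK result
boosted = efficiency(9.555 * 1.10, 2200, 8.0)     # with the expected 10% boost

print(f"Rpeak = {rpeak:.1f} TFlop/s")
print(f"current efficiency = {now:.1%}")
print(f"after +10%         = {boosted:.1%}")
```

This gives roughly 54% today and just under 60% after the extra tuning, matching the numbers in the comment.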
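To put the power comparison on one scale, here is a small Python sketch that converts each quoted pair into flops per watt. The figures are exactly the ones in this comment (the 1.5 MW number for the VT cluster is disputed elsewhere in the thread); the conversion uses the fact that 1 TFlop/s per MW equals 1 MFlop/s per watt.

```python
# (TFlop/s, power in MW) as quoted in the comment above.
systems = {
    "VT G5 cluster": (10.0, 1.5),
    "Earth Simulator": (35.0, 3.5),
    "Red Storm (projected)": (30.0, 2.0),
}


def mflops_per_watt(tflops: float, megawatts: float) -> float:
    # 1e12 flop/s divided by 1e6 W is 1e6 flop/s per watt, i.e. 1 MFlop/s per watt.
    return tflops / megawatts


ratings = {name: mflops_per_watt(*figures) for name, figures in systems.items()}
for name, rating in ratings.items():
    print(f"{name:22s} {rating:5.1f} MFlop/s per watt")
```

On these inputs the VT cluster lands at about 6.7 MFlop/s per watt, versus 10 for the Earth Simulator and 15 for the projected Red Storm.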
Okay for everyone asking about optimizations, why do it?
Look at what they built: a complete COTS supercomputer, minuscule price, functionality in six months, public data in a year. They have >9Tf right outta the box.
Yes, they have written their own software, but name a company that doesn't? They modded them (cooling, I think, but I couldn't find data, only pics). They bribed students with pizza and soda; they didn't have to buy, make or gut a building. What is amazing is that they showed any simple Slashdot pundit could build one if given these resources.
Just FWIW, they are claiming power usage of 1.5MW for this cluster of 2200 processors. Cray just released the numbers for their upcoming Red Storm cluster with over 10,000 AMD Opteron processors: just slightly less than 2.0MW.
Ugh, this is getting old.
Red Storm, the machine by itself, uses 2.0MW.
Big Mac and all of its networking gear uses less than 0.75MW. The supercomputing center itself (building, air conditioning, UPS battery charging equipment, and the 1100 G5s) is fed by a 1.5MW substation feed. They're still not even maxing out the substation.
The latest, fastest Opterons (not the scaled down low-power Opteron for blade servers) consume 53 watts at full clock. PowerPC 970 @ 2 GHz consumes 48 watts. The U2 and K3 motherboard chipset on the dual G5s uses just as much power as the PowerPC 970 "G5" processors. Hell, the power supply in a dual processor G5 system is 550 watts. 550 x 1100 machines = 0.61MW.
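The back-of-the-envelope power figures in this comment can be reproduced in Python. The inputs (1,100 dual-processor machines, a 550 W supply per machine, 48 W per PowerPC 970) are the commenter's numbers. Summing nameplate supply ratings is only a rough bound: it ignores power-supply conversion losses and the fact that nodes rarely draw their full rated output.

```python
NODES = 1100
PSU_RATING_W = 550     # rated supply of a dual G5 (figure from the comment)
CPU_W = 48             # PowerPC 970 @ 2 GHz (figure from the comment)
CPUS_PER_NODE = 2

nameplate_mw = NODES * PSU_RATING_W / 1e6           # all supplies at full rating
cpu_only_mw = NODES * CPUS_PER_NODE * CPU_W / 1e6   # just the processors

print(f"sum of PSU ratings: {nameplate_mw:.3f} MW")
print(f"processors alone:   {cpu_only_mw:.4f} MW")
```

Either way the compute nodes come in well under the 1.5 MW substation feed.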
I'm sure VT would have gone rack if possible, and I've heard a side benefit of the current setup is that, as new nodes become available, they will be able to 'retire' the nodes to desktop duty for the staff around campus. A dual G5 should be able to run Office pretty well, even in a few years. ;-)
Also, I've heard that the system controller supports 16GB of RAM but that Apple has only certified 1GB DIMMs so far. That seems plausible, since a lot of Macs can accept more memory than initially advertised once larger memory modules become common. (I put 1GB of RAM in an old Wallstreet G3 PowerBook for someone and got it running even though it's officially rated at 512MB. I've got a Sony from the same period here that absolutely won't take more than 256MB in two slots.)
I'm not feeling witty so bite me
Yes and no. The only way the G5 can do 4 FP operations per cycle is if each of its 2 FP units executes a fused multiply-add instruction. Obviously no code is going to consist entirely of these, so the actual theoretical peak is less than the theoretical theoretical peak. Or something like that.
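The "theoretical theoretical peak" point can be made concrete with a simple model, sketched here in Python. It is a hypothetical illustration, not a published G5 figure: it assumes both FPUs issue one floating-point instruction every cycle, counts a fused multiply-add as 2 flops and anything else as 1, and ignores memory stalls, so it is still optimistic. `effective_peak_gflops` and the FMA-fraction parameter are invented for illustration.

```python
def effective_peak_gflops(fma_fraction: float, fpus: int = 2, ghz: float = 2.0) -> float:
    """Per-CPU peak when only `fma_fraction` of issued FP instructions are
    fused multiply-adds (2 flops each); the rest count as 1 flop."""
    if not 0.0 <= fma_fraction <= 1.0:
        raise ValueError("fma_fraction must be between 0 and 1")
    flops_per_instruction = 1.0 + fma_fraction
    return fpus * flops_per_instruction * ghz


for f in (1.0, 0.5, 0.0):
    print(f"{f:.0%} FMA -> {effective_peak_gflops(f):.1f} GFlop/s per CPU")
```

Under this model the advertised 8 GFlop/s per CPU only holds when every instruction is an FMA; a code with no FMAs at all tops out at half that.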
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
I usually never reply to these things, but I think it is funny that people are arguing about how he ordered on the Apple Store. I find it even funnier that people would even go to the Apple Store and try. It was a joke! There were a lot of dedicated people at Apple, including myself, that helped to make this dream become a reality. The "myth" that I would like to clear up is that Apple DID have a clue and a lot of great people at Apple have been working really hard for the last few months, making a lot of personal sacrifices to make sure that all the awesome work from Dr. Varadarajan and the rest of the cluster team could be possible and successful. That's my 2 cents.
Jerome Holman
Apple Campus Representative @ VT
http://filebox.vt.edu/users/jeholman
Exactly. Under plain-vanilla PPC Linux that cluster would be literally smokin'. The G5's thermal management must be software controlled.
Now, that's a real number.
Shop as usual. And avoid panic buying.
Ah, finally someone who is actually involved with the project. Can you tell me what the total cost of the supercomputer was?
The $5.2M figure seems to cover just the towers (a dual 2GHz with 4GB RAM is $4814 with the standard educational discount; multiply by 1100 and you get $5,295,400). What was the additional cost of the InfiniBand cards and switches, the Cisco switches, the racks, and the cooling equipment? Were any modifications necessary for the building (more power, etc.)?
I am speaking from experience when I tell you that building a large cluster from desktops is just not a good way to go. They take up a hell of a lot more room, they put out a lot more heat, and the remote management capabilities are degraded.
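The tower-only estimate is simple multiplication; here it is in Python using the educational price quoted in the comment, for comparison against the published $5.2M total.

```python
UNIT_PRICE = 4814   # dual 2 GHz G5 with 4 GB RAM, educational pricing (from the comment)
MACHINES = 1100

towers_total = UNIT_PRICE * MACHINES
print(f"${towers_total:,}")
```

At that list price the towers alone come to slightly more than $5.2M, which is why the commenter asks what the networking, racks and cooling added.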
Desktops take up more room, correct. And yes, the desktop G5 does not have a console serial port like the xServe does. But seriously, how many modern clusters do you see with a terminal server connecting to each of the node's serial port? These days it's all install-and-run. OS X is UNIX... you can do a lot with a remote shell. These folks will never need to sit down at a GUI for each node. If you look at their setup photos, you'll see that they even removed the gfx card from each node.
And... desktops DO NOT put out more heat than a similar rackmount unit. The hard drives are the same, the processors are the same. A larger case does not create more heat. More heat may be expelled due to better fans, but that is a GOOD THING, you don't want your board, ram, and processors to cook. The only difference between the two is the power supply. Slim rackmount machines generally have smaller power supplies. But, with modern switching power supplies, there is nearly no difference in power consumption (and, by the laws of thermodynamics, heat output).
Once you go rack, you never go back. I much prefer a rack of 1U units that are built to be used in cluster situations.
Yes and no. A rack of 1U servers is small, compact, snazzy looking, and neat. But, you also increase the number of processors per square foot, which can be a cooling issue. With a concentration of heat in that area, more cool air will need to be directed to the rack.
I guess VT also has the luxury of running CPU-intensive tasks. Those machines can only hold 8 GB of RAM while other offerings can hold 16 GB, and if they start to swap... ouch, not having SCSI drives will hurt.
4 GB per processor is pretty good for the current HPC world. A lot of monster supercomputers are still sold with 2 - 4 GB per processor. The G5 can unofficially support 16 GB via 2 GB DIMMs, but Apple has not certified this. SCSI drives are great for a big RAID, fibrechannel is even better. But for the drive in each node, IDE is fine. Even Google uses IDE drives in their nodes (which they use as a distributed filesystem too!).
All in all this setup is very impressive when just considering CPU performance. I wonder what is going to happen when a professor needs to run a few hundred jobs that use 10 or so GB of RAM each.
The prof will have to rewrite his code to use less RAM per processor. This is a cluster, after all, and code for clusters has to work with a fixed amount of RAM per node. This is not a Cray X1, SunFire15K, or SGI Origin with high-throughput, low-latency global shared memory. Very, very few supercomputers, and even fewer clusters, have 10 GB of RAM per processor. Even 8 GB per proc is pretty rare today.
If the job did need that much RAM, it would be possible to pool memory between several nodes. It wouldn't be too fast, though (but still WAY faster than swapping to any hard drive). I believe they're currently getting a little over 800 MBytes/sec real-world throughput via the 20 Gbit full-duplex InfiniBand interconnects.
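A rough sense of what pooling memory over the interconnect costs can be had from the quoted ~800 MBytes/sec. This Python sketch assumes decimal units (1 GB = 1000 MB) and ignores latency, protocol overhead and contention, so real transfers would be slower; `transfer_seconds` is an illustrative name.

```python
def transfer_seconds(gigabytes: float, mbytes_per_sec: float = 800.0) -> float:
    """Idealized time to move a working set across the interconnect.
    Decimal units (1 GB = 1000 MB); latency and overhead are ignored."""
    return gigabytes * 1000.0 / mbytes_per_sec


print(f"{transfer_seconds(10):.1f} s to move a 10 GB working set once")
```

That is about 12.5 seconds per full pass over a 10 GB data set, so a job that streams through its whole working set repeatedly would spend a lot of time waiting on the network.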
See http://www.netlib.org/benchmark/performance.pdf page 53.
1. Earth simulator
2. ASCI Q
3. Virginia Tech G5 cluster (9.555 Tflops and rising, $5.2M HARDWARE ONLY)
4. PNL Itanium2 cluster (8.633 Tflops, $24.5M HARDWARE ONLY)
So nope, not only will the PNL Itanium2 cluster not be #2, it will also be 1Tflop behind the Virginia Tech cluster, and it will have done it at almost 5 times the cost. Bravo!
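The cost comparison can be normalized per TFlop/s with a few lines of Python. The Rmax values and hardware-only prices are the ones in the list above; facilities and operating costs are not included.

```python
# (Rmax in TFlop/s, hardware-only cost in $M) from the list above.
clusters = {
    "Virginia Tech G5": (9.555, 5.2),
    "PNL Itanium2": (8.633, 24.5),
}

cost_per_tflop = {
    name: cost_m * 1e6 / tflops for name, (tflops, cost_m) in clusters.items()
}
for name, dollars in cost_per_tflop.items():
    print(f"{name:18s} ${dollars:,.0f} per TFlop/s")

cost_ratio = clusters["PNL Itanium2"][1] / clusters["Virginia Tech G5"][1]
print(f"cost ratio: {cost_ratio:.1f}x")
```

The raw price ratio is about 4.7x, and because the G5 cluster is also faster, the gap per TFlop/s is wider still, roughly 5.2x.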
You could calculate a new marketing BS peak number where multiply-add only counted as a single flop, or you took into account some realistic cache miss rate. The new lower theoretical peak would give you a much higher efficiency.
You did precisely the mistake to which he was refering: The G5 FPUs can perform a "fused, multipl[y]-add operation per cycle, so you get 2 flops per cycle" per processor. Therefore:
2 FPUs/CPU * _2_ floating point operations per cycle per FPU = _4_ flop per CPU per cycle
_4_ flop per CPU per cycle * 2 Gcycles per second = _8_ Gflops per CPU
_8_ Gflop/s per CPU * 2 CPU per machine = _16_ Gflop/s per machine
A fused multiply-add is f0 = f1 * f2 + f3, which is two floating point operations in a single instruction. Each FPU on a G5 can execute an FMADD each cycle. So:
1 FMADD per cycle = 2 flop/cycle * 2 FPUs = 4 flop/cycle * 2 CPUs = 8 flop/cycle * 2 GHz = 16 Gflop/s
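The same chain of multiplications, written out step by step in Python. It assumes, as the comments do, that both FPUs retire one FMADD every cycle, which is the best case; the machine count of 1,100 is taken from the thread.

```python
FPUS_PER_CPU = 2
FLOPS_PER_FMADD = 2      # f0 = f1 * f2 + f3 is one multiply plus one add
CLOCK_GHZ = 2.0
CPUS_PER_MACHINE = 2
MACHINES = 1100

per_cpu_gflops = FPUS_PER_CPU * FLOPS_PER_FMADD * CLOCK_GHZ
per_machine_gflops = per_cpu_gflops * CPUS_PER_MACHINE
cluster_tflops = per_machine_gflops * MACHINES / 1000.0

print(f"per CPU:     {per_cpu_gflops:.0f} Gflop/s")
print(f"per machine: {per_machine_gflops:.0f} Gflop/s")
print(f"cluster:     {cluster_tflops:.1f} Tflop/s")
```

Each line corresponds to one step in the derivation above: 8 Gflop/s per CPU, 16 Gflop/s per dual machine, and about 17.6 Tflop/s for the whole cluster.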
From http://macslash.org/article.pl?sid=03/10/28/2357235&mode=thread
"The total cost of the asset, including systems, memory, storage, primary and secondary communications fabrics and cables is $5.2mil. Facilities upgrade was $2mil. 1mil for the upgrades, 1mil for the UPS and generators."
Total: $7.2M + essentially "volunteer" assembly
So it's still a LOT cheaper than anything even close to comparable.
This was supposed to be a 64bit cluster so P4s were out. Itanium was too expensive and Opterons weren't out except as parts that would have to be assembled and that wasn't going to fly for their requirements. Can you imagine the risk of having AMD declare your assembly methods out of spec and refuse to replace any downed processors? This is a multi-million dollar cluster. They needed a chip and a chassis and they wanted it right then.