Factual 'Big Mac' Results
danigiri writes "Finally Varadarajan has put some hard facts on the speed of the VT 'Big Mac' G5 cluster. Undoubtedly after some weeks of tuning and optimization, the home-brewn supercluster is happily rolling around at 9.555 TFlops in LINPACK.
The revelations were made by the parallel computing voodoo master himself at the O'Reilly Mac OS X conference. It seems they are expecting an additional 10% speed boost after some more tweaking. Srinidhi received standing ovations from the audience.
Wired news is also running a cool news piece on it. Lots of juicy technical and cost details not revealed before. Myth dispelling redux: yes, VT paid full price, yes, it's running Mac OS X Jaguar (soon Panther), yes, errors in RAM are accounted for, Varadarajan was not an Apple fanboy in the least... read the articles for more booze."
Big Macs are bad for your health.
here
Geminatron
this once again proves what mac fans have known forever.
Dollar for dollar, macs give more power and are a better value than PC's.
This is computers, not bowling!
-Libertarian secular transhumanist
....ok, we've really got real numbers THIS time!!
do() || do_not();
I haven't seen a cluster of Macs this big and powerful since the last annual pimp convention!
Now, where did all the tricks go?
Until Slashdot fixes the funny modifier, use insightful or interesting. The poster knows your intentions.
vodoo? brewn? Check yo self, boy.
PS: 15th post!
Is that a word? How about brewed? Hate to nit, but .... aw... nevermind.
can I run linux on it?
160 apple computer! they are the perfect company for geeks like us. I can say for a fact that if every company was like Mr. Jobs' brainchild, the world would be a much better place.
Suck it up whiners, at least you don't have this particular upgrade and patch cycle...
"Talk minus action equals nothing" - Joey Shithead, D.O.A.
The x86 cluster would have been twice as expensive. And this outperforms the highest ranking x86 cluster, which has more processors.
I've always been sort of intrigued by the Top500. Has there ever been a good comparison written about the similarities/differences between a 'supercomputer' and the regular PC sitting on my desk running Linux/2k? At what point does the computer in question earn the title "Super"?
The power usage (think cooling the room) for a similarly-performing Athlon cluster would likely more than make up for what phantom price difference you are talking about.
MORTAR COMBAT!
>>yes, VT paid full price
This is disgraceful! Hundreds of Macs on one purchase order, and they couldn't (or chose not to!) negotiate a deal? The Virginia taxpayers should be outraged! Good grief, if I bought 600 loaves of bread from the corner market, I'd expect a discount. Perhaps they were more interested in making the press than being good stewards of the public trust. After all, the college knows the taxpayers will have to pay the bills, sooner or later.
Shameful.
I think it's interesting that he wasn't a Mac fan at all before this project. He says he chose it because it had better performance than everything else out there ("Ironically, they lost the gigahertz game," he said of Intel. "(The G5) is extremely faster than the Itanium II, hands down."), and was cheaper too (Dell and other manufacturers quoted prices between $10 and $12 million, vs. the $5.2 million for G5s).
What more do you need? Faster systems, cheaper total cost, and slick looking cases.
They costed the G5 against Dell and IBM offerings and the Apple solution was cheaper. Where did you get your numbers? Why don't you go out and price out a Supercomputer for me will ya? Of course you know that it isn't feasible to BUILD 1100 units.
...And when they came for me, there was no one left to speak out for me." - Martin Niemoeller (1892-1984)
They could have got at least double performance.
Wow. Nine whole comments before the first troll. If you'd been reading /. or anything else on this subject, you'd know that's simply not true. But then, the world needs myopic lemmings, too.
World's tallest building rises in the desert
....maybe i'm obtuse, but i keep hearing about this thing as "..and we're only seeing X% of its real potential right now!"....
1) Why can't they just shout "Let 'er rip!!" and crank the thing wide open?
2) Why all the media buzz concerning this as a `surprise' when they've already got its performance figured out, apparently?
Sorry.
do() || do_not();
An audience member asked if he'd made the purchase through the Apple store. Varadarajan smiled and said that actually, yes, he had.
[snip]
yes, it's running MacOSX Jaguar ( soon Panther)
More like whole-lotta-CD-jockeying. Perhaps the bio department can lend a hand by donating the services of their chimps to handle the CD swapping.
(Yes, I'm aware there are smarter ways of doing it, but isn't it a fun mental picture, 100 chimps running around a cluster of G5's and throwing bananas and CDs at each other?) Talk about your fun install-fests.
Please help metamoderate.
Until then, quit your trolling.
Don't blame me; I'm never given mod points.
>Imagine if that put all that money down on an Intel/Athlon cluster.
Imagine if you actually spent some money to learn grammar.
This is simply an amazing achievement. Plenty of people have built supercomputers from huge piles of x86's, but this team managed to not only pull the trick off in less time, for less money, but on a new hardware platform. I certainly follow their logic (PPC's have always been far better than x86's for real scientific-level precision FLOPs) but it's a really gutsy move betting your entire supercomputing program on a new CPU, new hardware platform, etc., and on your ability to get everything ported to the PPC -- that's a lot of risks to take, and a small school like that can't afford to fail, even building a relatively cheap supercomputer. But it clearly paid off! Not only did they get great PR for the university, they got a great computing resource for the students and faculty, and by doing it themselves rather than buying a complete system from a vendor, I am sure that those students all learned far more. And those 700 pizza and coke consuming students that cranked the code will all be able to say that they were part of this amazing thing.
Damn!
Enable 3D printed prosthetics!
As of one day ago, as far as I know the memory issues had not been solved (I just talked to someone who talked to Srinidhi last night). Rather, for benchmark purposes, the RAM errors should not be a big deal since they can always rerun. For actually dealing with the errors, they plan on switching to ECC RAM machines once Apple is ready to ship them (presumably there is some type of special deal there, though I've heard nothing concrete about it).
imagine a beowulf cluster of these
Uh, RTFA, as was stated it was cheaper than any combination offered by any pc out there. but whatever, I guess I shouldn't be surprised, this is Slashdot where no one reads the article before spouting off...
'yes, errors in RAM are accounted for,' And no malloc library benchmark jumbling bullshit this time? T minus 10 minutes before some PC nut looks at all this, sees that the Mac relies on something a PC can't do, and 'blows the whistle'. T minus 15 minutes before they realize it's the OS.
So he went full price with the G5 ($3000 apiece) and for only $5.2 million has the number 3 slot and is shooting for a 10% boost.
Varadarajan told the audience he would publish full documentation and release most of the code written for the machine. However, some of the software is subject to patent applications, he said, and he wasn't yet sure if it would be released under an open-source license.
What's up with that?
Used to be that work like this done at a University was considered 'open' as in available to anyone to help advance the state-of-the-art. Not anymore...
So, the other really cool thing they are doing is open sourcing the code for error checking and connectivity.
This is in addition to consulting where they are helping others build similar clusters.
Visit Jonesblog and say hello.
I read something once (can't seem to find it again, anyone else?) that said the Athlon CPU was (pound for pound) the most efficient heat-source (from electric anyway) in the world.
Think you could power the super computer by using the heat from the CPUs to boil water and spin turbines? ;)
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Carbon Unit 549 (325547) deemed obsolete. Slated for termination.
It seems a waste that they used the stock Jaguar distribution instead of creating an optimized distro. If they made a G5 optimized kernel and ran a G5 optimized linpack in single user mode with all unnecessary features stripped out then I bet they could have had at least 10% extra performance. I know, because I compiled Gentoo on my G5 (only took 30 minutes BTW, that's how fast it is) and it runs KDE a hell of a lot faster than AQUA. So why did they use a slow operating system on that hardware?
Screenshot of My G5 desktop!
Suddenly someone'll be on some cool yacht in Europe.
Unless pizzas are horribly more expensive all of a sudden. Noooo!
My photolog
Not trolling. I happen to build and use clusters for computational fluid dynamics simulations. We use commodity P4 processors and 100 Base T Ethernet at between 50 and 90 percent efficiency. Thus, spending more than 50% of the cost of a cheap node to increase efficiency is not worth it.
Of course, there are applications that benefit from the faster connections. I would just like to hear about them.
nohup rm -rf ~/. >& zen &
You posted something that lame under your real handle.
Fucking moron.
Wow.. I can't believe Apple didn't cut them a break for buying 1100 Dual G5s.
You'd think apple would at least sell G5's to VT without SuperDrives and Radeon 9600s. I seriously doubt those things (especially the video cards) will get a lot of use in a giant cluster.
But, hey, even with all that pointless extra hardware, this cluster is still less than half the price of a comparable Intel system from Dell or IBM. Weird.
"Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
The speed of the interconnect puts a limit on the number of nodes you can practically connect.
Given our CPU power has been growing far faster than the networking speeds, he chose well in that aspect. Low-latency communication is vital to most (not all) parallel applications.
Slashdot Patriotism: We Support our Dupes!
From the wired article:
"After his presentation, a group of nerds followed him to the hotel's bar for drinks, hanging on his every word."
How dorky did these guys have to be to have a reporter for "Wired" categorize them as nerds...damn....
Think how much faster it will be when they switch all the nodes over to Linux! :-)/2
Macs come standard with gigabit. And it is cheaper overall. And it is faster overall.
sounds like a pretty good deal, eh?
perhaps the folks using the cluster need the 64 bit address space? you can run out of memory space pretty darn quick on a 32 bit machine now.
PHP is the solution of choice for relaying mysql errors to web users.
... but that doesn't matter. An accomplishment is an accomplishment. Besides if an AI manifests itself it'd be less likely to destroy the world and more likely to tell you that your white socks do not match your purple tie.
In this house, we obey the laws of thermodynamics!
you'd have to have a small input of energy, but i bet it's pretty insignificant.
turn up the jukebox and tell me a lie
"In theory, every CPU in the cluster is able to perform two floating-point operations every clock cycle, but only if one of those is a multiplication and the other is an addition. The two occur in combination fairly often in scientific computing."
My understanding is that it can perform two FMADD instructions every clock cycle.
Integrate Keynote and LaTeX
I am somewhat surprised that there isn't a G5 version of the XServe yet. I guess the G5 chips are still pretty scarce. (Or else Apple's really taking the time to get the G5 XServe right... or both.)
However, if G5 Macintosh systems like this become "popular" in supercomputing, maybe that's a reason to get a G5 XServe out there sooner. I'd imagine a rack mount system would be easier to deal with than a bunch of towers.
Avoid Missing Ball for High Score
With that kind of purchase, it would be first class travel for years!
...don't expect the manufacturer to step out on that limb with you.
Obviously that fan/heatsink combo was there for a reason. You removed it, you paid the price.
"I added a superior cooling system to the machine, quietened it, IMPROVED it in every way, and they deny my claim?"
Obviously your modified cooling system was quieter, but I suspect it was actually quite inferior.
I am very small, utmostly microscopic.
It's common practice in the whole computer industry to deny claims on non-stock machines. We won't even replace a hard drive here, even if it's still under warranty. It's just not worth the risk of them not taking it in.
"Slashdot, where telling the truth is overrated but lying is insightful."
To those who are wondering why the G5 is a serious contender for supercomputing applications (and why VT decided the way they did), you may want to follow this link: http://www.chaosmint.com/mac/vt-supercomputer/
Here's a quick rundown:
Dell - too expensive [one of the reasons for the project being so "hush hush" was that dell was exploring pricing options during bidding]
Sun (sparc) - required too many processors, also too expensive
IBM/AMD (opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
HP (itanium) - same
Apple (IBM PPC970) - system available with chassis for lowest price
The online store cannot accept an order of 1100 G5s at one time. So either he set up a script and a credit card number with a $5 million limit, or he called 1-800-MY-APPLE (err MY BIG MAC) and placed an order with a rep. Funny thing is those reps work on commission, which begs the question: who scored the "Big Mac", and are they going to retire?
Varadarajan wanted a high-performance supercomputer based on a 64-bit processor and never looked at 32 bit. In addition, he felt that clusters imply gigabit Ethernet.
This is a pretty weak statement to justify the substantial increase in cost for going to 64 bit and an InfiniBand interconnect, compared to just buying more nodes.
Do you know what 64 bit means compared to 32 bit? It is a significant increase in the size of instructions, address space, registers, numbers, etc. (they are not just doubled)
Also, The main emphasis of the article is how cheap it was to build.
The Athlon CPU produces more heat per square centimeter than any other heat source in the world.
;)
And no, it's not efficient enough to be a perpetual-motion machine.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
So it will only cost them $20 for the upgrade.
One good thing about music, Well, it helps you feel no pain. So hit me with music; Hit me with music now. -- Bob Marley, "Trenchtown Rock"
This quote is bugging me. It's really "When it hits you, you feel no pain."
Do you have any links to this offhand? It'd be an interesting conversation piece (or to win bets in a Geek bar ;)
And no, it's not efficient enough to be a perpetual-motion machine.
Drats! Back to the drawing board.
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
"The IBM with a PowerPC 970 was a first choice but the earliest delivery date would have been January 2004."
"On June 23 Apple announced the G5."
I was under the impression that the G5 was a Power PC 970. Is it just some derivative of the Power PC 970... or what?
This page was generated by a Barrel of Circus Midgets, and that is the way I like it!!!
More to the point, if he "never looked at 32 bit", why did he buy machines which will only run a 32-bit OS for the foreseeable future? Until a pointer is a 64-bit number, you cannot do anything on these machines that a 32-bit CPU with a 32-bit OS won't do. You can buy 32-bit machines with more than 4 GB of RAM and use it just the same way OSX does.
The parent is not a troll.
Big Macs are sometimes made out of mammals.
Macs run "headless" if no video card is detected at startup.
>Id say if I lived in Virginia, and paid taxes, I would be happy.
It's time to play: Let's Spend Your Tax Dollars!
Ready?
Pothole fixed....
5.2 million dollar supercomputer....
Pothole fixed....
5.2 million dollar supercomputer....
I think I will have to go with the supercomputer, Chuck.
Just FWIW, they are claiming power usage of 1.5MW for this cluster of 2200 processors. Cray just released the numbers for their upcoming Red Storm cluster with over 10,000 AMD Opteron processors, just slightly less than 2.0MW.
Long story short, this Big Mac cluster consumes a LOT of power. To be fair though, the Earth Simulator apparently uses around 3.5MW of power, so on a power/performance comparison, the Big Mac and Earth Simulator are roughly on-par with one another.
From the summary: "the home-brewn supercluster is happily rolling around at 9.555 TFlops"
Ignoring the "brewn" part of things, since when does "home-brewed" mean "designed and funded by a major university"?
I usually think of "home brewed" as something that someone put together at home. With their own money. In their spare time.
This is *not* a home-brew supercomputer, it is an institute designed and created super computer.
That is all.
Just because I doubt myself does not mean I find your position compelling.
X Serve has something to do with this, plug and play internet connected Super-Clusters?
WOW! go Apple!
And this outperforms the highest ranking x86 cluster, which has more processors.
The x86 cluster was built a year and a half ago.
OF COURSE this thing will be faster.
Why don't YOU RTFA with some perspective.
But what about operating costs? And space? When I was in school, space, especially in engineering facilities, was at an extraordinary premium, and even with our own physical plant providing steam (for heat) and some power, electricity and the quality of it was of concern as well. If this cluster was funded through part of a construction budget, I think your argument would carry more weight. But if space and operating costs are at a premium, without even bringing in performance advantages the G5 might have from 64 bits and AltiVec, which could be exceptionally potent for certain kinds of problems, I think there's a good case to be made. (For the record, not really an Apple fan, I land somewhere between apathetic and disdainful.)
I know IBM does very well with PPC architecture, a lot of their machines score very high in efficiency of use with these machines (iSeries and pSeries are near the top of their categories when it comes to using multiple processors efficiently)...
:D
just trying to nail down the number of this beast.
Anon again
1. Mac OS X v10.3
2. Mac OS X v10.3 Family
3. iPod
between that and the iTunes music store (1) i bet there will be some upcoming article about how Apple users seem to be more honest than M$ users... and happier too.
(1) Before the release of iTunes for Windows, Apple was selling more songs than all of the other online music sellers. That is impressive considering the actual number of Mac users.
Here is da slide-show
This page was generated by a Barrel of Circus Midgets, and that is the way I like it!!!
I keep seeing reference to some sort of software that will defeat hardware memory errors.
How, pray tell, are they planning on detecting these errors? I can understand how you could reduce the frequency of errors with only a slight loss in performance, ie take some sort of checksum of your data after every x number of cycles, but that doesn't eliminate the errors, only reduces their frequency. Maybe it reduces the frequency by enough that you don't need to worry about it, especially if 'x' is a sufficiently small number, but it still seems like a pretty risky prospect to me.
Anyone seen any actual TECHNICAL details on this point, ie not just some Mac fan yelling "Deja Vu, DEJA VU!!!"?
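For what it's worth, the checksum-every-so-often idea described above can be sketched in a few lines of Python. This is only an illustration of detect-and-rerun, not anything confirmed about what VT actually runs: `checksum`, `flip_bit`, and `verify` are made-up helper names, and CRC-32 via `zlib.crc32` is an arbitrary choice of checksum.

```python
import zlib

def checksum(buf) -> int:
    """CRC-32 of a block of data (hypothetical stand-in for whatever check is used)."""
    return zlib.crc32(bytes(buf))

def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Simulate a single-bit memory error by flipping one bit in place."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)

def verify(buf, expected: int) -> bool:
    """True if the block still matches the checksum recorded earlier."""
    return checksum(buf) == expected

# Record a checksum when the block is known to be good.
data = bytearray(b"linpack panel " * 64)
stored = checksum(data)
clean_ok = verify(data, stored)

# Later, a bit flips in RAM; the next check catches it.
flip_bit(data, 1234)
detected = not verify(data, stored)

# Recovery here is just "restore and rerun", mimicked by undoing the flip.
flip_bit(data, 1234)
recovered_ok = verify(data, stored)

print(clean_ok, detected, recovered_ok)
```

As the comment says, this only detects corruption between two checks; anything computed from the bad block in the meantime has to be redone, and a shorter interval between checks costs more performance.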
HEELLLLOOOOOO!
But in the winter it could heat the dorms at VT. Duh!
Varadarajan is indeed going to publish/open source the documentation and source, which is clear if you RTFA. The parent is giving a false quotation, and is clearly a troll.
Wired's target audience isn't nerds, it's nerd wannabe's. A half-dozen years ago I was working in web design with a cow-orker who managed to read that day-glo typesetter's nightmare, and I had to keep telling her that 'twas better to spend her time doing stuff that would make her article-fodder, rather than article-reader.
the GNAA takes the Jihad out!
For anyone interested in learning a bit more about what some of the issues are when creating a supercomputer, you might want to have a look at the following:
Red Storm PDF
The article is talking about Cray/Sandia's new Red Storm machine, a supercomputer using over 10,000 AMD Opteron processors that is expected to be competitive with the Earth Simulator for the #1 spot on the Top500 list. It does, however, talk about a lot more than just the specifics of this cluster, describing what some of the bottlenecks in supercomputers are and how to avoid/work around them.
Maybe IT management will read this and finally take note. TCO for backend management is cheaper on the Mac platform.
Michael Merry
Merryworks
Is this shit really necessary every single time there is an apple story?
I've seen this troll before, word for word, in several Apple related articles.
Avoid Missing Ball for High Score
I am speaking from experience when I tell you that building a large cluster from desktops is just not a good way to go. They take up a hell of a lot more room, they put out a lot more heat, and the remote management capabilities are degraded.
Once you go rack, you never go back. I much prefer a rack of 1U units that are built to be used in cluster situations.
I guess VT also has the luxury of running CPU intensive tasks. Those machines can only hold 8 GB of RAM while other offerings can hold 16 GB, and if they start to swap....ouch, not having SCSI drives will hurt.
All in all this setup is very impressive when just considering CPU performance. Wonder what is going to happen when a professor needs to run a few hundred jobs that use 10 or so GB of RAM each.
Ok, I just clicked on the links, wasted an hour reading a bunch of stuff. I give up, how does he detect memory errors?
I've been to the Apple Store too, you know; you can get the *base configuration* for $3,000, but that's it. Apple charges a huge premium on their RAM upgrades. And from what I've heard, these machines were specced out. So something's gotta give, in this story. Did he buy more RAM from a third party, later?
Otherwise, if I ever want to buy a G5, I'm talking to this guy first; I bet he can hook us up!
pb Reply or e-mail; don't vaguely moderate.
Uhm, it is 42 bit addressing, not 64 bit. This limits the G5 to a mere several terabytes of RAM, instead of Petabytes when addressing in 64 bit. Still damn cool though.
And I need all 6 gigs I have in my G5 right now. Look at MOTU's Machfive for my justification.
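Taking the 42-bit figure in the comment above at face value, the address-space sizes are easy to check. A quick sketch (the helper name `addressable_bytes` and the unit constants are just for illustration):

```python
# Binary unit sizes in bytes.
GIB = 2 ** 30
TIB = 2 ** 40
PIB = 2 ** 50
EIB = 2 ** 60

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

for bits in (32, 42, 64):
    size = addressable_bytes(bits)
    print(f"{bits}-bit: {size / GIB:,.0f} GiB = {size / TIB:,.3f} TiB = {size / PIB:,.6f} PiB")
```

So 32 bits reaches 4 GiB, 42 bits reaches 4 TiB, and a full 64 bits would reach 16 EiB, which is 16,384 PiB.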
or does anybody else think the new slashdot G5 icon looks like a bad photocopy?
When IBM comes out with the $3,500 4-way 970 (G5 in Apple-speak) workstation it will be interesting to see what people do with it. Imagine a cluster that is 17% more expensive but with twice as many processors...
Lasers Controlled Games!
Um...so I read the article.
The "lots of juicy technical...details" are where, exactly?
my teacher is on slashdot, woohoo! if only i still had that link to his homepage
The best I could do checking every option in one pass through the Apple Store was $29,662.15
And that's with every digital camera and printer and firewire drive, display, video camera and software option they gave me and a 40 gig iPod to boot.
The G5 with only stuff that goes in the tower cost $9,423.00.
'Course I guess you can't get a Dell with 8 gigs of RAM yet so we'll have to wait to compare.
"Hee-hee-hee. He said 'Goto'. Snort."
Reality is defined by the maddest person in the room
That will potentially arise from various operating systems? I can see the hilarity ensue from such an endeavor already! C'mon, there must be someone who's already done such a thing.
Why would VT spend so much money on this? Can someone point me to a site with why they are doing this please. Not a joke, I'd like to know what the VT people are going to use this for.
Thanks all.
Overpriced crap. Imagine a Beowulf cluster of Linux PCs instead.
Okay for everyone asking about optimizations, why do it?
Look at what they built: a complete COTS supercomputer, minuscule price, functionality in six months, public data in a year. They have >9Tf right outta the box.
Yes they have written their own software, but name a company that doesn't? They modded them (cooling I think, but I couldn't find data only pics.) They bribed students with pizza and soda, they didn't have to buy, make or gut a building. What is amazing is they showed that any simple slashdot pundit could build one if given these resources.
Unreal Tournament framerate ... *drool*
HiPod
One man's pink plane is another man's blue plane.
Just FWIW, they are claiming power usage of 1.5MW for this cluster of 2200 processors. Cray just released the numbers for their upcoming Red Storm cluster with over 10,000 AMD Opteron processors, just slightly less than 2.0MW.
Ugh, this is getting old.
Red Storm, the machine by itself, uses 2.0MW.
Big Mac and all of its networking gear uses less than 0.75MW. The supercomputing center itself (building, air conditioning, UPS battery charging equipment, and the 1100 G5s) is fed by a 1.5MW substation feed. They're still not even maxing out the substation.
The latest, fastest Opterons (not the scaled down low-power Opteron for blade servers) consume 53 watts at full clock. PowerPC 970 @ 2 GHz consumes 48 watts. The U2 and K3 motherboard chipset on the dual G5s uses just as much power as the PowerPC 970 "G5" processors. Hell, the power supply in a dual processor G5 system is 550 watts. 550 x 1100 machines = 0.61MW.
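The back-of-envelope power numbers in the comment above can be reproduced directly. A minimal sketch using only the figures quoted there (550 W power supply, 48 W per PowerPC 970, dual-CPU machines, 1100 machines); the variable names are mine, and treating the power supply rating as a worst-case ceiling is an assumption, since real draw is normally lower than the rating.

```python
PSU_WATTS = 550          # rated power supply per dual G5, per the comment
CPU_WATTS = 48           # PowerPC 970 @ 2 GHz, per the comment
CPUS_PER_MACHINE = 2
MACHINES = 1100

def to_megawatts(watts: float) -> float:
    """Convert watts to megawatts."""
    return watts / 1_000_000

# Worst case: every power supply running at its full rating.
psu_ceiling_mw = to_megawatts(PSU_WATTS * MACHINES)

# CPUs alone, ignoring chipset, RAM, drives, and fans.
cpu_only_mw = to_megawatts(CPU_WATTS * CPUS_PER_MACHINE * MACHINES)

print(f"PSU ceiling: {psu_ceiling_mw:.3f} MW")
print(f"CPUs only:   {cpu_only_mw:.4f} MW")
```

The ceiling works out to 0.605 MW, which the comment rounds to 0.61 MW, comfortably under the 1.5 MW substation feed.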
So will this beat a new 9800XT in doom III?
That's GNU/Apple, you corporate sellouts.
MOD PARENT DOWN, PLEASE
Good sir, methinks thou shouldst not jack so much.
Wow, big scam Apple's running there, I can't even believe it. By the way, if you replace your car's radiator with a better system you made yourself and the engine block cracks, they won't cover THAT in the warranty either. I know, what dicks!
I dunno what happened...Apple used to bend over backwards with its warranty. A month ago my friend sent in his OBVIOUSLY HACKED (missing screws, broken pc boards, don't ask) laptop for hard disc service and they not only fixed his hard drive, but also replaced his cracked lower case with a brand new (or at least, cleanly buffed) one.
Of course, he didn't take out his hard drive's head motor and replace it with a non standard unit because it was too noisy. I'll bet if he had, they'd have cunningly conned him out of free service to repair his hack.
Hey freaks: now you're ju
Somebody mod the parent post up. That's absolutely great. Slashdot needs more information like this, and less trolling from idiots who can't be bothered to read.
10010010010010010001101000010
Now those are real numbers!
I am Monkey, the Great Sage, equal of heaven!
The only user-serviceable parts in a PowerMac are the RAM, drives, wireless cards and PCI expansion slots (again, check your documentation). Any other modification will void your warranty. Just because you believe you improved the situation doesn't make Apple responsible. If I think pressing a jelly doughnut into my Dell's motherboard will improve it, and the system is fried when I power it on, I don't think Dell will replace it for free, even if I clean up all the charred jelly bits. No manufacturer will be responsible for repairing a system you broke if it was working before your modifications.
BTW: I've seen this story multiple times in several forums, and it doesn't garner you any sympathy. It just reiterates that you still refuse to take responsibility for your own actions.
Of course, you know that Teraflops in computers are a lot like warp speed in Star Trek... You can only come infinitely close to 10.0 Teraflops. If it were possible to actually go 10.0 Teraflops, all computations would occur in zero time.
If I'm reading the up to date documentation correctly (see my earlier post for the exact wording), the school can purchase a single upgrade kit for $20, and request the "right to copy" for the remaining 1099 PowerMacs.
I usually never reply to these things, but I think it is funny that people are arguing about how he ordered on the Apple Store. I find it even funnier that people would even go to the Apple Store and try. It was a joke! There were a lot of dedicated people at Apple, including myself, that helped to make this dream become a reality. The "myth" that I would like to clear up is that Apple DID have a clue and a lot of great people at Apple have been working really hard for the last few months, making a lot of personal sacrifices to make sure that all the awesome work from Dr. Varadarajan and the rest of the cluster team could be possible and successful. That's my 2 cents.
Jerome Holman
Apple Campus Representative @ VT
http://filebox.vt.edu/users/jeholman
2 FPUs/ CPU * 1 floating point operation per cycle per FPU = 2 flop per CPU per cycle
2 flops per CPU per cycle * 2 Gcycles per second = 4 Gflops per second per CPU
4 Gflops/s per CPU * 2 CPU per machine = 8 Gflops/s per machine
where does the extra 2x come from?
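One possible answer, going by the Wired description quoted elsewhere in this thread (a multiply and an add issued together): if each FPU retires one fused multiply-add per cycle and that is counted as two floating-point operations, the per-machine figure doubles. A rough sketch of the arithmetic follows. The constant names are mine, the FMA accounting is the stated assumption, and the clock speed, CPU count, machine count, and LINPACK result come from the thread.

```python
FPUS_PER_CPU = 2
FLOPS_PER_FMA = 2            # assumption: one fused multiply-add counts as two flops
CLOCK_HZ = 2.0e9             # 2 GHz G5
CPUS_PER_MACHINE = 2
MACHINES = 1100
LINPACK_TFLOPS = 9.555       # measured figure from the story summary

def peak_flops_per_machine(flops_per_fpu_per_cycle: int) -> float:
    """Theoretical peak for one dual-CPU machine, in flops per second."""
    return FPUS_PER_CPU * flops_per_fpu_per_cycle * CLOCK_HZ * CPUS_PER_MACHINE

naive_peak = peak_flops_per_machine(1)               # one flop per FPU per cycle
fma_peak = peak_flops_per_machine(FLOPS_PER_FMA)     # counting an FMA as two flops

cluster_peak_tflops = fma_peak * MACHINES / 1e12
efficiency = LINPACK_TFLOPS / cluster_peak_tflops

print(f"per machine, no FMA:   {naive_peak / 1e9:.0f} Gflops")
print(f"per machine, with FMA: {fma_peak / 1e9:.0f} Gflops")
print(f"cluster peak:          {cluster_peak_tflops:.1f} Tflops")
print(f"LINPACK efficiency:    {efficiency:.1%}")
```

Under that assumption the machines come out at 16 Gflops each and the cluster at 17.6 Tflops peak, so 9.555 Tflops in LINPACK is a bit over half of theoretical peak.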
How come no one is complaining about Indian programmers having no creativity now? And No, I am not a troll. All the comments marked "interesting" and "informative" in anti-India bashing show the attitude of the people here. You guys are just bigots.
More correctly, the boxes were Pizza'd together. It took 600-700 pizzas for the students to get 1100 Macs installed in the cluster, so that's about 2/3 Pizza per Mac.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
>In addition, during this file transfer, Internet Explorer for Mac will not work.
>And everything else has ground to a halt. Even my text
>editor is straining to keep up as I type this.
Donno what the hell you are talking about. I made a similar-sized transfer from an external scsi drive to my G4's internal drive about 3 hours ago, except it was folders of files rather than one big file.
At the same time, I was happily using Safari, Macromedia Director, Emailer, Excel, Word, iCal, while also copying files to and from a network server, and was SSHed into another machine.
Oh, yeah, I was also running a few apps in Win XP Pro in a Virtual PC window.
I don't know what your problem was, but I'd have to guess "User Error" or "A Problem with the Truth".
>Mac addicts, flame me if you'd like, but I'd rather hear
>some intelligent reasons why anyone would choose to use
>an Mac over other faster, cheaper, more stable systems.
Why? I bill by the hour and actually have to get work done. Christ, it takes me considerably more time to keep up with the MS security updates on our two PCs as it does to maintain the rest of our 14 Mac / 2 FreeBSD / 1 Linux box network.
So, when's Apple gonna feature Varadarajan in a "Switcher" commercial?
--R.J.
Electric-Escape.net
I remember a few years ago, an Apple ad was themed "a supercomputer on your own desktop." IIRC, this was when the G4 was announced, and that value was helped by its vector unit.
At least now, the ad can (more truthfully) claim "part of a supercomputer on your own desktop" (duck ;-)
So they still had a discount..
hmmm... gentoo-64-ppc anyone? :D (does it exist?)
This kind of Top500 pissing contest will continue. Want to get to the top of the list? Add more nodes. The benchmark is useless for real applications. How many codes does VT plan to run that use LINPACK routines and use all those processors?
My guess is none.
Plus the Wired article is just plain wrong on some points. "Most other machines of its class cost upward of $40 million and take years to assemble." Not really. The next "computer genius" that convinces his university to build a bigger cluster will do it in the same amount of time with the same amount of student labor. It is just putting boxes on shelves, connecting wires, and running a simple benchmark. I bet they did not even use the Infiniband interconnect.
VT wanted them ASAP, and Apple had to push back delivery of PREPAID and RESERVED orders to normal consumers to meet the demand. They fronted extra money to get first in line, which is fair game in my opinion.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
Ah, finally someone who is actually involved with the project. Can you tell me what the total cost of the supercomputer was?
The $5.2M figure seems to just be the towers (dual 2GHz + 4GB RAM is $4814 with the standard educational discount; multiply by 1100 and you get $5,295,400). What was the additional cost of the Infiniband cards and switches, the Cisco switches, the racks, and the cooling equipment? Were any modifications necessary for the building (more power, etc.)?
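For anyone who wants to check the multiplication, here is a quick sketch in Python. The unit price and node count are the figures quoted in the comment above; the variable names are made up for illustration, and the comparison against $5.2M assumes that number is the hardware total reported elsewhere in this thread.

```python
# Figures quoted in the comment above; names are illustrative only.
UNIT_PRICE_USD = 4814          # dual 2GHz G5 + 4GB RAM, educational price per the comment
NODE_COUNT = 1100
REPORTED_TOTAL_USD = 5_200_000  # the "$5.2M" figure discussed in the thread

towers_total = UNIT_PRICE_USD * NODE_COUNT
overshoot = towers_total - REPORTED_TOTAL_USD

print(f"Towers alone:          ${towers_total:,}")
print(f"Over the $5.2M figure: ${overshoot:,}")
```

If the towers alone already exceed $5.2M at that unit price, then either the per-unit price was lower than assumed or the $5.2M covers something different, which is exactly the question being asked.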
Would still be faster and cheaper than even today's fastest Xeons. So STFU.
- #0 - 58 TFLOPS - SETI@HOME
- #1 - 35.8 TFLOPS - NEC Earth Simulator
- #2 - 13.8 TFLOPS - LANL ASCI Q - HP Alphaservers
- New #3 - 9.55 TFLOPS - Big Mac
- Old #3 - 7.6 TFLOPS - LLNL MCR - Xeon Cluster
It's not easy to do a good comparison, because the Top500 List is officially based on LINPACK, and SETI@Home is of course running SETI calculations instead. But if their figures are vaguely comparable, the world's fastest computer is a volunteer effort to look for Space Aliens, and the second fastest is modelling Earth and the weather, and it's not till you get to the third fastest that you get to machines used to design weapons of mass destruction or all the things the nuke guys do that they pretend aren't quite directly weapons-related.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Here in Ohio we have Columbus State Community College, which has over 22,000 students.
Hmmm... Pie...
I am speaking from experience when I tell you that building a large cluster from desktops is just not a good way to go. They take up a hell of a lot more room, they put out a lot more heat, and the remote management capabilities are degraded.
Desktops take up more room, correct. And yes, the desktop G5 does not have a console serial port like the Xserve does. But seriously, how many modern clusters do you see with a terminal server connecting to each node's serial port? These days it's all install-and-run. OS X is UNIX... you can do a lot with a remote shell. These folks will never need to sit down at a GUI for each node. If you look at their setup photos, you'll see that they even removed the gfx card from each node.
And... desktops DO NOT put out more heat than a similar rackmount unit. The hard drives are the same, the processors are the same. A larger case does not create more heat. More heat may be expelled due to better fans, but that is a GOOD THING; you don't want your board, RAM, and processors to cook. The only difference between the two is the power supply. Slim rackmount machines generally have smaller power supplies. But, with modern switching power supplies, there is nearly no difference in power consumption (and, by the laws of thermodynamics, heat output).
Once you go rack, you never go back. I much prefer a rack of 1U units that are built to be used in cluster situations.
Yes and no. A rack of 1U servers is small, compact, snazzy looking, and neat. But, you also increase the number of processors per square foot, which can be a cooling issue. With a concentration of heat in that area, more cool air will need to be directed to the rack.
I guess VT also has the luxury of running CPU intensive tasks. Those machines can only hold 8 GB RAM while other offerings can hold 16 GB and if they start to swap....ouch, not having SCSI drives will hurt.
4 GB per processor is pretty good for the current HPC world. A lot of monster supercomputers are still sold with 2 - 4 GB per processor. The G5 can unofficially support 16 GB via 2 GB DIMMs, but Apple has not certified this. SCSI drives are great for a big RAID, Fibre Channel is even better. But for the drive in each node, IDE is fine. Even Google uses IDE drives in their nodes (which they use as a distributed filesystem too!).
All in all this setup is very impressive when just considering CPU performance. Wonder what is going to happen when a professor needs to run a few hundred jobs that use 10 or so GB of RAM each.
The prof will have to re-write his code to use less RAM per processor. This is a cluster after all, and code for clusters has to work with a fixed amount of RAM per node. This is not a Cray X1, SunFire15K, or SGI Origin with high thruput, low latency global shared memory. Very very few supercomputers, and even fewer clusters, have 10 GB of RAM per processor. Even 8 GB per proc is pretty rare today.
If the thread did need that much RAM, it would be possible to pool memory between several nodes. It wouldn't be too fast, though (but still WAY faster than swapping to any hard drive). I believe they're currently getting a little over 800 MBytes/sec real-world thruput via the 20 Gbit full duplex Infiniband interconnects.
-- conspiracy mode on --
How do you know who made the decision to go with Apple? Maybe Apple is offering the decision makers 10% of 5.2 million dollars under the table provided they don't get a discount. Heck, maybe Apple went up to them and gave the managers the offer, and VT managers couldn't lose.
Apple wins. Decision maker wins. VT loses 330K. But no one knows about it.
-- conspiracy mode off --
The above scenario is very real in today's world. It happens all the time, and is not very hard to believe. And who's gonna notice the 330K distributed to all managers? It's prolly locked up in some Swiss account -- away from prying gov't eyes.
Kashif
My prof at the U of Michigan was pointing out the architectural oddity of every product Intel makes. First off, the Itanium, for all its VLIWness, fetches fewer instructions per fetch than the G5 (6 vs. 8 for the G5), it cannot do out-of-order execution, and because it has to process the 128 bits of instructions they cannot ramp up the clock. About the only very interesting thing is how it has so many registers that add very little to its overhead and essentially allow it to avoid costly memory accesses for putting variables on the stack. Overall the Itanium2 is not that bad, just a little too complicated and over-engineered. I do not argue with their decision to make a processor that does not do OOE or has such large caches and such a big register file, but it is the VLIWness that gets them: it adds too much complexity and too many gate delays. But yeah, basically my comp arch professor was saying that, benchmarks aside, the G5 was arguably the fastest desktop processor out there. Also, the Apple PI architecture does not hurt either; the Power Mac G5 has a very well designed system architecture too, so it can keep those processors pinned. Apple and IBM developed one hell of a system considering its cost.
See http://www.netlib.org/benchmark/performance.pdf page 53.
1. Earth simulator
2. ASCI Q
3. Virginia Tech G5 cluster (9.555 Tflops and rising, $5.2M HARDWARE ONLY)
4. PNL Itanium2 cluster (8.633 Tflops, $24.5M HARDWARE ONLY)
So nope, not only will the PNL Itanium2 cluster not be #2, it will also be 1Tflop behind the Virginia Tech cluster, and it will have done it at almost 5 times the cost. Bravo!
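A rough way to put numbers on the "almost 5 times the cost" claim, sketched in Python. The Tflops and hardware-cost figures are the ones listed in the comment above; dollars per Gflops is just one way to slice it, and the names here are invented for the example.

```python
# (LINPACK Tflops, hardware cost in millions of USD) as listed in the comment above.
clusters = {
    "Virginia Tech G5 cluster": (9.555, 5.2),
    "PNL Itanium2 cluster": (8.633, 24.5),
}

def dollars_per_gflops(tflops, cost_musd):
    """Hardware dollars spent per sustained LINPACK Gflops."""
    return (cost_musd * 1_000_000) / (tflops * 1000)

vt = dollars_per_gflops(*clusters["Virginia Tech G5 cluster"])
pnl = dollars_per_gflops(*clusters["PNL Itanium2 cluster"])

cost_ratio = clusters["PNL Itanium2 cluster"][1] / clusters["Virginia Tech G5 cluster"][1]
price_perf_ratio = pnl / vt

print(f"VT:  ${vt:,.0f} per Gflops")
print(f"PNL: ${pnl:,.0f} per Gflops")
print(f"PNL costs {cost_ratio:.1f}x as much and {price_perf_ratio:.1f}x as much per Gflops")
```

The raw cost ratio comes out a bit under 5, and the gap per Gflops is slightly wider because the Itanium2 cluster is also the slower of the two on these figures.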
Actually for Virginia Tech the cost is $0, still should be. The university has a licensing agreement with Apple. All "university" computers that can run OS X are eligible for upgrades. I've run down to the Software Distribution Center more than a few times to get all types of CDs.
PS. Same rules apply for Microsoft (Office, Windows, etc.)
i.e., 4 FP operations per cycle x 2000 MHz, so 8 Gflops per processor.
This is highly idealized, because not every operation can be issued like this and compilers are only so good, so for any block of code the maximum is 8 Gflops per machine on all operations and 16 Gflops only on some operations.
I call dumbass.
Go back and read the early slashdot articles that have links to the POs in them. Yes they bought the RAM separately from a third party, they're not stupid! The machines were purchased at full EDUCATIONAL price, with a per unit cost of EXACTLY $2493.00 each.
Do the math again...or read the previous articles that answered this question months ago.
The PPC 970 has a fused multiply-add instruction which allows each of its two floating-point units to do 2 flops per cycle. Now calculate that in again and you get 8 GFLOPS per proc and 16 GFLOPS per node.
I think we can stop beating this dead horse too, please.
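Since the horse keeps getting beaten, here is the peak calculation written out as a small Python sketch. The clock speed, dual-processor nodes, 1100-node count, and 9.555 TFlops LINPACK number all come from this story; the two-FPUs-per-chip and one-FMA-per-FPU-per-cycle figures are stated as assumptions, and the names are illustrative.

```python
# Theoretical peak estimate for the cluster.
# Assumptions: two FPUs per PPC 970, each retiring one fused
# multiply-add per cycle, with an FMA counted as 2 flops.
FPUS_PER_PROC = 2
FLOPS_PER_FMA = 2
CLOCK_GHZ = 2.0
PROCS_PER_NODE = 2
NODES = 1100
MEASURED_TFLOPS = 9.555  # LINPACK figure quoted in the story

peak_gflops_per_proc = FPUS_PER_PROC * FLOPS_PER_FMA * CLOCK_GHZ
peak_gflops_per_node = peak_gflops_per_proc * PROCS_PER_NODE
peak_tflops_cluster = peak_gflops_per_node * NODES / 1000

# Fraction of theoretical peak actually achieved on LINPACK.
efficiency = MEASURED_TFLOPS / peak_tflops_cluster

print(f"Peak per processor: {peak_gflops_per_proc:.0f} GFLOPS")
print(f"Peak per node:      {peak_gflops_per_node:.0f} GFLOPS")
print(f"Peak for cluster:   {peak_tflops_cluster:.1f} TFLOPS")
print(f"LINPACK efficiency: {efficiency:.1%}")
```

Under those assumptions the measured number is a bit over half of theoretical peak, which is also why people expect more from further tuning.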
I guess you meant "inthinkable".
WOPR wins in my book, Big Mac doesn't have cool lines and blinkenlights. Clustering is cool, but lacks that Ubergeeky bassy sound that the WOPR made. I don't want a smaller computer, I want one the size of a Honda Civic.
Ok (chugga chugga chugga) $3.3 million dollars. Who has the credit card? (silence, *crickets*, the rude sound of nobody reaching for their wallet...)
Crickets? I would think people'd be standing in line!! Do you know HOW MANY books you could buy at Amazon with the credit you would get from making a $3M purchase on the Amazon card?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
They're running Mac OS X though, so they only have a 32-bit address space.
I'm surprised being VT that they didn't just spec out their own motherboard and have an offshore firm build it. Companies like Google do this to save on the costs of components they don't need on a system. With a quantity 1K order, it would probably have saved quite a bit, but of course, there is always the time factor, which is one of the reasons why they didn't want to go with IBM.
From http://macslash.org/article.pl?sid=03/10/28/2357235&mode=thread
"The total cost of the asset, including systems, memory, storage, primary and secondary communications fabrics and cables is $5.2mil. Facilities upgrade was $2mil. 1mil for the upgrades, 1mil for the UPS and generators."
Total: $7.2M + essentially "volunteer" assembly
So it's still a LOT cheaper than anything even close to comparable.
Nope. Sorry. It wasn't a publicity stunt. Apple didn't give them any discounts, and according to VT officials - if you really want "proof" you can ask any number of them yourself - it took Apple quite a while to even warm up to the idea. And now, they've got BY FAR the cheapest supercomputer ever constructed for anywhere near this price - in fact, half as much, or less, as anything even close - but yeah, it's just a "publicity stunt". What the fuck is your problem? Can't handle the fact that this is the most powerful academic cluster in the world, the second most powerful in the US, and faster than any and all Intel or Intel-compatible architectures and Linux, for a fraction of the cost? *sob*
Opterons will be more expensive. Sorry.
Nothing has ever been done on this scale, this fast, for this cheap. And no Opteron cluster will, either.
Sorry to disappoint.
I love it how this is one of the most powerful and cheapest supercomputers ever, but some fucks still have to disparage or discredit it. Yeah, he's a "Mac shill". Shut the fuck up.
All this guy does is make posts saying we should start speaking lojban and use a hexadecimal number system. I've replied to him many times and he remains utterly serious. He also refuses to see why copyright can actually be a good thing.
So, I suspect that the parent was not a joke, and you should quit assuming. You know, ass of you, ha ha. Get a life.
-- Fighting mediocrity one bad post at a time.
Still, all things considered, not bad performance for a publicity stunt. (#3 supercomputer on the planet) Imagine what will happen when the pranksters at Apple grow up and really get serious.
Yup, as expected the 'Big Mac' results are still brown and smelly.
As in, who gives a shit about superclusters? I bet if you double the number of machines, it'll scale up by about 60%. JUST A GUESS.
> I remember being on an PC and wondering "why
> the heck is it taking so long to transfer files
You mean, 17M?
I dunno man, this seems a LOT like simply buying your way into the history books. I mean these guys just went out and BOUGHT a bunch of the fastest computers they could find, strung them together with the fastest network they could get, wrote some code to make them all talk together and BLAM! instant number three spot on the fastest computer list on the planet. What the hell kind of challenge is that? Sure, anybody can just cough up $7.2M in cash to buyyyy their way into the record books, but the other guys earnnned it the hard way.
Wassat?
Whoops, nevermind. Come to find out the rest of those supposed uberMachines are store bought also. My bad. Man, doesn't anybody do things the fun way anymore?
Glonoinha the MebiByte Slayer
I'd kind of like to hear what they did about security during the setup. Think about it - 1100 very hot items with tons of nerds around. All it takes is one guy to move a computer left instead of right and into his pickup...
Anyone know about this??
>BTW: I've seen this story multiple times in several forums, and it doesn't garner you any sympathy. It just reiterates that you still refuse to take responsibility for your own actions.
No, it reiterates nothing. You've been trolled. I'm so happy for you! =)
It's not offtopic, dumbass. It's orthogonal.
Okay, let's see:
1100 G5's w/4GB each @ $5,349 = $5,883,900
1100 Mellanox Infiniband Cards @ around $1000 = $1,100,000
23 Voltaire ISR 9600 96 port Infiniband Switches (it's two ports per Mellanox card) @ at least $30,000 each (the starter kit with only one 12 port blade costs $13,000 and you need 7 more blades) = $690,000
Total = $7,673,900
Assuming some better pricing here and there and $7 Million sounds more reasonable. There are other hidden costs such as the building, air conditioning (which I am betting runs several thousand per month), power, and labor.
I still think it was silly to get the G5's with full cases instead of just raw motherboards to save costs (that case costs over $200. That's a savings of nearly a quarter of a million dollars plus space), but it is a stupendous achievement to build a cluster this big.
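The tally above is easy to re-run if you want to plug in different prices. A minimal Python sketch follows; every quantity and unit price is the commenter's estimate from the list above (not a confirmed purchase price), and the names are made up for the example.

```python
# (description, quantity, estimated unit price in USD) from the comment above.
line_items = [
    ("G5 towers w/4GB", 1100, 5349),
    ("Mellanox Infiniband cards", 1100, 1000),
    ("Voltaire ISR 9600 switches", 23, 30000),
]

subtotals = {name: qty * price for name, qty, price in line_items}
total = sum(subtotals.values())

# Estimated saving from buying bare motherboards instead of full cases.
CASE_COST_USD = 200
case_savings = 1100 * CASE_COST_USD

for name, amount in subtotals.items():
    print(f"{name:<28} ${amount:>10,}")
print(f"{'Total':<28} ${total:>10,}")
print(f"{'Possible case savings':<28} ${case_savings:>10,}")
```

Swapping in the per-unit price quoted elsewhere in the thread changes the first line item a lot, which is most of the disagreement between the various cost estimates here.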
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
Man, what is Slashdot coming to when IT people can't talk tech anymore!!
"Slashdot, where telling the truth is overrated but lying is insightful."
You almost had me paying attention till you spewed that utter flaming crap about needing SCSI drives for swap.
Ummmm, 2*2GHz*2 instructions per processor outruns a SCSI drive interface just like it outruns an IDE drive interface, both of which outrun the drive head moving into position to get the right stripe of data, which is in turn faster than the fscking platter rotating around. Drive swap?!? Shit, they'll be avoiding that like the plague without regard to the interface.
Anyway, their density is just fine, in fact it's dense enough that they had to do some neat stuff to cool it all. You really can't get a whole lot more dense than they did; you need a certain amount of space for cooling all of the stuff off. Yeah, you could be more dense in 1 rack, but 50 full racks in a room would need to be submerged in a liquid to keep it cool.
Your point about memory stands, but it stands on one wobbly leg. That's 8GB of RAM per pair of processors, and with the other machines available I'd imagine they'd start combining RAM between machines for group-think before they'd start doing HD swap.
-theed
Imagine a Beowulf cluster of...
No, forget I said that...
What are you listening to? (http://megamanic.blogetery.com/)
The "myth" that I would like to clear up is that Apple DID have a clue and a lot of great people at Apple have been working really hard for the last few months, making a lot of personal sacrifices to make sure that all the awesome work from Dr. Varadarajan and the rest of the cluster team could be possible and successful.
Um, if that's the myth, then you're saying it's not true?
-b
myselfmusic
The G5 is about the same speed as a high-end P4 or Opteron in standard benchmarks like SPEC (a little slower actually), and the G5 machines with list or academic price are both more expensive and more expensive to deploy (since they require manual hardware configuration, don't come in rack mounts, and take up lots of space). Ergo, the machine can't offer the best price/performance ratio. Replicating a worse price/performance ratio by 2000 machines doesn't make it any better.
Don't get me wrong: the VT Mac cluster does not sound like a disaster--they paid a bit more and they have to live with a number of maintenance and programming hassles. But it is a fast machine, and the boxes do look pretty.
As for "the biggest switcher", I mean, unless it's a complete disaster, what do you expect someone who just spent millions of other people's money to say? "Well, we could have done better but it'll do?" I don't think so. He's going to try to make his decision look as good as it possibly can.
And that's why they have been hacking furiously in assembly language for the last few months trying to beat Intel/AMD on at least one benchmark and make the cluster look good.
Please tell me someone else got the War Games reference? Am I really getting that old?
Uncle Steve is gonna milk this for all he can, no doubt. I can't wait to hear which early 90s band will be revived to be on the soundtrack to the commercial. I hope Varadarajan got a free iPod.
I was interested in buying a small XServe cluster some time ago, but then canned the project to wait for the G5 Xserve.
One of the problems I saw with an Apple cluster solution is that you have to cobble up your own cluster management software (or at least it was so a few months ago).
Maybe some solid CMS will come up from the VT experience and Apple will offer it with the next generation Xserve.
I want my own cluster!
It's unlikely taxpayers paid for any of this.
dude!! you're black!! have you thought about joining the GNAA? (see near the top of this article)
You win the "I went a long distance to find a reason to blast Apple" award!
People can't blast them for overpriced machines, low performance, monokernels, having an unstable OS with no real preemptive multi-tasking, and no protected memory. So let's blast them for not giving buyers discounts! Whoo!
It's curious to me that large instruction words are considered inefficient when applied to EPIC but are efficient when applied to RISC. Funny how everyone complains about how horrible it is to be saddled with an instruction set (x86) that's designed to be compact yet believe EPIC to be the essence of inefficiency.
How long did it take for x86 compilers to generate good code? If we go by gcc we're still waiting.
By scaling, the original author is correct. You simply don't understand his use of the word "scaling". He's referring to the ability of future processors to achieve higher performance through the same instruction set, not how many processors you can add to an MP box.
RISC processors worked like crap for a long time, too. The original IBM RT didn't set the world on fire. Neither did SPARC. The current Power architecture is hardly a "reduced instruction set" by the original concept. EPIC needs time to mature.
"Itanium is a poor architecture. This isn't just my opinion, it's the opinion of the professor here at UT Austin working on the multi-core lightweight processor"
Your professor's opinion is... well... flawed.
Itanium is an excellent architecture. Its flaws come from politics:
An excellent architecture has no faults. Clearly, the Alpha architecture would be considered The Excellent Architecture(TM) as it out-performs the Itanium2! Go check the benchmarks for a 21264CB Alpha!
1: Itanium requires good compilers. For now, that means compilers from Intel. GCC will be fine for running Mozilla on an Itanium, but technical apps simply won't perform anywhere near the performance of the machine when compiled with GCC.
It appears Itanium is in a chicken-before-the-egg issue: Hello Mr. Anderson, what good is a CPU's outstanding performance...when...there...exists...no...outstanding...compiler? The Itanium arch has been available for 3 years and there has not yet been a Good Compiler(TM) for it. Here is Itanium2, an update of the Itanium architecture, and there is no Good Compiler(TM) in sight. I have more confidence in buying swampland and praying to God for a drought to dry it all up. Better yet, I hear there is some HOT land for sale in California that has potential; a smoking deal, just a few issues of supply and demand of fire-fighters just-in-case...
2: Intel wants to market Itanium as a server chip. That means that they are putting 3MB or 6MB on the high end Itaniums. Soon they will have a 9MB cache version. Lots of cache means lots of transistors means lots of heat.
There is no spoo^H^H^H^Hserver chip. Yesterday's dedicated servers are today's 1337 workstations.
3: Intel is not fabbing Itanium with a state of the art process. Intel leads the world in process technology, yet their Itanium is still on a 130nm process. Before Madison (about a year ago), it was on a 180nm process.
Yea, ok Mohammed...
Some misconceptions:
1: Itanium is "inefficient". This couldn't be further from the truth. At 1.5GHz, it whoops *anything* else in SPECfp (by a margin of 1.5x or more) and matches the 3.2GHz P4 or 2.2GHz Opteron in SPECint.
Itanium2 is the latest technology and has already been whooped by the Alpha CPU. Sure, the Itanium2's actual performance is arguable when the COMPILER can't put all that Performance on the pavement. From an architecture that didn't require a compiler written in the future to be brought back using a mod'd DeLorean: AlphaLinux.org provides a link to a Heise.de article with a benchmark between Itanium2 and Alpha. Itanium2 is inferior to 2-year-old Alpha technology, and so is PowerPC4.
2: Itanium is "slow". Wrong again, see above.
Somewhere in Jerusalem, a Yeti is jumping on his desk flinging his poop at Developers(TM) and shouting: "Compilers, compilers, compilers, compilers!"
3: Itanium doesn't scale. Wrong again. Itanium scales better than any other current architecture, getting nearly 100% of clock in both int and fp. Opteron gets around 99% int and 95% fp. Pentium 4 gets around 85% int and 80% fp. I don't have data for PPC970.
Shit! Flying Shit! In Air! "Compilers, compilers, compilers, compilers!"
4: Itanium is expensive. This is true, but it has to do with politics rather than architecture. Itanium uses *fewer* transistors and does *more* instructions per clock than a RISC architecture. Itanium takes much of the logic out of the CPU and puts it into the compiler (this is why you need good compilers). Itanium's architecture is called EPIC, or explicitly parallel instruction computing, because each instruction is "tagged" by the compiler to tell the CPU what instructions can and cannot be executed in parallel.
I hate to have fed this troll. I'm a dope ped
and it makes the square root of 2 => 1.414 something
...Jobs' dick in his mouf!
Blar.
A muttering overheard in the VT Computing Center (where the Big Mac is): "... and we can fit five X-serves in the space of three towers..."
Looks like they've got a 2 or 3U X-serve ready to go...
To solve the problem of heat in the cluster room, they just bought 5500 of this
This boils down to the BLAS libraries. The core routine--matrix multiply (GEMM)--was optimized by Kazushige Goto. The current impressive benchmark results are due to a mix of Goto's libraries and Apple's veclib framework.
;)
... to all the posters who claimed that Apple was making up the numbers in the G5 performance shootout a few months back? Now we have b&w statements (by a non-Mac fanatic, yet) that the Intel processors were too expensive and slow compared to the G5s. Interesting.
Ars has updated R-max results: http://www.bayarea.net/~kins/AboutMe/TOP500_list_for_CPU.html
A quote of interest:
"Along with topping 10-TFlops R-max, the Virginia Tech cluster has now topped 2-MFlops/dollar, which shows that using an Apple G5 dual in clusters gives you four times the Flops per dollar as a 2.4-GHz Xeon and 2-1/2 times the Flops per dollar as a 1.4 GHz Opteron. And 6 times the Flops per dollar as the Madison (Itanium 2). Just like Varadarajan said, 'Itaniums are too expensive, Opterons are too weak.'
Here are some other great statistics for Mac folk:
In clusters, as indicated by Linpack (this caveat is always assumed), Mac 2-GHz G5s beat all other chips in the critical score of GFlops/cpu at the chip's top frequency (exceptions: NEC's Earth chip and Cray's X-1). So the G5 beats (in this order):
Itanium 2 (Madison),
Xeon,
Power4,
Sparc,
Alpha,
Opteron,
Power3." ...Current results:
http://www.bayarea.net/~kins/AboutMe/TOP500_list_for_CPU.html
Great Job, VT!!
Source: http://www.netlib.org/benchmark/performance.pdf
As stated.