Big Mac Benchmark Drops to 7.4 TFlops
coolmacdude writes "Well, it seems that the early estimates were a bit overzealous. According to preliminary test results (in PostScript format) on the full range of CPUs at Virginia Tech, the Rmax score on Linpack comes in at around 7.4 TFlops. This puts it at number four on the Top 500 List. It also represents an efficiency of about 44 percent, down from the previous result of 80 percent achieved on a subset of the computers. Perhaps in light of this, VT apparently now plans to devote an additional two months to improving the stability and efficiency of the system before any research can begin. While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer. In addition, the project was successful at meeting VT's goal of developing an inexpensive top 5 machine. The results have also been posted at Ars Technica's openforum."
I've always been sort of intrigued by
What I have written, I have written.
Way to go /. -- updated the logo from G4 to G5 just in time.
--
First, from an Oct. 22 New York Times story:
Officials at the school said that they were still finalizing their results and that the final speed number might be significantly higher.
This will likely be the case.
Second, they're only 0.224 Tflops away from the only Intel-based cluster above them. So saying "all the Intel machines" in the story is inaccurate; it implies there are all kinds of Intel-based clusters that will still be faster, when there is only one Intel-based cluster above it, and that comparison uses only preliminary numbers for the Virginia Tech cluster.
Third, this figure is with around 2112 processors, not the full 2200 processors. With all 1100 nodes, even with no efficiency gain, it will be number 3, as-is.
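A quick way to check the "number 3 as-is" claim is to scale the preliminary Rmax linearly from 2112 to 2200 processors. This is a sketch under an optimistic assumption (efficiency stays constant as processors are added, which large clusters rarely achieve); the 7.41 TFlops result and the 0.224 TFlops gap come from this thread, and the variable names are illustrative.

```python
# Linear-scaling extrapolation of a preliminary Linpack result.
# ASSUMPTION: Rmax grows in proportion to processor count (constant efficiency).

measured_rmax = 7.41      # TFlops, preliminary result quoted in the thread
procs_used = 2112         # processors in the preliminary run
procs_total = 2200        # 1100 dual-processor nodes
gap_to_next = 0.224       # TFlops separating it from the Intel cluster above


def scale_rmax(rmax: float, used: int, total: int) -> float:
    """Extrapolate Rmax to the full machine assuming perfectly linear scaling."""
    return rmax * total / used


full_rmax = scale_rmax(measured_rmax, procs_used, procs_total)
next_cluster = measured_rmax + gap_to_next

print(f"Extrapolated full-machine Rmax: {full_rmax:.2f} TFlops")
print(f"Intel cluster above it:         {next_cluster:.3f} TFlops")
print("Would pass it:", full_rmax > next_cluster)
```

Any real efficiency loss from the extra 88 processors would eat into that margin, so treat the result as an upper bound rather than a prediction.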
Finally, this is a cluster of several firsts:
First major cluster with PowerPC 970
First major cluster with Apple hardware
First major cluster with Infiniband
First major cluster with Mac OS X (Yes, it is running Mac OS X 10.2.7, NOT Linux or Panther [yet])
Linux on Intel has been at this for years. This cluster was assembled in 3 months. There is no reason for the Virginia Tech cluster to remain at ~40% efficiency. It is more than reasonable to expect higher than 50%.
It's still destined for number 3, and its performance will likely even climb for the next Top 500 list as the cluster is optimized. The final results will not be officially announced until a session on November 18 at Supercomputing 2003.
What they're not telling you is that the real reason they are building a supercomputer is because the only copy of the router passwords is GPG-encrypted, and they lost the key.
That 80% efficiency simply sounded too good to be true, and it was.
Now it's at 44%. That's not a small drop, that's a MASSIVE drop.
They didn't predict any loss in going from a small subset to the whole system? Or was it a publicity stunt (we can outperform everyone! our names are __________!)
[I can picture a world without war, without hate. I can picture us attacking that world, because they'd never expect it]
That's nothing, last time I benchmarked my Big Mac Cluster (100 Big Macs) it came to almost 57.6 megacalories. Those Apples will never be able to match that!
I read the internet for the articles.
Not terribly surprising. Much like estimated death tolls for disasters, never believe the first set of benchmarks for a computer. Wait until thorough testing can be done before you start believing the numbers.
;)
Y'all should know this by now.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
"best price performance" and "Apple" in their minds?
"Virginia Tech: Home of the Poor Man's Supercomputer and Michael Vick."
Could I have frys with that?
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
Apparently there are a lot of cases where a MULTIPLY and an ADD do come together like that, but I'm not surprised if LINPACK doesn't consist entirely of those pairs. ;)
The 17.6 TFLOP theoretical peak assumed a perfect case consisting entirely of MULTIPLY-ADD pairs. In a case assuming no MULTIPLY-ADD pairs, the theoretical peak is 8.8 TFLOPs.
7.4 TFLOPs is only 42% of 17.6 TFLOPs, but it's 84% of 8.8 TFLOPs. I suspect the actual "efficiency" of the machine lies somewhere in the middle.
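The two peak figures in this comment follow from the cluster's hardware counts. A sketch, assuming dual 2 GHz G5 nodes with two floating-point units per CPU and counting a fused multiply-add as two operations; the constant names are illustrative.

```python
# Theoretical peak of the Virginia Tech cluster with and without fused multiply-add.
# ASSUMPTIONS: 1100 nodes x 2 CPUs at 2.0 GHz, 2 FPUs per G5, and an FMA
# counted as 2 floating-point operations.

NODES = 1100
CPUS_PER_NODE = 2
CLOCK_GHZ = 2.0
FPUS_PER_CPU = 2
FLOPS_PER_FMA = 2


def peak_tflops(use_fma: bool) -> float:
    """Peak TFlops: every FPU retires one op (or one FMA) per cycle."""
    ops_per_cycle = FPUS_PER_CPU * (FLOPS_PER_FMA if use_fma else 1)
    gflops = NODES * CPUS_PER_NODE * CLOCK_GHZ * ops_per_cycle
    return gflops / 1000.0


rmax = 7.4  # TFlops, preliminary Linpack result
for use_fma in (True, False):
    peak = peak_tflops(use_fma)
    print(f"FMA={use_fma}: peak {peak:.1f} TFlops, efficiency {rmax / peak:.0%}")
```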
(As for me, I'm happy with just ONE dualie...)
Thanks for asking!!
If you look at the difference between #3 and #4 on the list, you see it is quite small. The G5 should be capable of somewhat better floating-point performance than a Xeon (the processor in the #3 cluster). Some good tweaking will increase the efficiency: this preliminary number is just over 40% of theoretical peak, which largely points to a lack of optimization for that cluster.
No trees were harmed in the composition of this; however, numerous electrons were inconvenienced.
build a 7.4 TFlops computer for 500 pounds? Please?
I have over 70 freaks, do you?
I'm a big Mac fan. But I have to say that this kind of project, whether it involves Macs or Intel machines, is just too cool.
:)
This is what it's all about for me: seeing what you can do to push the envelope AND in the meantime getting something useful out of the whole endeavor.
Cost-efficient supercomputers are a worthy goal. Earthquake prediction, simulated stress on a skyscraper, launch/re-entry calculations for spacecraft, etc. There is no end to all the good that can come from this sort of pioneering spirit.
Sorry! I get kind of goofy when I see cool stories like this.
-- What's this '-r *' file doing here? -- Oh well, a simple 'rm' should do the trick.
Ha Ha!
Not all Macs, just clusters of G5s. The thing's experimental, you know.
Besides, I wouldn't complain about my computer outputting only 7.4 TFlops.
Maybe we deserve this world ?
While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer.
It still bests all other Intel hardware, with only the Alpha hardware on top. And given the CPU count, even the Alpha hardware does not match it. Look at the numbers: the Linux-based 2.4 GHz cluster has almost 200 more CPUs on board for a 217 Gflop/s difference. The Alpha clusters are running anywhere from 1,984 to 6,048 more CPUs.
Visit Jonesblog and say hello.
See http://www.netlib.org/benchmark/performance.pdf page 53.
Since yesterday's release at 7.41 Tflop, the G5 cluster has already increased almost a Tflop, and is now ahead of the current #3 MCR Linux cluster, and about 0.5 Tflop behind a new Itanium 2 cluster.
/Watched WarGames too many times as a kid.
please.
If someone used off-the-shelf machines that my company made, and got even into the top-10, you can bet your bottom dollar that the next thing in my job-pile would be a "make an announcement that we're in the top-10 fastest computers in the world."
This is fantastic, no matter which way you cut it! Using commodity components, these folks have turned the G5 into a real champion. No longer do budgets have to be in the hundreds, or even tens, of millions to get a top-notch supercomputer. And this is not even the end: at the rate things are going, I strongly suspect that IBM is considering the G5 for one of its own supercomputer projects, so hope is not lost yet. Imagine an IBM supercomputer for under $1 million! Beat up your favorite chess champion and still afford the mansion in the Bahamas. 8)
Karma Whoring for Fun and Profit.
First an article talking about a Linux PDA, then an article talking about a Windows Mobile 2003 SmartPhone.
Then a typical Apple-lover's article about the new PowerBook, now one that will surely break the hearts of all the Dell haters from last week.
What the hell is going on today?!?
First you have the iTunes store, which doesn't do anything but give the average user basically anything he or she might have wanted in an online music store. Despite its being free, we're all cheesed off that it doesn't support OGG, or that it's meant partly to push iPods (duh), or whatever.
Now this -- a supercomputer that has, to quote that again, the "best price/performance ratio ever achieved on a supercomputer." But dang it all, it doesn't completely blow away every established precedent -- it's just in the top five on the usual list of comparisons. One more crushing disappointment.
From Microsoft, we just want products that don't completely ream us. From Apple, we want the entire world to seem a little friendlier and cooler with every product release, every dot-increment OS update. They both disappoint us, but the expectations seem a little different...
"Fundamentalism" isn't about divine morality. It's about human authority.
So, yes, these numbers are preliminary, and yes, they WILL increase - they already are. See http://www.netlib.org/benchmark/performance.pdf (the official source of preliminary numbers), page 53.
Thhbbt!
Do these benchmark results take into account the software they have to run to check for memory errors?
The preliminary performance report at http://www.netlib.org/benchmark/performance.pdf contains the new entries for the upcoming list as well (see page 53).
How long does it typically take to tune a supercomputer cluster? The order of "months" seems pretty good. I would expect "years" on a typical system of this scale.
It took me months to get Windows tuned and stable on a single PC. (And the jury's still out on that one)
of all of these so-called "benchmark" discussions. Everyone really knows, in their heart of hearts, that the only valid benchmark is to be found in real-world applications such as Quake III. I want to know how many fps this alleged "supercomputer" gets.
144l. ph34r my 133t l3g4l 5k1lz!
Someone double-check the facts to make sure they're not lying again.
Anyone know how much merit there is to using Nmax (or N1/2) to compare different systems?
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraprhase)
Yes, but doesn't Moore's Law and the commodification of computer hardware suggest that each new generation supercomputer will have the best price/performance ratio?
Efficiency is strongly dependent on the interconnect. Does anyone know if the 128 node benchmark (that supposedly showed ~80% efficiency) was run with only one Infiniband switch -- i.e. all nodes connected through only one switch?
BTW, the performance never was stated to be 17 TF, so it did not drop to 7.4 (or whatever it ends up to be).
While I am amazed at the initial price vs. performance that this cluster of Macs has obtained, I am worried about the eventual cost of all the electricity and cooling for the cluster. I remember reading in some random article that the electricity used to cool and power the computer was estimated at the equivalent of around 3,000 midrange homes. Just from a quick calculation of 3,000 homes x $100/month x 12 months, we get the horrible figure of $3.6 million per year. So over a 10-year lifespan, the cluster would cost $36 million more than its current price.
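The arithmetic in this comment, written out so the inputs are easy to swap. A sketch that takes the comment's unverified premises at face value (3,000 homes, $100 per home per month, a 10-year lifespan); the function name is illustrative.

```python
# Back-of-the-envelope electricity cost using the figures quoted in the comment.
# ASSUMPTIONS (from the comment, unverified): the cluster draws as much as
# 3,000 midrange homes, each with a $100 monthly electric bill.

def electricity_cost(homes: int, monthly_bill_usd: float, years: int) -> tuple[float, float]:
    """Return (annual cost, total cost over `years`) in US dollars."""
    annual = homes * monthly_bill_usd * 12
    return annual, annual * years


annual, lifetime = electricity_cost(homes=3000, monthly_bill_usd=100, years=10)
print(f"Per year: ${annual:,.0f}; over 10 years: ${lifetime:,.0f}")
```

The result is only as good as the 3,000-home premise, which is the number most worth double-checking.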
While it is still cheaper than the original cost of Intel or IBM supercomputers, I personally would rather spend more and waste a lot less electricity, since, if I remember correctly, the energy cost for comparable supercomputers was in the range of $0.5-1 million. They are stationed in other countries, so electricity could be dramatically cheaper in Japan than in America, but I doubt it. Someone should really get the kilowatt-hours used by the top 5 supercomputers and then calculate the price per year based on that.
Never could figure out why my girl liked my bitch tits, then I found out she was a lesbian.
I wonder how cheap an Intel cluster would become using discounted Pentium 4's or even the celerons.
I installed a button on the front of my cluster
to manually clock the CPUs.
So far I've managed ONE whole flop.
My record is for the slowest supercomputer
on the planet.
siggy played guitar
Because nutritionists deal with kcals but call them cals, they're off by a factor of 1000 when talking to thermodynamicists.
The story is also here at News.com.com
Great ideas often receive violent opposition from mediocre minds. - Albert Einstein
What makes a "commodity" component? Or an "off-the-shelf" component?
The component cost of machines built out of "commodity" equipment still is much, much too high to warrant all the attention these machines have been getting.
I'd like to see hot machines and clusters built out of something I could afford to buy on a couple of months' wages. What everyone's paying attention to costs more money than I'll probably hold over five years.
tasks(723) drafts(105) languages(484) examples(29106)
What are you smoking? The efficiencies of the top 5 supercomputers (R_max/R_peak) are: .87, .67, .69, .59, .73 -- I got bored after that...
An efficiency of 44% is about the *worst* I've ever heard.
Read this: best price/performance ratio ever achieved on a supercomputer.
You wanker!
Awesome! Once again Apple demonstrates that true technological innovation does not come from hobbyists (like Linux) or monopolists (like Windows), but from thinking OUTSIDE the box. Apple is a company to be feared, but not because it will use monopoly power or swarms of zealots to do its dirty work, no, but because Apple has superior designs, superior attitude and superior thinking, just like the people who use Macs! Chalk up another win for thinking different! Yeah baby!
No troll here.
The #3 cluster *IS* basically 2000 P4s and it's 7.6 Tflops.
And 0.224 *IS* insignificant, especially given that today's Netlib performance paper ALREADY shows Big Mac at 8.16 Tflops. It increased almost a full teraflop in ONE DAY just from performance tweaking. So shut the fuck up.
Jack Dongarra says that a "supercomputer" is simply a computer that, by today's standards, is REALLY fast. I saw one presentation of his, and he said he ran the Linpack benchmark on his notebook (2.4 GHz Pentium 4) and it would have reached the bottom of the Top500 list in 1992. So this supercomputer definition is very fluid.
I know if you think about it, that Macs are PCs too, but I think Joe Public hears "PC", and they think Wintel.
Toon toon! Black and white army!
I wonder how a true vector-processing computer would do against the hybrid G5/AltiVec cluster?
Using vector processors for weather forecasting simulations is the major strength of that type of computing.
But I don't think anyone makes vector processors anymore except for NEC or maybe Cray.
The Linpack benchmark, as compiled to the G5, is not utilizing the processor to its fullest. The school is still in the process of adding Altivec compiler optimizations, which should drastically improve the results.
Right now, the processor is behaving essentially as a G4 with a bigger fan and more memory addresses. Rumor has it that tweaking the compiler to abuse the Altivec unit may push the system above the theoretical limit in some calculations.
If this Mac cluster were anywhere near the size/cost of other clusters, it would easily be number 1, assuming of course they work out those efficiency problems.
~ now you know
Check out the EnLight256... Coming soon to a military installation near you...
I'm curious if there are any AMD clusters out there, and what they bench at.
Another thing is that with cluster based supercomputers we could be at it forever: "Ooh, U of Poeduk took our #2 spot by 20 gigaflops.. You! The Intern, take some petty cash and buy us a few more cluster machines and some network cable!"
Remember them? Manufacturer of the highest performance x86 processors available? An array of dual-Opteron systems could be built with dramatically lower price/performance ratio than any other platform, especially G5s or Intel Xeons.
Does anyone know if Linpack was optimized for PPC hardware, specifically the 64-bit G5 with all its bells and whistles? That makes quite a difference.
(STEVE JOBS IS COOL!)
How can people talk about price vs. performance when the machines were discounted more than 25%! (read at bottom)
Is price/performance a factor, or something to boast about, when a company is drastically reducing its price just to get its name in the news?
I mean, I gave my brother a dual 466 MHz Celeron last month; that's better price/performance than this supercomputing cluster.. bleh..
Noted. And go VT, go Apple! Now, with the cheerleading out of the way, I wonder something: with Moore's law still applying pretty well, just getting the latest and greatest of any home computer architecture will all but guarantee you pretty good price/performance.
As another poster pointed out, someone's recent laptop could do as well on Linpack as a 1992 supercomputer.
So what I think would be interesting would be a kind of adjustment for Moore's law, sort of like how prices are adjusted for inflation when comparing, say, the cost of building the Empire State Building with the cost of building the World Trade Center.
Any economists out there with any good ideas?
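One way to do the adjustment this comment asks about is to treat Moore's law like an inflation index: divide a machine's cost per TFlop by 2 raised to (years elapsed / doubling period). This is a hypothetical sketch, not an established economic method; the 18-month doubling period is an assumption (Moore's law strictly concerns transistor counts, not price/performance), and the function name is illustrative.

```python
# Hypothetical "Moore's-law deflator" for comparing price/performance across years.
# ASSUMPTION: performance per dollar doubles every `doubling_years` years.

def moore_adjusted_cost(cost_per_tflop: float, years_elapsed: float,
                        doubling_years: float = 1.5) -> float:
    """Express an older machine's cost per TFlop in later-year terms.

    Hardware has doubled in price/performance years_elapsed / doubling_years
    times since the older machine was built, so its cost is scaled down by
    that many factors of two.
    """
    return cost_per_tflop / 2 ** (years_elapsed / doubling_years)


# Example with made-up numbers: $8M per TFlop, built 3 years earlier.
print(moore_adjusted_cost(8.0, 3.0))
```

The adjusted figure is what the older machine's price/performance would be "worth" if it had simply followed the trend; a new machine that beats that number has beaten the trend, not just the calendar.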
They're using 10Gbps Infiniband.
Interesting link describing the processor architecture and application performance in modern supercomputers.
Good read for anyone interested in some of the background on current supercomputers and what they used for testing.
Here's the link.
TruePunk | Games
Somebody had to say it.
"Would you, could you, with a goat?" Dr Seuss
The degree of loss is interesting, and suggests that their algorithm for distributing work needs tightening up at the high end. Nonetheless, none of these are bad figures. When this story first broke, you'll recall the quote from the top500 list maintainer who pointed out that very few machines had high performance ratings once they got into large numbers of nodes.
I'd say these are extremely credible results, well worth the project team congratulating themselves. If the team could open-source the distribution algorithms, it would be interesting to take a look. I'm sure plenty of Mosix and BProc fans would love to know how to ramp the scaling up.
(The problem of scaling is why jokes about making a Beowulf cluster of these would be just dumb. At the rate at which performance is lost, two Big Macs linked in a cluster would run slower than a single Big Mac. A large cluster would run slower than any of the nodes within it. Such is the Curse that Amdahl inflicted upon the superscalar world.)
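For reference, Amdahl's law gives the ideal speedup on n processors when a fraction s of the work is inherently serial: S(n) = 1 / (s + (1 - s)/n). A minimal sketch; the serial fractions below are made-up values for illustration. On its own this formula never drops below a speedup of 1, so an actual slowdown from adding nodes would have to come from communication overhead, which the sketch does not model. Linpack also lets the problem size grow with the machine, so this fixed-size bound is pessimistic for that benchmark.

```python
# Amdahl's law: ideal speedup for a fixed-size problem with serial fraction s.

def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup on n processors when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0 or n < 1:
        raise ValueError("serial_fraction must be in [0, 1] and n >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


# Illustrative (made-up) serial fractions on a 2200-processor machine.
for s in (0.0, 0.001, 0.01):
    speedup = amdahl_speedup(s, 2200)
    print(f"s={s}: speedup {speedup:,.1f}, efficiency {speedup / 2200:.1%}")
```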
The problem of producing superscalar architectures is non-trivial. It's also NP-complete, which means there isn't a single solution which will fit all situations, or even a way to trivially derive a solution for any given situation. You've got to make an educated guess, see what happens, and then make a better informed educated guess. Repeat until bored, funding is cut, the world ends, or you reach a result you like.
This is why it's so valuable to know how this team managed such a good performance in their first test. Knowing how to build high-performing clusters is extremely valuable. I think it not unreasonable to say that 99% of the money in supercomputing goes into researching how to squeeze a bit more speed out of reconfiguring. It's cheaper to do a bit of rewiring than to build a complete machine, so it's a lot more attractive.
On the flip-side, if superscaling ever becomes something mere mortals can actively make use of, understand, and refine, we can expect to see vastly superior - and cheaper - SMP technology, vastly more powerful PCs, and a continuation of the erosion of the differences between micros, minis, mainframes and supercomputers.
It will also make packing the car easier. (* This is actually a related NP-complete problem. If you can "solve" one, you can solve the other.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Most responses in here are about how the G5 should be performing better, or should have better numbers than the Xeon or SPARC, or whatever.
What seems to be missing from most of the conversation is that it's not the Macs that are losing efficiency per se; it's the network (the interconnects) that is slowing the machine as a whole down. I know little about the Linpack test, but I would assume that it's written to test/stress the entire machine: CPU, disk, memory and interconnects. If the Macs can finish parts of a problem really fast but can't get new data into the nodes fast enough, that will cause a tremendous loss in efficiency.
Perhaps they need a mechanism for buffering new data on the nodes so that incoming and outgoing data can stream as the network is available and keep the CPUs working all the time.
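The effect of the buffering idea can be shown with a simple timing model: without overlap each step costs compute plus communication time, while with perfect overlap (for example double buffering, receiving the next block while working on the current one) it costs only the larger of the two. A minimal sketch with made-up timings; real codes only achieve partial overlap.

```python
# Idealized per-step time with and without overlapping communication and compute.
# ASSUMPTION: perfect overlap, i.e. transfers fully hidden behind computation.

def step_time(compute_s: float, comm_s: float, overlap: bool) -> float:
    """Wall-clock time for one step of a compute/communicate loop."""
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


def cpu_utilization(compute_s: float, comm_s: float, overlap: bool) -> float:
    """Fraction of each step the CPU spends computing."""
    return compute_s / step_time(compute_s, comm_s, overlap)


# Made-up example: 1.0 s of math and 0.6 s of network transfer per step.
for overlap in (False, True):
    print(f"overlap={overlap}: utilization {cpu_utilization(1.0, 0.6, overlap):.1%}")
```

Once communication takes longer than computation, even perfect overlap leaves the CPUs waiting, which is why the interconnect still matters.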
Article X: The powers not delegated... by the Constitution...are reserved...to the people
Then you are a fucking idiot.
So, in all these "maximum speed tests", what is being used: 32-bit reals or 64-bit reals? The difference is that in solving large nonlinear systems, the higher-precision numbers result in a faster solution, but operations on doubles will result in a lower Gflops measurement in benchmarks (although a solution may in fact take 10x fewer iterations).
Somehow the use of "Mac" and "overzealous" in the same post fails to surprise me. (It's okay, you don't need to flame me, it's just a joke.)
This side up.
I think that magazine article must be wrong. If 1100 Macs use as much power as 3000 homes, then each Mac is using nearly 3 houses' worth of power. That seems excessive unless the homes are in a 3rd world country or those 9 fans are really, really running full blast. More likely, each G5 (with networking and cooling equipment) uses a few hundred watts. Even at 500 W per Mac, 1100 Macs, $0.15/kWh, 24 hours/day, 365 days/year, the cluster costs about $722,700/year. More likely, each Mac averages at most 300 W and is not running full tilt 24x7, so the cost is probably around $300k-$400k/year.
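The annual cost estimate in this comment can be reproduced directly. A sketch using the comment's own assumptions (watts per Mac including its share of networking and cooling, a flat $0.15/kWh, running 24x7); none of these are measured figures, and `duty_cycle` is an illustrative parameter added here.

```python
# Annual electricity cost for a cluster running around the clock.
# ASSUMPTIONS (from the comment): per-node draw includes networking/cooling share,
# flat $0.15/kWh tariff, 24 hours x 365 days of operation.

HOURS_PER_YEAR = 24 * 365


def annual_power_cost(watts_per_node: float, nodes: int, usd_per_kwh: float,
                      duty_cycle: float = 1.0) -> float:
    """Yearly cost in USD; duty_cycle < 1 models a machine not always at full load."""
    kwh_per_year = watts_per_node * nodes / 1000.0 * HOURS_PER_YEAR * duty_cycle
    return kwh_per_year * usd_per_kwh


print(f"500 W/Mac, full time: ${annual_power_cost(500, 1100, 0.15):,.0f}")
print(f"300 W/Mac, full time: ${annual_power_cost(300, 1100, 0.15):,.0f}")
```

At a constant 300 W the figure lands somewhat above the comment's $300k-$400k range; a `duty_cycle` around 0.8 brings it inside that range, which matches the comment's point that the Macs won't always run flat out.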
But your point is a good one. I often wonder about the environmental economics of people running SETI, Folding@Home, etc. on older machines. Most of those older "spare" CPU-cycles are quite costly in terms of electricity relative to newer faster machines that do an order of magnitude more computing with the same amount of electricity.
Two wrongs don't make a right, but three lefts do.
Communication latencies are definitely going to be their major problem; they always are in a supercomputer cluster. However, I'm not sure that they've got the bandwidth problem licked either. Yes, InfiniBand provides tons of bandwidth, but I'm not sure that the system has the internal bandwidth to make use of it.
They are connecting the InfiniBand cards using 64-bit PCI-X, which allows for up to 2.1GB/s of bandwidth. However, the PCI-X controller is connected to the memory and processor controller chip only through an 800MB/s HyperTransport link (er, two links, 800MB/s in each direction). This is actually less than the 10Gb/s (1.25GB/s) of bandwidth that InfiniBand provides. What's even worse, that 800MB/s HyperTransport link is shared with basically all I/O other than memory and graphics, and it's also a peak figure, with real-world bandwidth being somewhat lower.
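The bottleneck argument comes down to unit conversion and taking the minimum along the path. A sketch using the figures as quoted in the comment, in decimal units (1 GB = 1000 MB); the link labels are descriptive, not measured values. One caveat: 10 Gb/s is 4X InfiniBand's signaling rate, and its 8b/10b encoding leaves about 8 Gb/s (1 GB/s) for data, which is still more than the 800 MB/s HyperTransport link.

```python
# Find the narrowest link on the path from memory to the InfiniBand card.
# Figures are the ones quoted in the comment, in MB/s (decimal units).

def gbit_to_mbyte(gbit_per_s: float) -> float:
    """Convert gigabits per second to megabytes per second (1 byte = 8 bits)."""
    return gbit_per_s * 1000.0 / 8.0


path_mb_s = {
    "PCI-X (64-bit, as quoted)": 2100.0,
    "HyperTransport to PCI-X bridge (per direction)": 800.0,
    "InfiniBand (10 Gb/s raw signaling)": gbit_to_mbyte(10),
}

bottleneck = min(path_mb_s, key=path_mb_s.get)
ib_payload = gbit_to_mbyte(10) * 8 / 10  # 8b/10b encoding: 8 data bits per 10 line bits

print(f"Bottleneck: {bottleneck} at {path_mb_s[bottleneck]:.0f} MB/s")
print(f"InfiniBand payload after 8b/10b: {ib_payload:.0f} MB/s")
```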
In short, internal system bandwidth could be a problem here. They are likely running into some internal system latency issues as well. To get to their InfiniBand cards, they first have to go from the processor through the Elastic I/O channel to the memory+processor controller chip, then through HyperTransport to the PCI-X bridge, then over the PCI-X bus to the InfiniBand card. Each step of the way adds a bit of latency.
They would have been much better off with something like how the Opteron handles things, with the PCI-X controller hanging right off the processor itself, connected via a 3.2GB/s (each direction) hypertransport link. More bandwidth and fewer chips and buses to go through.
Actually, the numbers are being updated daily. VT was at 7.41 Tflops yesterday, and they're at 8.61 Tflops today. They will continue to improve and tweak right up until the conference.
My feeling is that the ~40% efficiency seen on the larger scale run is an indication that either VA Tech spent very little time tuning the problem size or they didn't design their InfiniBand fabric to really handle 1100 nodes hammering away at Parallel Linpack. (Given that they've been extremely vague about how their IB network is structured, I fear it may be the latter.)
I doubt that's true, especially if they're using the IBM PPC compilers. The G4 has both significantly less memory bandwidth and a single double-precision-capable FPU, whereas the G5 is basically a single-core Power4 with an AltiVec unit in place of some cache. IBM's compilers (despite being a little wonky as far as naming and argument syntax) generally produce pretty fast code.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
They WILL be counted, and they'll be counted right up until the conference. The deadline to be considered was Oct 1, and you had to provide a good deal of information about your installation, EXCEPT performance. Those who met that deadline will be included, but they can keep improving right up until late November. Sorry to disappoint, but Virginia Tech's cluster will be a lot higher.
(Disclaimer: I'm a Mac user).
I still think #4 in the world is pretty damn impressive for Apple hardware! And it looks like there might be some small performance improvements to come.
I think everyone involved did a pretty damn good job! Have a beer on me.
-psy
(hides/ducks - I ain't an anonymous coward for nothing!)
(LINUX RULEZ!!!)
Yes, Slashdot's performance is sucking ass. Man do they suck. They are the suckiest bunch of sucks that ever sucked.
500 errors occasionally. And EVERY time I do a preview or post, I have to resubmit 5 or 6 times just to get the comment to post. WTF? Get your fucking act together Slashdot.
Now, to resubmit 5 times...
oh, btw, LINUX IS THE ONE TRUE OS.
Mod this Father Pudge Oday wannabe down into oblivion.
-You may license this sig for only $6.99.
therefore you would gain 300 - 11.55 = 288.45 Calories of energy from a 4 degrees C, 350mL can of Coke.
:-o. Otherwise, I think contact with the surrounding air (and external condensation effects) will affect the can's warming in addition to your own effects.
Interesting math. If it's correct, I think it would work only in very special situations, like if you entirely surround the can with your body, perhaps inside a big roll of flab, or maybe inside a body orifice.
It should be noted that every Mac cultist ignores AMD. I'm sorry, but AMD has the most price-effective clusters, bar none. Do a little research before you start flapping about the Mac guys. Don't Mac users know what Google is, or are they all stuck using AOL search? Athlon MP, Opteron. Hello. Why build a more expensive and slower cluster on the G5 when you can have it faster, cheaper, and far more scalable with Opteron? It does not make sense. If I were a VT alumnus I'd be calling for heads to roll.
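The 11.55 Calorie figure in this exchange is the heat needed to warm 350 mL of Coke from 4 °C to a 37 °C body temperature. A sketch, assuming Coke has the density (1 g/mL) and specific heat (1 cal per gram per °C) of water and, as the reply notes, that all of the heat comes from your body rather than the surrounding air; the 300 Calorie intake is taken from the thread as given.

```python
# Energy your body spends warming a cold drink to body temperature, Q = m * c * dT.
# ASSUMPTIONS: drink behaves like water (1 g/mL, 1 cal per gram per degree C),
# and all of the warming comes from the drinker rather than the surrounding air.

BODY_TEMP_C = 37.0


def warming_kcal(volume_ml: float, drink_temp_c: float) -> float:
    """Dietary Calories (kcal) needed to bring the drink up to body temperature."""
    grams = volume_ml * 1.0                                  # 1 g per mL
    calories = grams * 1.0 * (BODY_TEMP_C - drink_temp_c)    # small calories
    return calories / 1000.0                                 # 1 kcal = 1000 cal


cost = warming_kcal(350, 4)
print(f"Warming cost: {cost:.2f} kcal; net gain: {300 - cost:.2f} kcal")
```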
ignorance is bliss. googlefiberatx.com
The efficiency of a parallel computer is usually defined as
E=Ts/(n*Tp)
where Ts is the time to perform the computations serially, Tp is the total time to perform the computations on the parallel machine, and n is the number of parallel processing units.
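The efficiency formula is easy to experiment with. A minimal sketch; the timings are made-up numbers chosen only to show how a small cut in the parallel run time moves E.

```python
# Parallel efficiency E = Ts / (n * Tp).

def parallel_efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    """Ts: serial run time, Tp: parallel run time on n processing units."""
    return t_serial / (n * t_parallel)


# Made-up timings: a 1000 s serial job run on 10 processors.
print(parallel_efficiency(1000.0, 125.0, 10))  # Tp = 125 s
print(parallel_efficiency(1000.0, 112.5, 10))  # Tp cut by 10%
```

Cutting Tp by 10% raises E from 0.80 to about 0.89, so the gain is proportional to the time saved, independent of how many processors share it.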
It wouldn't take much to get a drastic improvement in efficiency simply by improving the time slightly for each parallel processor, especially with 1100 nodes.
I don't know how the benchmark program runs, but reducing the communication time would improve the efficiency as well.
It shouldn't take much to boost this by a few million flops.
The 21st version of this list does not show the SETI@Home project. The top entry is NEC at 35 teraflops. Today's SETI@Home average for the last 24 hours is 61 teraflops. It may be a virtual supercomputer, but it is producing real results.
-- Stephen.
Oh poor baby wa-baby, I must have PO'ed a Mac user to get modded as Troll.
This is a test. This is a test of the emergency sig system. This has been only a test.
how much did they spend on this mofo? i wanna be friends with VT. they throw cash around like it aint no thang. "Hey, VT. Can you spot me a couple million green? I wanna get a little supercluster." "Sure, those things are really inexpensive."
Do you really think somebody is going to spend that kind of money on a supercomputer and NOT run it full-tilt 24x7? It's not like they're concerned about paying it overtime!
TFLOPS = Trillion FLoating-point Operations Per Second.
TFLOPS is similar to MIPS (Million Instructions Per Second) and the trailing 'S' is always required.
Although it's accepted to use "Flops" (initial cap only), it's never correct to say "one Flop" unless you're talking about something like Gigli or the Segway.
They shoulda used this.
"Sic Semper Tyrannosaurus Rex."
your real problem is that if your body is 37 degrees, you're dead
also, that 4 degree coke isn't gonna go down too smoothly being frozen
yes, it's a joke
i sell illegal drugs
62.5% (6 Gflops out of 9.6 Gflops peak)
BTW, why is it that Apple marketing claims the G5 is more powerful than a Pentium4? Both can do two floating point add-multiply ops per clock.
Before they flame me, I must say that I know there are a lot of other issues, like cache size and speed, bus speed, etc., that influence performance. But when it comes to peak speed, it boils down to how many of those add-multiply pairs you can do in a second. I'm curious to know how efficient a single G5 Mac is, since the network also plays a role in the cluster's efficiency figure. I do a lot of numerical processing using ATLAS-optimized LAPACK, so I don't care whether Photoshop or Excel is faster on one computer or the other. The figure I'm really interested in is that add-multiply speed.
A person might also want to consider that Jaguar does not take full advantage of the processor. In fact, neither does Panther.
I'm not a techwiz, but wouldn't full 64-bit memory addressing affect the Tflops, and ultimately, the cluster's performance?
Yeah, I remember. Windows is for Intelligent People.
Everybody know, Intelligent People don't have viruses and never have driver problems. They can plug in anything they like and it always works - Even After The Next Security Update.
So yes, the guy must be an idiot. I bet he expected his computer to just work. Hahahaha! What an idiot.
I think, therefore I am...I think.
Has anyone else noticed that if you spec out a Dual 2 GHz G5 with the specified amount of RAM (4 GB) and hard drive space (160 GB), without even adding a display (only the head node would need one), the price jumps to $5,120.00 (with no other extras, and dropping the modem and SuperDrive)? Now multiply that number by 1100 computers: $5,632,000, or $5.6 million. Correct me if I'm wrong, but the quoted price of the whole shebang is $5.2 million, and that includes all the additional hardware. So someone's math is off. I don't deny that they could get a discount from Apple on some of the stuff, but that would still mean that someone donated all the other equipment. So really and truly, the cost of this computer should be much higher (since everyone is doing a price/TFlop comparison, this is important), or it should be disclosed exactly how they're doing their math (maybe they used an old Pentium with the error in its table). I certainly hope they didn't use their new supercomputer to calculate that price!
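The comparison in this comment, as code. A sketch using the comment's configured list price and the quoted $5.2 million total; it computes the minimum discount implied if the entire budget had gone to the Macs alone, which understates any real discount because the InfiniBand gear and other hardware also had to come out of that budget.

```python
# Implied discount if the whole quoted budget had gone to the Macs alone.
# ASSUMPTIONS: $5,120 configured list price per node (from the comment),
# 1100 nodes, $5.2M quoted total for the entire system.

LIST_PRICE_PER_NODE = 5120.00
NODES = 1100
QUOTED_TOTAL = 5.2e6

retail_total = LIST_PRICE_PER_NODE * NODES
min_discount = 1 - QUOTED_TOTAL / retail_total

print(f"Retail for the Macs alone: ${retail_total:,.0f}")
print(f"Minimum implied discount:  {min_discount:.1%}")
```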
This puts it at number four on the Top 500 List.
It does? As far as I can tell, VT hasn't actually built this cluster, they are simply extrapolating from a smaller cluster.
While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer
I can find no evidence of that in what has been published so far. Maybe someone could actually support such claims with facts?
When I look at the actual prices I would end up paying for a dual G5 vs. a dual Opteron or dual Xeon machine (as opposed to some special deal Apple has made for PR purposes), the G5 comes in worse in terms of bang for the buck. And the fact that Apple hasn't managed to produce 1U dual G5 rack mounts (apparently because the chips run too hot) adds even further to the cost of deploying them as clusters.
I'm looking for cheap compute cluster hardware, so I'm all open for rational, careful calculations and analysis. But this kind of hype over the G5's is simply off-putting.
At $5.2 million for 7.4 teraflops, we get about $0.703 million per teraflop. UTexas cost $3 million for 3.7 teraflops, so we get about $0.811 million per teraflop. And given that the VaTech cluster isn't well optimized yet (and the UTexas one likely is; Linux systems have been doing this for a while now), this gap is likely to increase by 25% or more.
Here's the deal. As clusters grow in size, their performance per dollar nearly always DROPS. This is primarily because the interconnect cost usually grows as the square of the size or worse. So we'd expect the VaTech cluster, with twice as many machines, to have lower performance per dollar than the U Texas cluster. But it's the other way around.
Looks to me that U Texas got screwed by Dell.
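The per-teraflop arithmetic above as a small Python sketch. It uses only the prices and Rmax figures quoted in the comment, so it inherits their limits: it ignores discounts, donated equipment, and infrastructure, which is exactly what the list-price comment questions. `cost_per_tflop` is a hypothetical helper name.

```python
# Cost per teraflop of Rmax, using the figures quoted in the comment.
def cost_per_tflop(cost_musd: float, tflops: float) -> float:
    """Return millions of US dollars per teraflop of measured Rmax."""
    return cost_musd / tflops

vt = cost_per_tflop(5.2, 7.4)   # Virginia Tech G5 cluster, preliminary Rmax
ut = cost_per_tflop(3.0, 3.7)   # UTexas cluster, as quoted
print(f"VT: ${vt:.3f}M/TFlop, UTexas: ${ut:.3f}M/TFlop")
print(f"VT is cheaper per TFlop by {(ut - vt) / ut:.1%}")
```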
jeeeeeses!
don't forget to browse the websites mentioned in the top list... interesting for all geeks out there
Laziness is the habit of resting before you get tired.
Target the G5 at the consumer market. Target the G5-W at the engineering/high-performance market.
Amazingly, thanks to Apple, the PowerPC architecture has the best chance of capturing a sizeable share of the workstation market, obliterating any remaining UltraSPARC workstations. Apple has a damned good chance if only Steve Jobs doesn't blow it.
They're now in front of the LLNL MCR Linux cluster (which they were previously behind), but a new HP Itanium 2 cluster, which has appeared since the report was last updated, is now slightly ahead of them. There shouldn't be many more new entries showing up in the top 10 on this list (most of the big guns like Blue Gene, Red Storm, and ASCI Purple won't be ready until 2004), but we always knew there could be new entries.
Just another reminder: the numbers were updated today (Rmax, in GFlops).
1) NEC Earth Simulator - 35860
2) ASCI Q - 13880
3) HP RX2600 Itanium 2 (1936 CPUs) - 8633
4) Apple/VT (2112 G5s) - 8164
5) Linux NetworX (2304 Xeons) - 7634
When I said "current #3", I meant the "current #3" on the currently published Top 500 list at top500.org, not the number 3 spot on that report: there is a new player at number 3, which is that Itanium 2 cluster I referred to.
penguin said: "It does? As far as I can tell, VT hasn't actually built this cluster, they are simply extrapolating from a smaller cluster."
Read the info provided. 2112 CPUs are used. (Earlier, they had tested with 256 CPUs.)
By the way, it's now 8.2 Tflops, not 7.4.
perhaps you meant to say 1 iCalorie
...DAMN YOU STEVE JOBS!
I'd have used iCal, but THAT was already taken.
Read this: worst price/performance ratio ever achieved on a supercomputer.
You wanker!
-- The WIPO Avenger
I have been sitting here by my 1100 node G5 cluster trying to copy a 17.6 MB file for the last 20 minutes. It is so freaking slow now that I only get 44% efficiency. On my 1.5 GHz P3 I would be able to do this in under 20 seconds.....
a bit overzealous describes every Macintosh user I've ever met!
Best Buy can have you arrested
The US government classifies a G4 Mac as a weapon. The FAA is a US government agency. Can I be arrested for attempting to bring my PowerBook on an airplane?
Sheesh...
You people piss me off. Every Mac and self-respecting PC does a full RAM test each time it boots, and generally if RAM is good it STAYS good for a long while.
Do you even realize that ECC RAM doesn't CORRECT errors, it just shuts the machine down when it detects one? Have you ever heard of ECC RAM 'saving someone's ass'?
ECC is a good way for RAM and server manufacturers to get rich, not much more. Plus I heard a LOT of the ECC RAM out there was Pseudo-ECC, which means it just passes all the right parity bits regardless of data integrity.
I'm sure there's a good degree of redundancy in the calculations they're doing, what would happen if a machine caught fire with data in-flight? There'd have to be some sort of error-checking.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
I'm pointing at you and scrawling 'swarms of zealots' on a t-shirt to hand you. Catch!
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
Is the things you can find out by looking at the whole list.
Like...
The highest-rated "classified" computer in the US is only at #44, a Cray with 1900 processors that clocks in at "only" 1166 GFlops. One can assume that it resides at the NSA. Does anyone really believe that the NSA would be using such a relatively "slow" supercomputer? Piffle. The faster ones are probably so classified that no one without a very high security clearance even knows they were built.
Avon Products apparently has a supercomputer that can do 277 GFlops (#456 on the list). Just what on God's Green Earth does Avon need with a supercomputer that makes the Top 500? Studying flow patterns in cosmetics? Data mining the Avon Ladies? Kinda makes you wonder, doesn't it?
BMW apparently spends a whole lot of money on HP supercomputers, with 12 on the list (unless I missed any: #s 225, 243, 244, 322, 323, 324, 331, 342, 417, 418, 429, and 485), with a combined processing power of 4188.6 GFlops, all installed in the past three years. With all that power, they still couldn't figure out that an embedded Windows OS for their flagship car was a bad idea... maybe they need to kick the F1 team off the supercomputers for a while and let the production car guys in...
That price/performance ratio is probably constantly improving in supercomputing, as it is everywhere else in computing, so it's not as if they're making history in that regard.
Now before I get modded down, I beg to remind whoever might read this that what I am saying is FACT. - bogaboga
Here's a technical link that sums up some of the many ways in which you're mistaken: http://www.genitech.com.au/LIBRARY/TechSupport/infobits/ram/ram9.html
The difference between parity (which is hardly used anymore) and ECC is significant.
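To illustrate the parity/ECC difference: a single parity bit can only tell you that an odd number of bits flipped, while a Hamming-style code can locate a single flipped bit and correct it. This is a toy Hamming(7,4) sketch in Python, not how any particular memory controller is built; real ECC DIMMs typically use a wider single-error-correct, double-error-detect code over 64-bit words. The function names are hypothetical.

```python
# Toy Hamming(7,4) code: 4 data bits + 3 check bits, corrects any 1-bit error.
# Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4

def encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based position of the bad bit
    if pos:
        c[pos - 1] ^= 1         # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos

data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1                    # simulate a single-bit memory error
fixed, where = decode(word)
print(fixed, where)             # prints [1, 0, 1, 1] 5
```

A plain parity bit over the same 4 data bits would flag that something went wrong but could not say which bit to flip back.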
Well not literally me, but as a Mac user I do find it deflating -- not that I am not still impressed by the numbers...top 5 and all, best price/performance, etc...but it was fun while it lasted.
Here's to hoping they can optimize the cluster and squeeze out the theoretical performance they hoped they'd get.
We apologise for the fault in this post. Those responsible have been sacked. -- Signed RICHARD M. NIXON
I'm pretty impressed actually. Buying off the shelf parts and getting to the #4 position in the world is a great accomplishment. I'm sure with some tweaking they might get some performance improvements.
The price of the system is amazing. Moreover, many schools now can afford a decent "super computer".
The definition of a supercomputer doesn't have anything to do with how it actually works..... it has to do with speed.
When we say supercomputer, we mean something that is significantly faster than the average computer.. the fastest computers at any given time are by definition supercomputers.
fills me with hunger and G5 lust at the same time.
I knew that PC's were generally cheap pieces of crap, but I had no idea they were that worthless...
And hey, fucker, I lost my ass in 'Nam defending you and your snot-nosed faggot friends so you could continue circle jerking to pictures of Marky Mark after felating the family chickens, drink each others urine and cruise the Circle K.
Here's your nickel back...go buy yourself a real life.
Quote:
Read this: worst price/performance ratio ever achieved on a supercomputer.
Read the bloody article!
You blathering illiterate wanker!
... is that I have finally seen Slashdot use a G5 icon for a G5 story, as opposed to the G4 icon it used for G5 stories until recently. Thank you, Slashdot editors!
Can you read?
Moron!
It takes about ten years for the power of whatever is considered a "Super Computer" to make it to the desktop.
That means that in 2013 I can have a 7.5 TFlop game machine for about $2,000.
When a can of coke says it contains 300 calories, can the human body actually metabolize that and turn it into 300 calories of heat energy?
I was under the impression chemical energy couldn't be turned into heat energy with 100% efficiency. Of course, when people state how much energy something contains, they might take that into account. Just curious.
The problem of producing superscalar architectures is non-trivial. It's also NP-complete, which means there isn't a single solution which will fit all situations, or even a way to trivially derive a solution for any given situation.
Huh? Arbitrary scheduling of resources on specialist-parallel architectures is NP-Complete. Unless you have a ridiculous number of scalar units, the architecture itself is not NP-Complete.
NP and NP-Complete are two totally different beasts. There is a chance that you will accidentally solve an NP problem the first time through monte-carlo attempts, but simply solving an instance of an NPC randomly does not guarantee solutions to all NPC's.
if superscaling ever becomes something mere mortals can actively make use of,
Huh? Where do you think the performance gains have come from in the last decade? Clock speed? Did you miss Intel's Itanium gamble on VLIW architectures? Or maybe you were too busy extolling the miserable scalability of SMPs to notice that the core problem in HPC hasn't been processor performance but rather memory bandwidth and programming architecture.
Your own argument confirms that general-purpose supercomputers are, by definition, horribly inefficient. This is not to say that optimal architectures don't exist. Rather, a fundamental rethinking of how we compute may finally be in order. Fear not the data-driven models for only they, not current control-driven, can properly represent the maximum exploitation of parallelism in algorithms and do such without demanding specific physical architectures to optimally match the logical architecture of the computation.
Moore's Law is a limitation of performance advancement, just like Amdahl's Law, and this is due to the base philosophy of von Neumann architectures. Fundamental differences in real performance will not be achieved with evolutionary steps forward in physical architecture, but revolutionary shots into a much different solution space.
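For reference, Amdahl's Law in its usual form: if a fraction p of the work parallelizes perfectly across n processors and the rest stays serial, the speedup is 1 / ((1 - p) + p/n). A minimal Python sketch; the fractions tried are arbitrary examples, and 2200 is the CPU count of the full VT cluster.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's law: speedup on n processors when a fraction p of the work
    parallelizes perfectly and the remaining (1 - p) stays serial."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n)

CPUS = 2200
for p in (0.99, 0.999, 0.9999):
    s = amdahl_speedup(p, CPUS)
    print(f"p={p}: speedup {s:.0f}x, parallel efficiency {s / CPUS:.1%}")
```

Even a 0.1% serial fraction caps a 2200-CPU machine at well under a third of linear speedup, which is why benchmarks like Linpack, with almost no serial work at large problem sizes, are the best case for clusters.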
I think this just proves it. Apple have come out with a product that is fast, scalable, USER FRIENDLY, secure and robust. Seriously, how can anybody claim that GNU/Linux is good for the average Joe end user? Oh... but the end user should have to know how to configure a firewall, compile a kernel and write scripts to automate downloading pr0n. And if they don't, then they're just stupid idiots, because we're just so 1337. Get over yourselves. Mac has come out with a SUPERIOR product. It is where all you zealots dream of taking Linux, except they are THERE. They will conquer the desktop AND server markets.
I am Monkey, the Great Sage, equal of heaven!
'nuff said
KASY0 achieved 187.3 GFLOPS on the 64-bit floating point version of HPL, the same benchmark used on "Big Mac". While "Big Mac" is about 40 times faster on that benchmark, it is about 130 times the cost of KASY0 (~$40K vs ~$5200K). Considering the size difference, "Big Mac" is VERY impressive, but it can't claim to be the best price/performance supercomputer on the HPL benchmark.
Note: KASY0 gets 482.6 GFLOPS (0.48 TFLOPS) on a 32-bit precision version of Linpack, satisfying our under $100 per GFLOPS claim.
Regardless, Virginia Tech's "Big Mac" is a very impressive machine. My congratulations to them!
Tim Mattox
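The ratios in the comment above can be checked directly. A quick Python sketch using the approximate costs and GFLOPS figures quoted there (~$40K and ~$5.2M; 187.3 and 482.6 GFLOPS for KASY0; 7.4 TFLOPS for Big Mac). The helper name `usd_per_gflops` is hypothetical.

```python
def usd_per_gflops(cost_usd: float, gflops: float) -> float:
    """Dollars spent per GFLOPS of measured Linpack performance."""
    return cost_usd / gflops

KASY0_COST, BIG_MAC_COST = 40_000, 5_200_000   # approximate, from the comment
KASY0_HPL64, KASY0_32BIT, BIG_MAC_HPL64 = 187.3, 482.6, 7_400.0

speed_ratio = BIG_MAC_HPL64 / KASY0_HPL64      # "about 40 times faster"
cost_ratio = BIG_MAC_COST / KASY0_COST         # "about 130 times the cost"
print(f"Big Mac: {speed_ratio:.1f}x faster for {cost_ratio:.0f}x the cost")
print(f"KASY0, 64-bit HPL: ${usd_per_gflops(KASY0_COST, KASY0_HPL64):.0f}/GFLOPS")
print(f"KASY0, 32-bit:     ${usd_per_gflops(KASY0_COST, KASY0_32BIT):.0f}/GFLOPS")
print(f"Big Mac, 64-bit:   ${usd_per_gflops(BIG_MAC_COST, BIG_MAC_HPL64):.0f}/GFLOPS")
```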
G5s have a PSU rated for 120 V, with a max current draw of 7.5 A. That's 900 W (900 VA, really; VA and watts are only equal at a power factor of 1).
I'd like to find a house that's running on 300W. That's only about 5 incandescent lamp bulbs. Talk about energy efficient...
How do they measure efficiency, and what does it mean in this context?
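One common answer, as used on the Top 500 list: efficiency is the measured Linpack Rmax divided by the theoretical peak Rpeak. A Python sketch under stated assumptions: 2 GHz PowerPC 970 CPUs at 4 flops per cycle (two FMA-capable FPUs), and the 2112 CPUs reported for the preliminary run. With those assumptions, the 7.4 TFlops result works out to roughly the 44 percent in the story summary.

```python
# Linpack efficiency = measured Rmax / theoretical Rpeak.
# Assumptions: 2 GHz CPUs, 4 flops/cycle each (2 FPUs x fused multiply-add).
def rpeak_gflops(cpus: int, clock_ghz: float = 2.0, flops_per_cycle: int = 4) -> float:
    """Theoretical peak of the whole machine, in GFlops."""
    return cpus * clock_ghz * flops_per_cycle

def efficiency(rmax_gflops: float, cpus: int) -> float:
    """Fraction of theoretical peak actually achieved on Linpack."""
    return rmax_gflops / rpeak_gflops(cpus)

CPUS_IN_RUN = 2112
print(f"Rpeak: {rpeak_gflops(CPUS_IN_RUN):.0f} GFlops")
print(f"7.4 TFlops run:  {efficiency(7400, CPUS_IN_RUN):.1%}")
print(f"8164 GFlops run: {efficiency(8164, CPUS_IN_RUN):.1%}")
```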
You're forgetting the AC costs... If you've ever worked in a DC you know that the room itself can get mighty toasty, and toasty air leads to cooked systems.
Each processor, drive, and switch generates heat, which is dissipated into the air. Left alone, that heat accumulates and will kill the entire thing. With 1100 dual-processor nodes running constantly (and you can bet they'll each be running at pretty close to full tilt), that's a hell of a lot of heat that needs to be removed from the air.
The G5's memory controller is built into the U3 IC, which is essentially the "north bridge"; it is NOT built into the CPU.
It connects to the CPU via the "Apple Processor Interface", NOT via HyperTransport. It connects to its memory controller at 1/2 the CPU speed, unlike Opteron and Athlon 64, which connect to the memory controller at FULL CPU SPEED.
Documentation:
developer.apple.com: http://developer.apple.com/documentation/Hardware/Developer_Notes/Macintosh_CPUs-G5/PowerMacG5/2Architecture/chapter_3_section_4.html#//apple_ref/doc/uid/TP30000803/TPXREF108
apple.com (thanks for the link): http://www.apple.com/powermac/architecture.html
From the U3 north bridge, the G5 uses HyperTransport to connect to the other peripherals at 3.2 GB/s.
Opteron supports a HyperTransport rate of 6.4 GB/s directly from the CPU (http://www6.tomshardware.com/cpu/20030422/opteron-06.html).
The Opteron 4xx and 8xx models also happen to have THREE of these HyperTransport channels connected in a cross-bar configuration for SMP systems, giving EACH CPU a dedicated 6.4 GB/s connection, whereas in the G5 architecture both CPUs must share that connection (since there is only one U3 chip in a dual G5).
Support for PCI-X as standard in the G5 is a great thing; I wish more AMD systems had it. I appreciate their native support for FireWire and Gigabit Ethernet. But seriously, do you really want to argue architecture against a workstation-class CPU? I'm a bit disappointed by the Athlon 64, but the Athlon 64 FX (the desktop version of the Opteron) and the Opteron live up to most of my expectations, and I expect to see more speeds out in the near future.
Stewey
There are 10 kinds of people in the world. Those who understand binary and those who don't.
it's really funny that it's known as "Big Mac"
I think everyone here is forgetting about the Oak Ridge National Laboratories Stone Souper Computer.
http://stonesoup.esd.ornl.gov/
During the creation of this cluster, ORNL has spent approximately zero dollars. As they state on their page, their price-to-performance ratio is zero and their performance-to-price ratio is infinity!
LOL
A calorie is the amount of heat required to raise 1 gram of pure water at +4C to +5C under 1 atm pressure.
Gentlemen, you can't fight in here, this is the War Room!
I just can't stand these ever-recurring mistakes. Will people never learn? Just like the "here, here!" and "could care less" idiots. *sigh*
You should have died in Viet Nam, you asslicker fag!
Are you twelve? You sound like a frothy pre-teen swearing to his detention cronies. I can't believe you have half the sense of a retarded monkey with tourettes. Stupid fucking asshole. No one cares what you think Anonymous Fuck.
Because you are obviously mentally challenged I do not think this will make it past your intellectual moat, but in the hopes that it does, RELAX...While I can admit to being a zealous supporter of my platform of choice, what are you zealous about? Antagonizing others? Surely you do not think you are a beacon of inspiration to other Linux users...Windows users? And trust me, your 'big talk' is a clear indication of how much of a pussy you are in real life. That is what a fuck like you doesn't get through their thick skulls...for all your venomous ranting, you only show yourself for the waste you are. I know it must be lonely in that dark room, alone but for your trusty hand-built box...rest assured it won't be getting any better for the likes of you...you will toil away endlessly in your pathetic life, living in your parents house -- waiting for them to die so that you can squander away what little they leave you behind. Friendless, loveless, childless, just do yourself a favor and end it now...
AC costs are not that high and were included in the parent post's estimates.
Large air conditioning systems run at a coefficient of performance (COP) of about 3, meaning you only need about 333 W of power to remove 1000 W of heat. The magical thermodynamics of freon mean you can pump the heat from the computer room to the outside without much cost. So a 300 W computer means you have a 400 W installation (AC + computer). And in the winter in Virginia, you need only open the window for free, zero-power cooling. ;)
Two wrongs don't make a right, but three lefts do.
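Putting the thread's numbers together: a Python sketch of the total electrical load, assuming ~300 W actual draw per node (the parent comment's estimate, not a measurement) and the COP of about 3 quoted above. It ignores switches, storage, and the head node, and the helper name is hypothetical.

```python
# Rough electrical load: IT load plus the power the AC needs to remove it.
NODES = 1100
WATTS_PER_NODE = 300.0   # thread's estimate of actual draw, not a measurement
COP = 3.0                # heat removed per unit of electrical input to the AC

def total_power_kw(nodes: int, watts_per_node: float, cop: float) -> float:
    """Computers plus cooling, in kilowatts."""
    it_load = nodes * watts_per_node
    cooling = it_load / cop
    return (it_load + cooling) / 1000.0

print(f"one node: {total_power_kw(1, WATTS_PER_NODE, COP) * 1000:.0f} W")
print(f"cluster:  {total_power_kw(NODES, WATTS_PER_NODE, COP):.0f} kW")
```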
Although KASY0 may be considered "small" by some standards, it definitely merits the definition of "supercomputer". Its theoretical double precision (64-bit) peak is over half a TFLOPS, and over a full TFLOPS for single precision (32-bit), and it uses about 13 kilowatts of electrical power.
Saying that various single processor commodity electronic gadgets get better price/performance is meaningless. The slashdot subject line was too short to add the word supercomputer earlier... it was implied by context.
But again, not to detract from the VT achievement, Big Mac is very impressive. I anxiously await more details on the fault tolerance software they are using, as well as the network topology with the 96-port InfiniBand switches.
The high percentage of peak Linpack performance of Big Mac on just a subset of its nodes tells me more about their network topology than anything else. The nature of the Linpack benchmark is that it scales very, very well if each node has enough memory (its operation count scales as the cube, while its memory references and communications scale as the square). The fact that its efficiency dropped so much on the full machine indicates to me that they have some networking bottleneck between switches. It also means that they have a VERY nicely tuned matrix multiply core within a single CPU or node. Looking at the typical percentage-of-peak numbers on single CPUs from Automatically Tuned Linear Algebra Software (ATLAS), it's difficult to get over 80% of peak on just a single CPU.
So, again, the VT machine is VERY impressive.
Tim Mattox
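A small Python sketch of the cube-versus-square scaling argument above, keeping only the leading 2/3 N^3 flop count of LU factorization: the matrix holds N^2 double-precision values (8 bytes each), so the work per stored element grows linearly with N. The N values are arbitrary examples and the function names are hypothetical.

```python
def lu_flops(n: int) -> float:
    """Leading-order flop count of LU factorization: (2/3) * n^3."""
    return (2.0 / 3.0) * n**3

def matrix_bytes(n: int) -> int:
    """Storage for an n x n matrix of 8-byte doubles."""
    return 8 * n * n

def flops_per_element(n: int) -> float:
    """Work done per stored matrix element; grows linearly (2n/3) with n."""
    return lu_flops(n) / (n * n)

for n in (10_000, 50_000, 100_000):
    print(f"N={n:>7}: {lu_flops(n):.2e} flops, "
          f"{matrix_bytes(n) / 1e9:.1f} GB, "
          f"{flops_per_element(n):.0f} flops per element")
```

Since data movement between nodes also grows roughly with N^2, a big enough problem per node lets computation hide the network, which is why a drop in efficiency on the full machine points at the interconnect.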
The cluster is certainly an important milestone, though. The days of being locked in to buying a commercial architecture that was designed three years ago and costs 10x too much are over.
You can never have too much RAM or too much disk space. --Ancient American Proverb, circa 1980
if you have Mathematica 4 and a G5,
can you download the benchmark notebook at
http://smc.vnet.net/timings40.html
or
http://www.scientificweb.de/mathstef3.html
and tell us the results?
thanks.
Xah
xahlee.org
http://xahlee.org/PageTwo_dir/more.html
First this, then they lose to WVU, almost completely eliminating all chance of winning a National Championship... It's a black day in Blacksburg.
See http://www.netlib.org/benchmark/performance.pdf page 53. The G5 cluster has moved up to 9.555 TFlops.