Big Mac Benchmark Drops to 7.4 TFlops


Posted by CowboyNeal on Wednesday October 22, 2003 @07:09AM from the number-adjusting dept.

coolmacdude writes "Well it seems that the early estimates were a bit overzealous. According to preliminary test results (in postscript format) on the full range of CPUs at Virginia Tech, the Rmax score on Linpack comes in at around 7.4 TFlops. This puts it at number four on the Top 500 List. It also represents an efficiency of about 44 percent, down from the previous result of 80 achieved on a subset of the computers. Perhaps in light of this, VT is apparently now planning to devote an additional two months to improving the stability and efficiency of the system before any research can begin. While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer. In addition, the project was successful at meeting VT's goal of developing an inexpensive top 5 machine. The results have also been posted at Ars Technica's openforum."

35 of 417 comments

A supercomputer by Any Other Name.... by bluethundr · 2003-10-22 07:09 · Score: 4, Interesting

I've always been sort of intrigued by Top500 (http://www.top500.org/). Has there ever been a good comparison written about the similarities/differences between a 'supercomputer' and the lowly PC sitting on my desk running Linux/XP? At what point does the computer in question earn the title "Super"?

--
Quod scripsi, scripsi.
1. Re:A supercomputer by Any Other Name.... by Carnildo · 2003-10-22 07:24 · Score: 3, Insightful
  
  The big difference is that a "supercomputer" is usually heavily optimized towards vector operations: performing the same operation on many data elements at once. Think of it as SIMD (MMX, SSE, etc), only more so. A "supercomputer" would be pretty useless at ordinary tasks such as web browsing or word processing, as those can't be vectorized or parallelized very well. A "supercomputer" might be good as a graphics or physics engine for gaming, but that's sort of like using a cannon to swat a fly: a lot of work for something that can be done with a simple flyswatter.
  
  --
  "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
2. Re:A supercomputer by Any Other Name.... by BostonPilot · 2003-10-22 07:32 · Score: 4, Funny
  
  Nah, the real definition is:
  Super computers cost more than 5 million dollars
  Mainframes cost more than 1 million dollars
  Mini-Super computers cost more than 1/4 million dollars
  Everything else is by definition a Plain Jane (TM) computer
  btw, I've worked on all 4 kinds ;-)
snazzy new G5 logo too! by JUSTONEMORELATTE · 2003-10-22 07:10 · Score: 4, Funny

Way to go /. -- updated the logo from G4 to G5 just in time.

--
Important items of note by daveschroeder · 2003-10-22 07:10 · Score: 5, Informative

It's worth noting a few important things:

First, from an Oct 22 New York Times story:

Officials at the school said that they were still finalizing their results and that the final speed number might be significantly higher.

This will likely be the case.

Second, they're only 0.224 Tflops away from the only Intel-based cluster above it. So saying "all the Intel machines" in the story is kind of inaccurate, as if there are all kinds of Intel-based clusters that will still be faster; there is only one Intel-based cluster above it, and with only preliminary numbers for the Virginia Tech cluster at that.

Third, this figure is with around 2112 processors, not the full 2200 processors. With all 1100 nodes, even with no efficiency gain, it will be number 3, as-is.

Finally, this is a cluster of several firsts:

First major cluster with PowerPC 970
First major cluster with Apple hardware
First major cluster with Infiniband
First major cluster with Mac OS X (Yes, it is running Mac OS X 10.2.7, NOT Linux or Panther [yet])

Linux on Intel has been at this for years. This cluster was assembled in 3 months. There is no reason for the Virginia Tech cluster to remain at ~40% efficiency. It is more than reasonable to expect higher than 50%.

It's still destined for number 3, and its performance will likely even climb for the next Top 500 list as the cluster is optimized. The final results will not be officially announced until a session on November 18 at Supercomputing 2003.
1. Re:Important items of note by Durinia · 2003-10-22 07:21 · Score: 4, Insightful
  
  On the other side of the issue is that it places 4th in the current Top 500 list, which was released in June. We won't really know where it places on this "moving target" until the next list is released in November.
2. Re:Important items of note by Carnildo · 2003-10-22 07:26 · Score: 5, Informative
  
  The number dropped because they used a better benchmark (testing all the nodes, rather than a subset). It'll probably go up because now they'll be able to tune the system to get around bottlenecks.
  
  --
  "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
the REAL reason to build a top-5 supercomputer by Anonymous Coward · 2003-10-22 07:11 · Score: 5, Funny

What they're not telling you is that the real reason they are building a supercomputer is because the only copy of the router passwords is GPG-encrypted, and they lost the key.
Too good to be true... by mrtroy · 2003-10-22 07:12 · Score: 5, Insightful

That 80% efficiency simply sounded too good to be true, and it was.

Now it's at 44%. That's not a small drop, that's a MASSIVE drop.

They didn't predict any loss in going from a small subset to the whole system? Or was it a publicity stunt (we can outperform everyone! our names are __________!)

--
[I can picture a world without war, without hate. I can picture us attacking that world, because they'd never expect it]
Big mac cluster.. by jandrese · 2003-10-22 07:12 · Score: 4, Funny

That's nothing, last time I benchmarked my Big Mac Cluster (100 Big Macs) it came to almost 57.6 megacalories. Those Apples will never be able to match that!

--

I read the internet for the articles.
1. Re:Big mac cluster.. by Frymaster · 2003-10-22 07:33 · Score: 3, Funny
  
  1 Cal (uppercase C) is the amount of heat required to raise the temperature of 1g of water 1 degree celsius
  which brings up a totally off topic question.... a can of coke is 350 ml. it contains 300 calories.
  now, let's say i drink this coke. it is really cold - say 4 degrees. my body temperature is a nice, mammalish 37 degrees. by drinking this coke i am warming up 350 g of what is essentially water from the temperature of the can to that of my body - a difference of 33 degrees.
  33c * 350ml = 11550 calories.
  since the coke is only 300ish calories in the first place...
  why don't i lose weight drinking ice cold coke?
  
  --
  2 1337 4 u!
2. Re:Big mac cluster.. by cK-Gunslinger · 2003-10-22 07:41 · Score: 4, Funny
  
  I find your ideas intriguing and would like to subscribe to your newsletter.
3. Re:Big mac cluster.. by zulux · 2003-10-22 07:43 · Score: 3, Informative
  
  since the coke is only 300ish calories in the first place...
  
  For consumers, food calories are really kilocalories. So in this case, your Coke has 300,000 physics-style calories.
  
  If you look at European food labels, you can sometimes see them written as kcal.
  
  --
  Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
4. Re:Big mac cluster.. by Graff · 2003-10-22 08:06 · Score: 4, Informative
  
  The original poster was wrong when he said:
  
  1 Cal (uppercase C) is the amount of heat required to raise the temperature of 1g of water 1 degree celsius
  
  A Calorie (the one used on food labels) is actually a kilocalorie. A Calorie is therefore 1000 calories. 1 calorie is basically the amount of heat needed to raise 1g of water 1 degree celsius. (A calorie is actually 1/100 of the amount of heat needed to get 1 gram of water from 0 degrees C to 100 degrees C, but that works out almost the same.)
  
  This is explained a bit on this web page.
  
  So warming a 4 degrees C, 350mL Coke to 37 degrees C would take (37 - 4) * 350 = 11550 calories. This is 11.55 kilocalories or 11.55 Calories. The Coke has around 300 Calories in nutritive value therefore you would gain 300 - 11.55 = 288.45 Calories of energy from a 4 degrees C, 350mL can of Coke.
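The arithmetic in this comment can be checked with a minimal sketch. It assumes, as the thread does, that the Coke behaves like water (1 g per mL, 1 small calorie per gram per degree C); the function name is hypothetical and only for illustration.

```python
# Rough check of the cold-Coke arithmetic from the comment above.
# Assumption (same as the thread): the drink is essentially water,
# so 1 mL weighs 1 g and needs 1 small calorie per degree C.

SMALL_CAL_PER_FOOD_CAL = 1000.0  # 1 Calorie (kcal) = 1000 calories


def warming_cost_kcal(volume_ml, start_c, end_c):
    """Food Calories (kcal) the body spends warming the drink."""
    small_calories = volume_ml * (end_c - start_c)
    return small_calories / SMALL_CAL_PER_FOOD_CAL


cost_kcal = warming_cost_kcal(350, 4, 37)   # 350 mL can, 4 C to 37 C
net_kcal = 300 - cost_kcal                  # label value minus warming cost

print(f"warming cost: {cost_kcal:.2f} kcal, net gain: {net_kcal:.2f} kcal")
```

The unit mix-up is the whole story: the warming cost is counted in small calories, the label in kilocalories, so the cold can still nets nearly all of its 300 Calories.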
  
  --
  Sapere aude!
Instant Numbers... by Dracolytch · 2003-10-22 07:12 · Score: 3, Insightful

Not terribly surprising. Much like estimated death tolls for disasters, never believe the first set of benchmarks for a computer. Wait until thorough testing can be done before you start believing the numbers.

Y'all should know this by now. ;)
~D

--
This sig has been enciphered with a one-time pad. It could say almost anything.
Does anyone else have trouble reconciling... by ikewillis · 2003-10-22 07:14 · Score: 4, Funny

"best price performance" and "Apple" in their minds?
Catch Phrase by humpTdance · 2003-10-22 07:14 · Score: 4, Funny

Best Price/Performance ratio = promotional video with the phrase:
"Virginia Tech: Home of the Poor Man's Supercomputer and Michael Vick."
This is NOT all that surprising. by dbirchall · 2003-10-22 07:16 · Score: 5, Insightful

A single G5 FPU (each CPU has 2) can do one 64-bit (double-precision) FLOP per cycle, or 2 if and only if those two are a MULTIPLY and an ADD.
Apparently there are a lot of cases where a MULTIPLY and an ADD do come together like that, but I'm not surprised if LINPACK doesn't consist entirely of those pairs. ;)
The 17.6 TFLOP theoretical peak assumed a perfect case consisting entirely of MULTIPLY-ADD pairs. In a case assuming no MULTIPLY-ADD pairs, the theoretical peak is 8.8 TFLOPs.
7.4 TFLOPs is only 42% of 17.6 TFLOPs, but it's 84% of 8.8 TFLOPs. I suspect the actual "efficiency" of the machine lies somewhere in the middle.
(As for me, I'm happy with just ONE dualie...)
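
The peak figures in this comment can be reproduced with a short sketch. The 2200-CPU count comes from the thread; the 2.0 GHz clock is an assumption that is not stated in the comment, chosen because it yields the quoted 17.6 TFLOP peak. Function names are hypothetical.

```python
# Sketch of the theoretical-peak arithmetic discussed above.
# From the thread: 2200 CPUs, 2 FPUs per CPU, 7.4 TFLOPS measured (Rmax).
# Assumption: 2.0 GHz clock (not stated in the comment itself).

CPUS = 2200
CLOCK_HZ = 2.0e9
FPUS_PER_CPU = 2
RMAX_TFLOPS = 7.4


def peak_tflops(flops_per_fpu_per_cycle):
    """Theoretical peak in TFLOPS for a given per-FPU issue rate."""
    return CPUS * CLOCK_HZ * FPUS_PER_CPU * flops_per_fpu_per_cycle / 1e12


def efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak actually achieved."""
    return rmax_tflops / rpeak_tflops


for label, rate in (("all multiply-add pairs", 2), ("no multiply-add pairs", 1)):
    rpeak = peak_tflops(rate)
    print(f"{label}: peak {rpeak:.1f} TFLOPS, "
          f"efficiency {efficiency(RMAX_TFLOPS, rpeak):.0%}")
```

The Top 500 efficiency numbers use the fused multiply-add peak, which is why the list-style figure is the lower of the two.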
1. Re:This is NOT all that surprising. by hackstraw · 2003-10-22 07:36 · Score: 4, Informative
  
  FWIW here are the efficiencies for the top 10 on www.top500.org:
  
  87.5 NEC Earth-Simulator
  67.8 Hewlett-Packard ASCI Q
  69.0 Linux Networx MCR Linux Cluster Xeon
  59.4 IBM ASCI White
  73.2 IBM SP Power3
  71.5 IBM xSeries Cluster
  45.1 Fujitsu PRIMEPOWER HPC2500
  79.2 Hewlett-Packard rx2600
  72.0 Hewlett-Packard AlphaServer SC
  77.7 Hewlett-Packard AlphaServer SC
The Mac cluster is still on top per CPU by BWJones · 2003-10-22 07:19 · Score: 4, Interesting

While these numbers will no doubt come as a disappointment for Mac zealots who wanted to blow away all the Intel machines, it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer.

It still bests all other Intel hardware, with only the Alpha hardware on top. And given the CPU count, even the Alpha hardware does not match it. Look at the numbers: the Linux-based 2.4 GHz cluster has almost 200 more CPUs on board with a 217 Gflop/sec difference. The Alpha clusters are running anywhere from 1,984 to 6,048 more CPUs.

--
Visit Jonesblog and say hello.
Now at 8.2 Tflop as of today (Oct 22) by daveschroeder · 2003-10-22 07:19 · Score: 4, Informative

See http://www.netlib.org/benchmark/performance.pdf page 53.

Since yesterday's release at 7.41 Tflop, the G5 cluster has already increased almost a Tflop, and is now ahead of the current #3 MCR Linux cluster, and about 0.5 Tflop behind a new Itanium 2 cluster.
Big Mac? How does that compare with a WOPR? by Anonymous Coward · 2003-10-22 07:19 · Score: 5, Funny

/Watched WarGames too many times as a kid.
They didn't save the world AGAIN? by ianscot · 2003-10-22 07:20 · Score: 4, Insightful

Yet another Apple product that failed to save the world. Lately they do nothing but disappoint us. Boo.
First you have the iTunes store which doesn't do anything but give the average user basically anything he or she might have wanted to have in an online music store. Despite its being free, we're all cheesed off that it doesn't support OGG, or it's meant partly to push iPods (duh), or whatever.
Now this -- a supercomputer that has, to quote that again, the "best price/performance ratio ever achieved on a supercomputer." But dang it all, it doesn't completely blow away every established precedent -- it's just in the top five on the usual list of comparisons. One more crushing disappointment.
From Microsoft, we just want products that don't completely ream us. From Apple, we want the entire world to seem a little friendlier and cooler with every product release, every dot-increment OS update. They both disappoint us, but the expectations seem a little different...

--
"Fundamentalism" isn't about divine morality. It's about human authority.
Not really by daveschroeder · 2003-10-22 07:26 · Score: 4, Informative

The preliminary performance report at http://www.netlib.org/benchmark/performance.pdf contains the new entries for the upcoming list as well (see page 53).
Also Important? by ThosLives · 2003-10-22 07:34 · Score: 3, Informative

If you read the fine print, the Nmax for the G5 was 100,000 higher than for the Linux cluster. Now, that's kind of interesting, because the G5 cluster was then only slightly slower doing a much bigger (450,000 Nmax vs 350,000 Nmax on the Xeons) problem. I wonder why they don't somehow scale the FLOPs to reflect this fact.
Anyone know how much merit there is to using Nmax (or N1/2) to compare different systems?
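
One way to look at the Nmax question is through the operation count of the benchmark itself. The sketch below assumes the usual leading-order count for solving a dense N x N system by LU factorization, about (2/3)N^3 floating-point operations, and drops the lower-order terms. The function names are hypothetical; the Nmax values and the 7.4 TFLOPS rate are the ones quoted in this thread.

```python
# Sketch: how problem size (Nmax) relates to total work in Linpack.
# Assumption: leading-order operation count of (2/3) * N**3 for a dense
# LU solve; lower-order terms are ignored.


def hpl_flops(n):
    """Approximate floating-point operations for an N x N solve."""
    return (2.0 / 3.0) * n ** 3


def work_ratio(n_a, n_b):
    """How many times more work problem size n_a is than n_b."""
    return hpl_flops(n_a) / hpl_flops(n_b)


def runtime_hours(n, rate_tflops):
    """Hours needed for size n at a sustained rate in TFLOPS."""
    return hpl_flops(n) / (rate_tflops * 1e12) / 3600.0


print(f"work ratio 450,000 vs 350,000: {work_ratio(450_000, 350_000):.2f}x")
print(f"hours at 7.4 TFLOPS for N=450,000: {runtime_hours(450_000, 7.4):.2f}")
```

Under this assumption the reported FLOPS figure is already work divided by time, so the larger problem is accounted for in the rate. A bigger Nmax mostly shows how large a run was needed to reach that rate.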

--
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
Moore's Law applied by moof-hoof · 2003-10-22 07:34 · Score: 3, Interesting

...it should still be noted that this is the best price/performance ratio ever achieved on a supercomputer.
Yes, but don't Moore's Law and the commodification of computer hardware suggest that each new generation of supercomputer will have the best price/performance ratio?
Thats nothing by madpierre · 2003-10-22 07:40 · Score: 4, Funny

I installed a button on the front of my cluster
to manually clock the CPU's.

So far i've managed ONE whole flop.

My record is for the slowest supercomputer
on the planet.

--
siggy played guitar
1. Re:Thats nothing by IM6100 · 2003-10-22 08:10 · Score: 3, Funny
  
  Build a computer that uses all CMOS static registers.
  
  Attach a hall-effect sensor to a hamster wheel to drive the clock.
  
  Go out and buy a hamster.
  
  --
  A Good Intro to NetBS
From the horse's mouth by Jungle+guy · 2003-10-22 07:56 · Score: 4, Interesting

Jack Dongarra says that a "supercomputer" is simply a computer that, by today's standards, is REALLY fast. I saw one presentation from him, and he said he ran the Linpack benchmark on his notebook (2.4 GHz Pentium 4) and it would have made the bottom of the Top500 list in 1992. So, this supercomputer definition is very fluid.
Re:And mac fans are complaining? by gerardrj · 2003-10-22 07:57 · Score: 4, Interesting

Because the Power4 is hotter and uses more current than the G5. To use 2200 Power 4 CPUs they would have to about triple the cooling capacity of the room. For all the heat and power, the Power4 lacks the AltiVec units that allow the G4/G5 to process vector operations so quickly.

The G5 is also significantly lower cost than the Power4.

--
Article X: The powers not delegated... by the Constitution...are reserved...to the people
Scalability by jd · 2003-10-22 08:16 · Score: 4, Informative

First, scalability is highly non-linear. See Amdahl's Law. Thus, the loss of performance is nothing remarkable, in and of itself.

The degree of loss is interesting, and suggests that their algorithm for distributing work needs tightening up on the high-end. Nonetheless, none of these are bad figures. When this story first broke, you'll recall the quote from the top500 list maintainer who pointed out that very few machines had high performance ratings, when they got into the large numbers of nodes.

I'd say these are extremely credible results, well worth the project team congratulating themselves. If the team could open-source the distribution algorithms, it would be interesting to take a look. I'm sure plenty of Mosix and BProc fans would love to know how to ramp the scaling up.

(The problem of scaling is why jokes about making a Beowulf cluster of these would be just dumb. At the rate at which performance is lost, two Big Macs linked in a cluster would run slower than a single Big Mac. A large cluster would run slower than any of the nodes within it. Such is the Curse that Amdahl inflicted upon the superscalar world.)
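
For reference, Amdahl's Law in its textbook form can be sketched as below. The 0.999 parallel fraction is a made-up illustration value, not a measurement of this cluster, and the function names are hypothetical. This simple form models only the serial fraction; communication and synchronization costs, which are what can make a cluster slow down as nodes are added, are outside the formula.

```python
# Sketch of Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
# where p is the fraction of work that parallelizes and n is the node count.
# Assumption: p = 0.999 is an illustrative value only.


def amdahl_speedup(parallel_fraction, n):
    """Ideal speedup on n nodes when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def parallel_efficiency(parallel_fraction, n):
    """Speedup divided by node count."""
    return amdahl_speedup(parallel_fraction, n) / n


for nodes in (1, 128, 1100):
    s = amdahl_speedup(0.999, nodes)
    e = parallel_efficiency(0.999, nodes)
    print(f"{nodes:5d} nodes: speedup {s:7.1f}, efficiency {e:.0%}")
```

Even a tenth of a percent of serial work is enough to drag efficiency down sharply at 1100 nodes in this model, which is the non-linearity the comment points to.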

The problem of producing superscalar architectures is non-trivial. It's also NP-complete, which means there isn't a single solution which will fit all situations, or even a way to trivially derive a solution for any given situation. You've got to make an educated guess, see what happens, and then make a better informed educated guess. Repeat until bored, funding is cut, the world ends, or you reach a result you like.

This is why it's so valuable to know how this team managed such a good performance in their first test. Knowing how to build high-performing clusters is extremely valuable. I think it not unreasonable to say that 99% of the money in supercomputing goes into researching how to squeeze a bit more speed out of reconfiguring. It's cheaper to do a bit of rewiring than to build a complete machine, so it's a lot more attractive.

On the flip-side, if superscaling ever becomes something mere mortals can actively make use of, understand, and refine, we can expect to see vastly superior - and cheaper - SMP technology, vastly more powerful PCs, and a continuation of the erosion of the differences between micros, minis, mainframes and supercomputers.

It will also make packing the car easier. (* This is actually a related NP-complete problem. If you can "solve" one, you can solve the other.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Price vs Preformance: Off an order of magnitude by G4from128k · 2003-10-22 08:23 · Score: 4, Interesting

I think that magazine article must be wrong. If 1100 Macs use as much power as 3000 homes, then each Mac is using about 3 houses' worth of power. That seems excessive unless the home is in a 3rd world country or those 9 fans are really really running full blast. More likely, each G5 (with networking and cooling equipment) uses a few hundred watts. Even at 500 W/Mac, 1100 Macs, $0.15/kWh, 24 hr/day, 365 days/year the cluster costs about $722,700/year. More likely, each Mac probably only consumes an average of 300 W max and is not running full tilt 24x7, so the cost is maybe around $300-$400k/year.
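
The electricity estimate above is easy to reproduce. The sketch below uses only the figures from this comment (watts per Mac, 1100 Macs, $0.15/kWh, running all year); the `duty_cycle` parameter and the function name are hypothetical additions to model "not running full tilt 24x7".

```python
# Sketch of the annual electricity cost estimate from the comment above.
# Inputs are the comment's own figures; duty_cycle is an added knob.

HOURS_PER_YEAR = 24 * 365


def annual_cost_usd(watts_per_node, nodes, usd_per_kwh, duty_cycle=1.0):
    """Yearly electricity cost for a cluster at a given average draw."""
    kilowatts = watts_per_node / 1000.0 * nodes
    kwh_per_year = kilowatts * HOURS_PER_YEAR * duty_cycle
    return kwh_per_year * usd_per_kwh


print(f"500 W/Mac, full time: ${annual_cost_usd(500, 1100, 0.15):,.0f}")
print(f"300 W/Mac, full time: ${annual_cost_usd(300, 1100, 0.15):,.0f}")
```

At 300 W the full-time figure lands a little above the comment's $300-$400k range, so that range implicitly assumes the machines idle part of the time.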

But your point is a good one. I often wonder about the environmental economics of people running SETI, Folding@Home, etc. on older machines. Most of those older "spare" CPU-cycles are quite costly in terms of electricity relative to newer faster machines that do an order of magnitude more computing with the same amount of electricity.

--
Two wrongs don't make a right, but three lefts do.
AltiVec won't help here by Troy+Baer · 2003-10-22 08:40 · Score: 4, Informative

The Linpack benchmark, as compiled to the G5, is not utilizing the processor to its fullest. The school is still in the process of adding Altivec compiler optimizations, which should drastically improve the results.
The AltiVec instructions support only single precision (32-bit) floating point operations, and the core routine in the Parallel Linpack Benchmark is DGEMM() which is double precision (64-bit). The G5 already has two double precision FPUs, each of which can do a multiply/add op every clock cycle.
My feeling is that the ~40% efficiency seen on the larger scale run is an indication that either VA Tech spent very little time tuning the problem size or they didn't design their InfiniBand fabric to really handle 1100 nodes hammering away at Parallel Linpack. (Given that they've been extremely vague about how their IB network is structured, I fear it may be the latter.)
Right now, the processor is behaving essentially as a G4 with a bigger fan and more memory addresses. Rumor has it that tweaking the compiler to abuse the Altivec unit may push the system above the theoretical limit in some calculations.
I doubt that's true, especially if they're using the IBM PPC compilers. The G4 has both significantly less memory bandwidth and a single double-precision-capable FPU, whereas the G5 is basically a single-core Power4 with an AltiVec unit in place of some cache. IBM's compilers (despite being a little wonky as far as naming and argument syntax) generally produce pretty fast code.
--Troy

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
seti@home not listed by suitti · 2003-10-22 09:09 · Score: 4, Interesting

The 21st version of this list does not
show the SETI@Home project. The top entry
is NEC at 35 teraflops. Today's SETI@Home
average for the last 24 hours is 61 teraflops.
It may be a virtual supercomputer, but it
is producing real results.

--
-- Stephen.
It's a good price/performance, but not best. by tmattox · 2003-10-22 15:31 · Score: 3, Interesting

I guess the original submitter didn't see the Slashdot article from August 23 about our KASY0 supercomputer breaking the $100 per GFLOPS barrier.
KASY0 achieved 187.3 GFLOPS on the 64-bit floating point version of HPL, the same benchmark used on "Big Mac". While "Big Mac" is about 40 times faster on that benchmark, it is about 130 times the cost of KASY0 (~$40K vs ~$5200K). Considering the size difference, "Big Mac" is VERY impressive, but it can't claim to be the best price/performance supercomputer on the HPL benchmark.
Note: KASY0 gets 482.6 GFLOPS (0.48 TFLOPS) on a 32-bit precision version of Linpack, satisfying our under $100 per GFLOPS claim.
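
The price/performance comparison in this comment can be worked through with the numbers it quotes. This is a rough sketch: the costs are the approximate figures given above (~$40K and ~$5200K), the Big Mac rate is the preliminary 7.4 TFLOPS from the story, and the function name is hypothetical.

```python
# Sketch of the dollars-per-GFLOPS comparison using the thread's figures.
# Costs are the approximate values quoted in the comment.


def usd_per_gflops(cost_usd, gflops):
    """Cost in dollars per GFLOPS of measured Linpack performance."""
    return cost_usd / gflops


kasy0_64bit = usd_per_gflops(40_000, 187.3)     # 64-bit HPL result
kasy0_32bit = usd_per_gflops(40_000, 482.6)     # 32-bit Linpack result
big_mac = usd_per_gflops(5_200_000, 7_400.0)    # preliminary 7.4 TFLOPS

print(f"KASY0 (64-bit): ${kasy0_64bit:.2f} per GFLOPS")
print(f"KASY0 (32-bit): ${kasy0_32bit:.2f} per GFLOPS")
print(f"Big Mac:        ${big_mac:.2f} per GFLOPS")
print(f"speed ratio: {7_400.0 / 187.3:.1f}x, cost ratio: {5_200_000 / 40_000:.0f}x")
```

On these figures only the 32-bit KASY0 result falls under $100 per GFLOPS, which matches the note above.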
Regardless, Virginia Tech's "Big Mac" is a very impressive machine. My congratulations to them!

--
Tim Mattox