10-TFlop Computer Built from Standard PC Parts
OrangeTide writes "Using PCI host adapters and Xeon processors, engineers at Lawrence Livermore National Labs have achieved 10-TFlops relatively cheaply. More information can be obtained from this article at EETimes." Lately, Linux seems to be the operating system of choice for new supercomputers, and this one's no different. It's cool to see big iron made cheaply.
can be found here.
Until Apple submits SPEC CPU benchmark results, it is hard to escape the conclusion that they are not cost-effective machines for building scientific computing clusters.
Of course the benchmarks might make that conclusion inescapable.
Mac fans are welcome to do the benchmarking to prove my suspicions incorrect. Or you could translate this page from Japanese. It seems to say that a G4 at 1GHz is about 1/6 the speed of a 2.8GHz P4 on the floating point benchmark.
Yes, they would be rockin fast if they used IBM Power4s. But they don't.
in case you were interested...
GROMACS is the main simulation program we use. It's very well programmed, optimized, and GPL to boot. I hope that the software I write will have this sort of functionality and optimization.
You'd think the EETimes would catch something like this:
nearly the same performance as the ASCII White system
No, it's ASCI White. Accelerated Strategic Computing Initiative, not the text format.
Actually there is... or was... I don't know. It's called Alphaworld. It's a huge multiuser VRML-based world where anyone could claim some space and build something. Don't know if it exists anymore; I used it years ago on my 486 with a 14.4k modem :) Google for it if you are interested...
- Depends on the Xeons they are using. The 'old' Xeons are around the same cost as their AMD counterparts. The 'new' Xeons have large L3 caches (1M and 2M).
- The AMD SMP chipset is slow (memory bandwidth) compared to the newer Intel chipsets.
- IIRC, the P4s use less power than the Athlons. This is probably not as important, but it is there.
I'd like to see a comparison of a newer dual Xeon machine vs. a good dual AMD to see the performance difference. I would suspect that the dual Xeon machine would be a bit faster.
More details -- probably more than you want :) -- here:
http://www.llnl.gov/linux/mcr/.
The interesting thing about this setup is that it doesn't work like the traditional supercomputer. It's more like a community of totally independent computers all willing to work on the same problem.
The system employs a whole lotta control nodes that spend their whole time trying to assign work out to the worker nodes. The problem then becomes not just parallelizing the work but coordinating the workers. Apparently with this cluster design, it's not all as cut-and-dried as with a "real" supercomputer. They have been able to do some really cool stuff, though. Like, for example, any computer in the cluster can address the memory on any other computer.
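For anyone who wants to picture the control-node/worker-node split, here is a rough sketch in Python. It is only an illustration on one machine using threads and a queue, not how the LLNL cluster actually schedules jobs; `run_cluster`, the squaring "work", and the `None` sentinel are all made up for the example.

```python
import queue
import threading


def run_cluster(tasks, num_workers=4):
    """Hypothetical coordinator: hands tasks to workers through a shared queue."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel (assumed convention): no more work
                return
            task_id, value = item
            outcome = value * value  # stand-in for the real computation
            with lock:  # coordinate access to the shared result table
                results[task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The "control node" part: assign work, then tell every worker to stop.
    for task_id, value in enumerate(tasks):
        work.put((task_id, value))
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()

    # Reassemble results in the original task order.
    return [results[i] for i in range(len(tasks))]
```

Even in this toy version, most of the code is about coordination (the queue, the lock, the shutdown sentinel) rather than the computation itself, which is the point being made above.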
The admins I talked to said they weren't really sure just how fast the system could go, because they could never get it to operate at full capacity. They said the fastest they'd gotten it to go was 4 TFlops, but they figured they were only at 40% of theoretical capacity.
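As a quick sanity check on those two figures, a few lines of Python show what peak they imply. The function name is invented for the example, and the only inputs are the numbers quoted above (4 TFlops achieved, 40% of theoretical capacity).

```python
def implied_peak_tflops(achieved_tflops, fraction_of_peak):
    """Back out the theoretical peak from an achieved rate and a utilization fraction."""
    if not 0 < fraction_of_peak <= 1:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return achieved_tflops / fraction_of_peak


# Figures quoted in the comment above: 4 TFlops at roughly 40% of capacity.
peak = implied_peak_tflops(4.0, 0.40)
print(f"Implied theoretical peak: {peak:.1f} TFlops")
```

That works out to about 10 TFlops, which lines up with the figure in the headline.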
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925