Virginia Tech to Build Top 5 Supercomputer?
hype7 writes "ThinkSecret is running a story which might explain exactly why the Dual 2GHz G5 machines have been delayed to the customers that ordered them minutes after the keynote was delivered. Apparently, Virginia Tech has plans to build a G5 cluster of 1100 units. If it manages to complete the cluster before the cut-off date, it will score a Top 5 rank in the Linpack Top 500 Supercomputer List. Both Apple and the University are playing mum on the issue, but there's talk of it all over the campus."
I don't want to start a holy war here, but what is it with you G5 zealots? I've been sitting at my 1100 CPU G5 supercomputer for 20 minutes as it computes a fast Fourier transform of an 8GHz Gaussian. 20 minutes! At home, on my 60 CPU Linux Beowulf cluster, the same operation would take 2 minutes, if that. Also, while this operation is taking place, Doom III won't start, and everything else grinds to a halt; even my multithreaded Emacs is struggling to keep up as I type this.
My Sun Enterprise 5000 is faster than this machine at times. Supercomputer addicts, flame me if you want, but I'd rather hear some intelligent reasons why I should use the G5 supercomputer over cheaper, faster clusters.
I got the following email the other day:
Virginia Tech is in the process of building a Terascale Computing Cluster which will be housed in the Andrews Information Systems Building (AISB). For those who are interested in learning more about this project, we will host an information session on Thursday, September 4th from 11 a.m. to noon in the Donaldson Brown Hotel and Conference Center auditorium.
We look forward to seeing you there
Terry Herdman Director of Research Computing.
I'll try to remember to take notes on this and let you all know if there's anything interesting...
Friends help you move. Real friends help you move bodies.
Altivec. Certain types of vector code, when compiled to run only on a G4, outperform a Pentium at more than 3x the clock speed (i.e. an 800 MHz G4 beating a 3 GHz P4). Assuming similar numbers for the G5, plus the across-the-board increase on all the non-vector operations, plus the fact that the 970s work together so much better....
I can see it making a lot of sense. NASA and lots of bio companies use the G4s this way.
Yeah, chicks dig massive...computers.
No wait, no they don't!
why PowerMacs?
A couple of things make them suitable for clustering:
* There's heaps of processor-processor bandwidth and memory bandwidth.
* On board gigabit ethernet.
* Monster fast execution of properly written vector code.
* Well designed cooling.
Of course, the bang/buck ratio could be an issue for some debate but there's little doubt that in comparison to other commercial unices it's an absolute bargain.
Dave
I write a blog now, you should be afraid.
For certain types of processing (rc5 cracking is one example), Macs completely smoke PCs. For example, distributed.net stats show that a 667Mhz G4 can process more keys/second than a 2.8Ghz P4. Considering how much faster a 2Ghz G5 would be, a 1100-node cluster would be damn powerful if you were doing work that mapped well onto Altivec.
It's hard to be religious when certain people are never incinerated by bolts of lightning.
The grant money that flows into a public research and occasionally teaching institution can be staggering, and can absolutely dwarf the money students pay in tuition (sometimes by a factor of 10!). A better question might be: why don't the grad students donating their labor, possibly to patents that will be controlled by the university, receive more consideration and fair labor law protections?
But I would bet this will be not too dissimilar in use from the HP Itanium2 referenced earlier on slashdot. I would bet one of the paramount concerns this cluster would look at is the effect of farm runoff, and probably climatology too among other things.
--Jimmy has fancy plans; and pants to match.
>> A box designed to be separate just will not have the latency advantage of a supercomputer designed from the ground up.
I suggest you look at the list of the top supercomputers in the world. Most are clusters, ie. separate, distinct machines (just a quick glance shows the top 25 all are). It's just too darn hard to make a shared memory computer with 1000's of processors. So the common architecture is to make a cluster of smaller shared memory machines.
Besides, most clusters built utilize special interconnects like Myrinet that offer low latency connections. They're more expensive than ethernet, but it's a supercomputer so you spend it.
>> All this "the internet is one giant distributed computer" doesn't acknowledge this.
On the contrary... people know this very well. That's why we see rendering and SETI processing as distributed. They don't really need to communicate with others often.
There are tradeoffs actually. This isn't like distributed.net or seti@home, this is a controlled network. They have complete control over the network switches, technology, and topology used and can design the network to accommodate the problems the cluster will be designed to solve.
For example, you could use Myrinet to get 2 Gigabit, super low latency connectivity, or Quadrics, or Infiniband, or just a well laid out Gigabit Ethernet with high end switches.
In multiple processors in a box, the processors have to fight for the resources that box has to offer. NUMA alleviates demand on the memory, but IO operations (when writing to disk or to network) in a multiprocessor box block a good deal as the processor count in a node rises.
The idea with clusters is that inter-node communication in most cases can be kept low. Each system can work on a HUGE chunk of a problem on its own, with its own dedicated hard drive, memory subsystem, and without having too much competition for the network card. A lot of problems are really hard to solve computation wise, but are *very* well suited to distributed computing. A prime example of this is rendering 3D movies. Perhaps oversimplifying things, but for the most part, a central node divides up discrete parts (a segment of video), and each node works without talking to others until done, so the negative impact is minimal. Certain problems (e.g. nuclear explosion simulations, where time and spatial chunks interact more with one another) are much more sensitive to latency/throughput. Seti@Home and distributed.net are *extremely* insensitive to throughput/latency issues (not much traffic and very infrequent communication).
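The render-farm pattern described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `split_frames`, `render_chunk` and `render_movie` are made-up names, the "rendering" is a stand-in that just labels frames, and a local thread pool plays the part of the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(total_frames, nodes):
    """Central node: divide frames 0..total_frames-1 into one contiguous chunk per node."""
    base, extra = divmod(total_frames, nodes)
    chunks, start = [], 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first nodes
        chunks.append(range(start, start + size))
        start += size
    return chunks


def render_chunk(frames):
    """Stand-in for real rendering: a node works on its chunk without talking to the others."""
    return [f"frame-{n:04d}" for n in frames]


def render_movie(total_frames, nodes):
    """Hand out chunks, wait for every node to finish, then stitch results back in order."""
    chunks = split_frames(total_frames, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = list(pool.map(render_chunk, chunks))  # map keeps chunk order
    return [frame for chunk in results for frame in chunk]
```

The only communication is handing out the chunks at the start and collecting them at the end, which is why this kind of job cares so little about the interconnect.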
XML is like violence. If it doesn't solve the problem, use more.
With 1100 machines in the cluster, there must be _at least_ 2200 DIMMs. Since these must be 400MHz (PC3200) DDR, they can't be on a large 0.15 micron DRAM process, but most likely between 0.11 and 0.13u.
Who cares?
APPLE G5'S DO NOT SUPPORT ECC.
The random bit error rate for 2200 DIMMs with 0.13u cells is roughly one '1' bit dropped to '0' every 9 hours. In other words: good luck getting any reliable, large-scale computation done with this cluster. (And I do mean "good luck" - they might get a run of two or three days without any problems once in a while.)
Now if only Apple would support PC3200 ECC DIMMs, which certainly do exist:
http://www.intel.com/technology/memory/ddr/valid/dimm_results.htm
this cluster might be a bit more useful for real work.
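To put a rough number on "good luck": if you take the one-error-per-9-hours figure above at face value and assume the errors arrive independently at a constant rate (a Poisson process, which is my assumption, not something the figure guarantees), the chance of an error-free run of a given length is exp(-hours/9). A small sketch, with `p_error_free` as a made-up helper name:

```python
import math

# Figure quoted in the comment above for 2200 non-ECC DIMMs; not independently verified.
MEAN_HOURS_BETWEEN_ERRORS = 9.0


def p_error_free(run_hours, mean_hours=MEAN_HOURS_BETWEEN_ERRORS):
    """Probability of zero bit errors during a run, assuming Poisson arrivals."""
    return math.exp(-run_hours / mean_hours)


for hours in (1, 9, 24, 48, 72):
    print(f"{hours:3d} h run: {p_error_free(hours):.4f} chance of no bit error")
```

Under that assumption a two-day run survives untouched less than 1% of the time.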
I wonder if any universities have tried to write a distributed computing app along the lines of seti. Require it to connect to the university network, it grabs itself maybe 50 megs of hd space, and a fraction of all the new computers people bring to campus, in addition to all the computer lab gear, belongs to their massive number crunching problems. Make another version available to alumni, or even institutions as some form of corporate sponsorship.
Then if it got popular, and they were really clever, they could sell off a part of that computational power they amassed to solve other people's problems, providing funding for new versions and new supercomputing clusters.
--Jimmy has fancy plans; and pants to match.
The SSE2 unit on the P4 or the Opteron would have nearly the same performance and cost a whole heck of a lot less.
Uh, no. 2 years ago, my roommate and I were both running the distributed.net client. I have a 500 Mhz Powerbook G4 (100Mhz bus). He had a 1.4GHz P4 with rambus RAM. I got 4Million keys/sec. He got 2MKeys/sec.
So clock for clock, my machine was more than 5 times faster (twice the keys/sec at about a third of the clock speed).
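Taking the figures in this comment at face value, the clock-for-clock comparison can be worked out directly. A quick sketch (`keys_per_mhz` is just an illustrative helper):

```python
def keys_per_mhz(keys_per_sec, clock_mhz):
    """Throughput normalized by clock speed."""
    return keys_per_sec / clock_mhz


# Numbers as quoted above: 500 MHz G4 at 4 Mkeys/sec, 1.4 GHz P4 at 2 Mkeys/sec.
g4 = keys_per_mhz(4_000_000, 500)
p4 = keys_per_mhz(2_000_000, 1400)

raw_ratio = 4_000_000 / 2_000_000
per_clock_ratio = g4 / p4

print(f"raw throughput ratio: {raw_ratio:.1f}x")
print(f"per-clock ratio:      {per_clock_ratio:.1f}x")
```

With these numbers the raw throughput ratio is 2x and the per-clock ratio comes out at 5.6x.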
I have a shitty sig!
i would take this story to imply that a G5 powered Xserve is not going to be shipping anytime soon..... the Xserve is made to cluster and run in situations like this. i guess the rumor sites can speculate if it's G5 parts available or some other holdup on a G5 Xserve.
it is supposedly VERY simple to cluster Macs. there was a story on /. a year or so ago about a group that went from building a rack and unboxing their G4s to a running cluster in part of a day. i really don't remember the specifics but i think it was something like 30 G4s? i would guess the G5 is not that much harder... and they seem to have Apple helping. maybe they hooked up the optical cards from the Xserve...... we'll see i guess.
unless there is some reason the desktops are better for this project that i did not pick up on?
as for the above question about Macs.... depending on what they want to really do with this, Altivec is really efficient for some computations. all flame wars aside there have always been people clustering Macs for certain uses. i do not know how much of it was user preference or the software they wanted to run or the simplicity of getting the cluster running.
As Zack pointed out, iWalk was not a Think Secret report; in fact, we debunked it. For WWDC, we reported that Apple would announce 64-bit Power Macs as well as a videoconferencing camera that we said would be called "iSight," -- I think we're in the clear there. iWorks? I maintain that it is still a future Apple release. As for 12-inch and 17-inch PowerBooks, while we raised the possibility of a release that week, we specifically said we couldn't confirm the delivery date: "It's unclear when Apple plans to announce the upgrades..."
Bottom line? Like any other news organization, Think Secret has occasional misses. But those misses don't appear to include any of the items mentioned here. I think our record speaks for itself.
Nick dePlume
Publisher and Editor in Chief
Think Secret
1. The PPC970 draws from the Power4 lineage, which I have used for a long time. The PPC970 has 2 double precision FPUs, each capable of fused multiply add instructions leading to 4 flops/cycle/processor (2 units * 2 flops/cycle). This is identical to the Itanium2 FPU microarchitecture. The Opteron on the other hand can only do 2 double precision flops/cycle, which makes it only half as powerful on matrix heavy scientific computations, when compared to the PPC970 or the Itanium 2. The PPC970 should really be compared in FP terms to the Itanium2 at 1/10th of the cost, and at 2GHz it is clocked higher than the top-end 1.5GHz Itanium2 Madison. Moral of the story, read thy arstechnica.
2. The standard benchmarking process (LINPACK) only uses double precision FP. If this rumor is true, then this machine is capable of an Rpeak (LINPACK) of 17.6 Teraflop, which those of you who follow top500 will realize is quite substantial.
3. If they are really using Infiniband, this should be a nice machine. Infiniband provides 10 Gbps (20 Gbps full duplex) of bandwidth, which is much faster than either Myrinet or Quadrics. Also Infiniband latency is 10us and the benchmarking process is bandwidth not latency sensitive. On the other hand, this stuff is really expensive.
If all of this is true, this would be a major engineering endeavor. Also, it is probably cheap. However, all in all, this could well just be a rumor (come on it is thinksecret - remember iWorks). If not, this should be a fairly substantial machine.
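For anyone who wants to check the 17.6 Teraflop figure: it follows from the rumored 1100 dual-processor nodes at 2GHz and the 4 flops/cycle mentioned in point 1. A small sketch (`rpeak_teraflops` is a made-up helper, and the second line is purely hypothetical, plugging in the 2 flops/cycle quoted for the Opteron at the same node count and clock):

```python
def rpeak_teraflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: every FPU busy on every cycle. Real LINPACK results come in lower."""
    gigaflops = nodes * cpus_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# Rumored configuration: 1100 dual 2GHz G5s, 4 double precision flops/cycle per CPU.
rpeak = rpeak_teraflops(nodes=1100, cpus_per_node=2, clock_ghz=2.0, flops_per_cycle=4)

# Hypothetical comparison only: same node count and clock at 2 flops/cycle.
rpeak_half = rpeak_teraflops(nodes=1100, cpus_per_node=2, clock_ghz=2.0, flops_per_cycle=2)

print(f"Rpeak at 4 flops/cycle: {rpeak:.1f} Tflops")
print(f"Rpeak at 2 flops/cycle: {rpeak_half:.1f} Tflops")
```

Note this is Rpeak, the theoretical ceiling; the top500 ranking uses the measured Rmax, which depends heavily on the interconnect.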