Supercomputer Built With 8 GPUs

Posted by kdawson on Saturday May 31, 2008 @05:22AM from the let-the-games-begin dept.

FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The researchers explain that the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks, but on this NVIDIA-based supercomputer they take only a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."

Re-birth of Amiga? by Yvan256 · 2008-05-31 05:30 · Score: 4, Interesting

Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?
1. Re:Re-birth of Amiga? by porpnorber · 2008-05-31 09:28 · Score: 3, Interesting
  
  I think the parent was seeing the same situation a little differently. You ever code up Conway's Life for the blitter? Whoosh! Now CUDA does floating point where the Amiga could only do binary operations, and the GPU has a lot more control onboard, but the analogy is not unsound. After all, CPUs themselves didn't even do floating point in the old days (though of course they did do narrow integer arithmetic).
2. Re:Re-birth of Amiga? by Anonymous Coward · 2008-05-31 09:57 · Score: 2, Interesting
  
  'Modern' PCs have had the "Amiga design" since about the time the AGP bus became prevalent.
  Not really. The Amiga also had perfect synchronization between the different components. When you configured soundchip and graphics chip for a particular sample rate and screen resolution, you would know exactly how many samples would be played for the duration of one frame. And you had synchronization to the point where you could know which of the samples were played while a particular line was being sent through the D/A converter.
  
  Considering how often audio and video get out of sync on a PC, I will not say they have caught up with the Amiga design.
  
  Another thing to remember is how much Amigas were being used in television production in the past. With an Amiga it was actually possible to sync the entire machine to an external clock source such that the video output could be mixed with another video source.
  
  Syncing the CPU to an external source may not be a good idea these days. But syncing audio and video should be a no-brainer; it just happens to be tricky to achieve in a modular design.
  
  Another thing is the low latency the Amiga could achieve from input to output. If you moved the mouse while the last few lines of one picture were being sent to the monitor, the position would actually still be updated on the next frame.
Why haven't they started releasing GPU CPUs yet? by arrenlex · 2008-05-31 05:34 · Score: 3, Interesting

This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD/ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
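
As a quick sanity check on that 8:300 ratio, here is a minimal sketch that uses only the figures quoted in the story summary. It applies to the tomography workload described there and says nothing about general-purpose code; the variable names are just for illustration.

```python
# Figures from the story summary: 8 GPUs (four dual-GPU 9800 GX2 cards)
# are said to match "more than 300" Core 2 Duo processors on this workload.
gpus = 8
cards = 4
equivalent_cpus = 300  # lower bound quoted in the summary

cpus_per_gpu = equivalent_cpus / gpus
cpus_per_card = equivalent_cpus / cards

print(f"{cpus_per_gpu:.1f} CPUs per GPU")
print(f"{cpus_per_card:.1f} CPUs per card")
```

So "dozens" is about right for this particular job: roughly 37 processors per GPU.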
Re:Why haven't they started releasing GPU CPUs yet by Anonymous Coward · 2008-05-31 05:39 · Score: 1, Interesting

For information on their current HPC platform, check out http://www.nvidia.com/object/tesla_computing_solutions.html FWIW, I don't think there would be that big of a performance advantage in putting the GPUs on the motherboard. In fact, you'd probably get a performance decrease if you UMA'd the memory. With discrete boards each GPU has its own framebuffer, resulting in higher memory bandwidth.
Brick of GPUs by Rufus211 · 2008-05-31 05:48 · Score: 4, Interesting

I love this picture: http://fastra.ua.ac.be/en/images.html

Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
Vector Computing by alewar · 2008-05-31 06:14 · Score: 2, Interesting

They are comparing their system against normal computers. It'd be interesting to see a benchmark against a vector computer, e.g. the NEC SX-9.
Re:By what benchmark? by cheier · 2008-05-31 07:04 · Score: 5, Interesting

Too bad this isn't really news. I guess it is news if you consider that someone else has had their application accelerated by NVIDIA GPUs. I guess the only other reason that this could be news is by virtue of having 8 GPU cores.

Unfortunately, this setup won't work ideally for a lot of other CUDA based applications. For the past 6 months, I had a system with 6 GPUs (actual physical GPUs). This is the system that I showed at CES. We are easily able to do 8 physical GPUs, and now I've been solely focused on utilizing Tesla.

Given that NVIDIA released the GX2 series, I was not surprised that someone would announce an 8-GPU system. I'm surprised it took this long for someone to do it, and almost equally surprised that Slashdot took this long to publish any decent news in the realm of GPU supercomputing. I've been cranking out close to 228 billion atom evaluations per second in VMD for months now, versus about 4 billion on dual quad-core 3.0GHz Xeons.
Re:Not a Supercomputer -- Special purpose hardware by emilper · 2008-05-31 07:17 · Score: 2, Interesting

Aren't most supercomputers designed to perform some very specific tasks? You don't buy a supercomputer to run the Super edition of Excel.
Re:By what benchmark? by raftpeople · 2008-05-31 07:27 · Score: 2, Interesting

Just to expand on this stuff: Different tools are (obviously) designed for different workloads. I have a project I was contemplating porting to the Cell. Unfortunately only 40% of my performance bottleneck could take advantage of SIMD, but that 40% could have taken advantage of an enormous number of SIMD instructions just like the workload from TFA.

The other critical 40% of my project would have gained absolutely nothing from SIMD and on the Cell would have lost time due to branches. In this case 300 C2Ds would far exceed the throughput of 8 GPUs.
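
This is the Amdahl's law situation. Here is a rough sketch, assuming the SIMD-friendly 40% is the only part that gets faster and the rest runs at its original speed (the post says part of it would actually get slower on the Cell, so this is optimistic). The function name and the sample speedup factors are illustrative, not taken from the post.

```python
def amdahl_speedup(accelerated_fraction, acceleration):
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime that benefits (0..1)
    acceleration: how many times faster that share becomes
    """
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / acceleration)

# Assumed split: 40% of the runtime is SIMD-friendly, 60% is not.
for acceleration in (10, 100, 1000):
    overall = amdahl_speedup(0.4, acceleration)
    print(f"SIMD part {acceleration}x faster -> whole job {overall:.3f}x faster")

# Upper bound: even an infinitely fast SIMD part leaves the other 60% in place.
print(f"limit: {1.0 / (1.0 - 0.4):.3f}x")
```

However fast the SIMD part gets, the whole job stays under about 1.67x, which is why a pile of ordinary CPUs can still win on a workload like this.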
Have they profiled it? by Chris Snook · 2008-05-31 07:56 · Score: 3, Interesting

I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
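
To put numbers on that hypothetical, here is a small sketch that normalizes the existing machine to performance 1 and cost 1. It assumes throughput adds up linearly across machines, which only holds if the work really does parallelize that easily. The names are illustrative.

```python
from fractions import Fraction

def perf_per_cost(performance, cost):
    """Performance delivered per unit of money spent."""
    return performance / cost

# Baseline: the existing machine, normalized to performance 1 and cost 1.
baseline = perf_per_cost(Fraction(1), Fraction(1))

# Hypothetical alternative: two machines, each 2/3 as powerful and 1/2 the price.
# Assumption: their throughput simply adds up.
machines = 2
pair_performance = machines * Fraction(2, 3)
pair_cost = machines * Fraction(1, 2)
pair = perf_per_cost(pair_performance, pair_cost)

print("total performance:", pair_performance)
print("total cost:", pair_cost)
print("price/performance vs baseline:", pair / baseline)
```

Same total spend, 4/3 of the throughput, so about a 33% better price/performance ratio under these assumptions.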

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Re:Wave of the Future? Yes by Josef Meixner · 2008-05-31 08:12 · Score: 2, Interesting

The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.

Since when have "vector processors died out"? The "Earth Simulator", for example, used the NEC SX-6 CPU, and the SX-9 is currently on sale. Vector processors never died out and have stayed in use for what they are best at. The GPU and the Cell are no match for either processor. First, both are fast only in single precision and much slower when they have to do double precision (the second generation of Cell is better at double precision). Second, both have a weak memory subsystem compared to a true VPU: it is slow, and they can only use small memories. As far as I know the Cell can't even chain its VPUs, something which has been standard on VPUs since the Cray-2.
Re:This is awesome! by Anonymous Coward · 2008-05-31 09:05 · Score: 1, Interesting

You can't use the currency exchange rate to get the US price for this system as many companies don't use this to set their prices. I did a quick check and the hardware used in this "supercomputer" would cost you a bit less than $4000 at Newegg.
We Need a Universal Multicore Processor by MOBE2001 · 2008-05-31 10:55 · Score: 3, Interesting

The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.

No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bring supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a way to design a universal multicore processor architecture that is at home in all possible parallel environments, not just vector parallel systems. IOW, we need a parallel processor that can handle anything we can throw at it with equal ease. Unfortunately, both the industry and academia are pushing the field toward so-called heterogeneous processors, hideous monsters that will be a nightmare to write code for. Check out Nightmare on Core Street for a good explanation of the multicore crisis and how it can be solved.
Has been done before with PS3s by nkeat · 2008-05-31 13:02 · Score: 2, Interesting

The guys in Antwerp have probably got themselves the greater number crunching power, but reconstruction of tomographic images has been done using similar multi-core hardware. See the following (pdf alert) from the University of Erlangen, which uses a cluster of PS3s for a great use of commodity consumer hardware http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.imp.uni-erlangen.de%2FIEEE%2520MIC2007%2FKnaup_Poster_M19-291.pdf&ei=t_FBSKnZKoie1gbh2Y23Bg&usg=AFQjCNG7vNGmMM2hBrYdVKbwZAJZL0oS3Q&sig2=sEdlnPROC77CZ_KJ5OOgrg .
Re:By what benchmark? by Brian Gordon · 2008-05-31 14:16 · Score: 2, Interesting

It's not really that surprising. CPUs are supposed to control the computer's resources and keep some kind of sanity and synchronization. They do one thing at a time and they do it fast. Multiple cores are nice but they just let you do 2 things at once. Yeah, they're fancy and pipelined and there's all sorts of asynchronous optimization with arithmetic and logic, but as a whole it's executing one instruction after another.

GPUs on the other hand are far more parallel. The thousands of individual subprocessors can be independently controlled in software and given different tasks.
Re:By what benchmark? by Xyrus · 2008-05-31 16:25 · Score: 3, Interesting

You're being overly simplistic.

In order to utilize this "super computer", your problem has to be refactored in such a way that it can utilize the hardware efficiently. This can be either fairly easy or incredibly difficult depending on the problem, the tool-set available, etc.

Their benchmark is good for them, but it is most likely meaningless to the general super-computing community. Porting something like LINPACK over and running that as a benchmark however would give a whole lot more insight into what kind of performance boost a typical scientific app might gain from said hardware.

Nice to see someone utilizing this functionality though.

~X~