Supercomputer Built With 8 GPUs
FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks but on this NVIDIA-based supercomputer it only takes a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."
Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?
This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
For information on their current HPC platform checkout http://www.nvidia.com/object/tesla_computing_solutions.html FWIW I don't think there would be that big of performance advantage of putting the GPUs on the motherboard, infact you'd probably actually get a performance decrease if you UMA'd the memory. With discrete boards each GPU has it's own framebuffer resulting in higher memory bandwidth.
I love this picture: http://fastra.ua.ac.be/en/images.html
Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
They are comparing their system against normal computers, I'd be interesting to see a benchmark against a vector computer, like, eg. NEC SX9
Too bad this isn't really news. I guess it is news if you consider that someone else has had their application accelerated by NVIDIA GPUs. I guess the only other reason that this could be news is by virtue of having 8 GPU cores.
Unfortunately, this setup won't work ideally for a lot of other CUDA based applications. For the past 6 months, I had a system with 6 GPUs (actual physical GPUs). This is the system that I showed at CES. We are easily able to do 8 physical GPUs, and now I've been solely focused on utilizing Tesla.
Given that NVIDIA released the GX2 series, I was not surprised that someone would announce an 8GPU system. I'm surprised it took this long for someone to do it, and almost equally surprised that slashdot took this long to publish any news that is decent in the realm of GPU super computing. I've been cranking out close to 228 billion atom evals. per second in VMD for months now, versus about 4 billion on dual quad core 3.0GHz Xeons.
aren't most of the supercomputers designed to perform some very specific tasks ? You don't buy a supercomputer to run the Super edition of Excel.
Just to expand on this stuff: Different tools are (obviously) designed for different workloads. I have a project I was contemplating porting to the Cell. Unfortunately only 40% of my performance bottleneck could take advantage of SIMD, but that 40% could have taken advantage of an enormous number of SIMD instructions just like the workload from TFA.
The other critical 40% of my project would have gained absolutely nothing from SIMD and on the Cell would have lost time due to branches. In this case 300 c2d's would far exceed the throughput of 8 GPU's.
I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Since when have "vector processors died out"? The "Earth Simulator" for example used the NEC SX-6 CPU, currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell are no match for either processor, first they both are only fast in single precission mode and much slower when they have to do double precission (the second generation of Cell is better at double precission) and they both have a weak memory subsystem when compared to a true VPU. It is slow and they can only use small memories. As far as I know the Cell can't even chain it's VPUs, something which was standard since the Cray-2 on VPUs.
You can't use the currency exchange rate to get the US price for this system as many companies don't use this to set their prices. I did a quick check and the hardware used in this "supercomputer" would cost you a bit less than $4000 at Newegg.
The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.
No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bringing supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a way to design a universal multicore processor architecture that is at home in all possible parallel environments, not just vector parallel systems. IOW, we need a parallel processor that can handle anything we can throw at it with equal ease. Unfortunately, both the industry and academia are pushing the field toward so-called heterogeneous processors, hideous monsters that will be a nightmare to write code for. Check out Nightmare on Core Street for good explanation of the multicore crisis and how it can be solved.
The guys in Antwerp have probably got themselves the greater number crunching power, but reconstruction of tomographic images has been done using similar multi-core hardware. See the following (pdf alert) from the University of Erlangen, which uses a cluster of PS3s for a great use of commodity consumer hardware http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.imp.uni-erlangen.de%2FIEEE%2520MIC2007%2FKnaup_Poster_M19-291.pdf&ei=t_FBSKnZKoie1gbh2Y23Bg&usg=AFQjCNG7vNGmMM2hBrYdVKbwZAJZL0oS3Q&sig2=sEdlnPROC77CZ_KJ5OOgrg .
It's not really that surprising. CPUs are supposed to control the computer's resources and keep some kind of sanity and synchronization.. do one thing at a time and they do it fast. Multiple cores are nice but they just let you do 2 things at once. Yeah they're fancy and pipelined and there's all sort of asynchronous optimization with arithmetic and logic, but as a whole it's executing one instruction after another.
GPUs on the other hand are far more parallel. The thousands of individual subprocessors can be independently controlled in software and given different tasks.
You're being overly simplistic.
In order to utilize this "super computer", your problem has to be refactored in such a way that it can utilize the hardware efficiently. This can be either be fairly easy or incredibly difficult depending on the problem, tool-set available, etc. .
Their benchmark is good for them, but it is most likely meaningless to the general super-computing community. Porting something like LINPACK over and running that as a benchmark however would give a whole lot more insight into what kind of performance boost a typical scientific app might gain from said hardware.
Nice to see someone utilizing this functionality though.
~X~
~X~