Supercomputer Built With 8 GPUs
FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks but on this NVIDIA-based supercomputer it only takes a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."
By what benchmark are eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?
My blog
They didn't have enough dough for 9.
Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?
This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
Ok, probably a paid NVIDIA ad placement, but check TFA anyway (and even if you don't read, you gotta love the case). It looks like heat generation is one of the biggest problems--sweet.
...and at 4000EUR, that comes to what (rolls dice, consults sundial) about $20000 American?
I like this too:
The medical researchers ran some benchmarks and found that in some cases their 4000EUR desktop superPC outperforms CalcUA, a 256-node supercomputer with dual AMD Opteron 250 2.4GHz chips that cost the University of Antwerp 3.5 million euro in March 2005...
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
In other news Graphics cards are good at . . . graphics.
I can't imagine that it is a coincidence that this comes along just as Nvidia are crowing about CUDA, or that the resulting machine looks like a gamer's dream rig.
While there is ample crossover between hardware enthusiasts and academia, anyone solely with the computation interest in mind probably wouldn't be selecting neon fans and aftermarket coolers, or spending that much time on presentable wiring.
They are useful for applications that can be massively parallelized. Your average program can't break off into 128 threads; that takes a little bit of extra skill on the coder's part. If, for example, someone could port gcc to run on the GPU, think of how happy those Gentoo folks would be :) (make -j128)!
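To make the "128 threads" point a bit more concrete, here is a rough sketch in plain Python (not GPU code; the function names and the default of 128 workers are made up for illustration). It shows the shape of work that splits cleanly: each chunk is handled without looking at any other chunk, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    # Each chunk is processed on its own; no chunk needs another's result.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=128):
    # Split the input into at most `workers` roughly equal slices
    # (hypothetical helper, not from any library).
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The only sequential step is adding up the partial sums.
    return sum(partials)
```

In a standard CPython build the threads will not actually make this arithmetic faster; the point is only the decomposition, which is the part that takes the extra skill.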
Obligatory blog plug: http://www.caseybanner.ca/
... 3D Realms announced this as the minimum platform requirements to run Duke Nuke'em Forever.
__ Someday, but not this morning, I'll finally learn to use the preview button.
It does not run Linux.
Can it? Anybody?
Help stamp out iliturcy.
No; if you read all the way to the end, you can see where they discuss the limited specific "general" programs that currently support this kind of thing. Namely, folding@home (on ATI cards) and maybe Photoshop in the future. The tomography software they use is likely their own code, is graphics-heavy, and is tailored for this set-up.
For information on their current HPC platform check out http://www.nvidia.com/object/tesla_computing_solutions.html FWIW I don't think there would be that big of a performance advantage in putting the GPUs on the motherboard; in fact you'd probably get a performance decrease if you UMA'd the memory. With discrete boards each GPU has its own framebuffer, resulting in higher memory bandwidth.
Something that can play Crysis!
Why not just buy a premade Tesla system from nVidia and avoid the heating problems?
This is an example of an accelerator architecture. Anyone who has used FPGAs knows that. Of course, making sensational news is all too common on /.
They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee
It is possible to solve non-graphics problems on graphics cards nowadays, but the hardware is still very specialized. You don't want the GPU to run your OS or your web browser or any of that; when a SIMD (single instruction, multiple data) problem arises, a decent computer scientist should recognize it and use the tools he has available.
Also, this stuff isn't as mature as normal C programming, so issues that don't always exist in software that's distributed to the general public will crop up because not everyone's video card will support everything that's going on in the program.
Pardon the italics, but I was impacted by the killer slant of this posting.
For specific kinds of calculations, sure, GPGPU supercomputing is superior. I would question what software optimization they had applied to the 300 CPU system. Apparently, none. Let's not sensationalize quite so much, shall we?
Invenio via vel creo
It is also not difficult to find other tasks where, e.g., FPGAs perform vastly better than general-purpose CPUs. That does not make an FPGA a "Supercomputer". Stop the BS, please.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Check out the GPGPU (General Purpose GPU) project:
http://www.gpgpu.org/
> This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
Simple: This is not a supercomputer at all, just special-purpose hardware running a very special problem. For general computations, GPUs are pretty inferior.
I love this picture: http://fastra.ua.ac.be/en/images.html
Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
I don't understand completely what you mean, but:
* Only some people need the extra speed.
* The tasks that can be accelerated this much belong to a specific domain of computation.
* NVIDIA is just getting started.
Jag pratar lite svenska.
It is obvious that, if a computer is using GPUs exclusively, it is limited to vector or data parallel processing. And it is no surprise that it is being used by an outfit that specializes in visual processing, which is ideally suited to data parallel processing. Change the benchmark program to code that has a lot of data dependencies and this "supercomputer" will choke to a crawl.
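A small sketch of the distinction being drawn here, in plain Python with made-up function names (nothing here is from the researchers' code). The first function is data parallel: every output element depends on exactly one input element. The second has a loop-carried dependency: each step needs the result of the step before it.

```python
def scale_all(values, factor):
    # Data parallel: output[i] depends only on values[i],
    # so every element could be computed at the same time.
    return [v * factor for v in values]

def running_recurrence(values, decay):
    # Data dependent: each step needs the previous step's result,
    # so a straightforward implementation runs the steps in order.
    out = []
    state = 0.0
    for v in values:
        state = decay * state + v
        out.append(state)
    return out
```

The first shape maps naturally onto hundreds of stream processors; the second does not, at least not without restructuring the algorithm (some recurrences can be rewritten as parallel scans, but that is extra work and not always possible).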
Wave of the Future? Yes*. Revolution in computing? Not quite.
The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.
The researchers took a specialized piece of hardware, rewrote their code for it, and found it was faster than their original code on generic hardware. The problems here are that you have to rewrite your code (High Energy Physics codebases are about a GB, compiled... other sciences are similar) and you have to have a problem which will run well on this scheme. Have a discrete problem? Too bad. Have a gigantic, tightly coupled problem which requires lots of inter-GPU communication? Too bad.
Have a tomography problem which requires only 1GB of RAM? Here you go...
The standard supercomputer isn't going away for a long, long time. Now, as before, a one-size-fits-all approach is silly. You'll start to see sites complement their clusters and large-SMP machines with GPU power as scientists start to understand and take advantage of them. Just remember, there are 10-20 years of legacy code which will need to be ported... it's going to be a slow process.
In all likelihood, if they tried too hard to advertise their speed advantage, they would be called dishonest because most programs would not exhibit a speedup.
Contribute to civilization: ari.aynrand.org/donate
Fortunately, Nvidia provides a CUDA version of the basic linear algebra subprograms, so even if your software is hard to port, you can speed it up considerably if it does some big matrix operations, which can easily take a long time on a CPU.
They are comparing their system against normal computers. It'd be interesting to see a benchmark against a vector computer like, e.g., the NEC SX-9.
Because for 95%+ of the problems a general purpose computer tackles, GPUs would suck. It's only in very special cases that GPUs outperform CPUs. Thus, your idea is a poor one.
Precisely. But that happens to be one of the areas where more performance is still needed.
You don't need a super-duper CPU for text editing, that's for sure. For most of the tasks people do on computers, we have had CPU enough for the last 15 years or more. But where we still need more CPU happens to be mostly in tasks that ARE massively parallel, for instance, physics simulations, of which you will find several examples in the nVidia site.
I'm following this technology with much interest, and I think I will have a major upgrade in my home computer soon. My old FX-5200 card has been more than enough for my gaming needs, but now I have a new reason for upgrading.
The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than 4,000 EUR to build
What's more crazy: calling something this inexpensive a supercomputer, or 4 video cards costing a freaking 4,000 EUR.
Nvidia offers an external GPU solution specifically for "deskside supercomputing", the Tesla D870. It has only 2 cores at 1.35GHz each. Apart from it being a bit more expensive, I wonder how it compares (you can connect several to a PC).
"I love my job, but I hate talking to people like you" (Freddie Mercury)
AFAIK, in many (all?) ordinary consumer graphics cards, minor mistakes by the GPU are tolerated because they'll typically result in (at worst) minor or unnoticeable glitches in the display. I assume that this is because, to get the best performance, designers push the hardware beyond levels that would be acceptable otherwise.
Clearly if you're using them for other mathematical operations, or to partly replace a standard CPU, such mistakes might *not* be acceptable.
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
What a pity that they are running Windows XP. This computer could be the first to see running Vista at an acceptable speed.
On par with anyone else's system?
They are using graphical processors to process graphics. Truly revolutionary. Who woulda thunkit?
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Ok, so if GPUs are far more powerful than CPUs, why doesn't nVidia just create a CPU based on their GPU? AMD & Intel would be in serious trouble then. And if 300 CPUs can be overtaken by just 8 GPUs, why the heck haven't people been doing it before? No one had such an idea? Yeah, right... Something stinks here...
My Windows is NOT slow, it's special!
*Insert Beowulf cluster meme*
In Soviet Russia, everything runs linux.
I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
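The "huge win" arithmetic can be worked through directly. This is only a sketch with hypothetical names; it normalises the current machine to performance 1 at cost 1 and assumes the workload scales perfectly across two boxes, which is the poster's premise rather than a measured fact.

```python
from fractions import Fraction

def price_performance(performance, cost):
    # Performance delivered per unit of cost; higher is better.
    return Fraction(performance) / Fraction(cost)

# Baseline: one machine, normalised to performance 1 at cost 1.
baseline = price_performance(1, 1)

# Hypothetical alternative: two machines, each 2/3 as powerful at 1/2 the cost.
total_performance = 2 * Fraction(2, 3)   # 4/3 of the baseline
total_cost = 2 * Fraction(1, 2)          # same total cost as the baseline
alternative = price_performance(total_performance, total_cost)

improvement = alternative / baseline     # Fraction(4, 3)
```

So for the same money you would get one third more throughput, provided the bottleneck really allows the cheaper boxes to reach 2/3 of the performance.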
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Since when have "vector processors died out"? The "Earth Simulator" for example used the NEC SX-6 CPU; currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell are no match for either processor: first, they both are only fast in single precision mode and much slower when they have to do double precision (the second generation of Cell is better at double precision), and second, they both have a weak memory subsystem when compared to a true VPU. It is slow and they can only use small memories. As far as I know the Cell can't even chain its VPUs, something which was standard since the Cray-2 on VPUs.
I have the impression that the graphics cards only do single precision floating point math, just like the CELL. Makes sense since both were developed for gaming, and single precision is sufficient for graphics.
For many scientific problems however, double precision is a must have. I guess that's one reason why GPU based systems aren't in the
I wish people were more explicit about what kind of FLOP they are talking about when claiming TeraFLOPs performance.
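For anyone wondering why the single vs. double distinction matters, here is a minimal illustration in plain Python. The `to_float32` helper is my own name; it just uses the standard `struct` module to round a value to single precision. Nothing here is GPU-specific.

```python
import struct

def to_float32(x):
    # Round a Python float (double precision) to the nearest
    # single-precision value by packing and unpacking it as 'f'.
    return struct.unpack('f', struct.pack('f', x))[0]

big = 16777216.0  # 2**24: above this, float32 cannot represent every integer

# In double precision, adding 1 is exact.
double_result = big + 1.0

# In single precision, the same sum rounds back down to 2**24.
single_result = to_float32(big + 1.0)
```

Here `double_result` is 16777217.0 while `single_result` is 16777216.0: the added 1 simply disappears. In a long accumulation that kind of loss can pile up, which is why a "TeraFLOP" figure means little unless you know which precision it refers to.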
Sure, this machine is for floating point calculations. But for general-purpose computing, I think one of these setups would be more efficient: "24-core Linux cluster in IKEA cabinet" http://helmer.sfe.se/
The main post does not contain a direct link to the project webpage: http://fastra.ua.ac.be/, which contains a lot more info and explanations.
The graphs are time taken in seconds to perform a task - so shorter graph == better.
--- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
Mind the frickin' laser...
Wouldn't be the first time a supercomputer was developed with a specific task in mind.
---- Booth was a patriot ----
The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.
No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bring supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a way to design a universal multicore processor architecture that is at home in all possible parallel environments, not just vector parallel systems. IOW, we need a parallel processor that can handle anything we can throw at it with equal ease. Unfortunately, both the industry and academia are pushing the field toward so-called heterogeneous processors, hideous monsters that will be a nightmare to write code for. Check out Nightmare on Core Street for a good explanation of the multicore crisis and how it can be solved.
The key to why they were able to do this is the CUDA C programming environment:
http://www.nvidia.com/cuda
Which is faster? A Lamborghini or a 5-ton flatbed truck?
Depends on what you're after! If you are trying to get yourself from point A to point B, the Lamborghini is the obvious choice. But if you need to move 4.5 tons of stuff from point A to point B, the Lamborghini would suck ass when compared to the flatbed truck.
It's just a question of what you are trying to accomplish. There is no absolute framework for "power" to solve problems, even if you define it fairly narrowly. For example, let's talk about 'pattern matching': A free database (like PostgreSQL) on cheap hardware can search through millions of records to deliver a query result in a tenth of a second. In that respect, Postgres is WAY faster than, say, the human brain. But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.
Use the right tool for the job, and you'll be amazed at the results. That 8 GPUs handily outperform 512 CPU cores at a specific task is not surprising - the GPUs are designed from the beginning to solve the kind of problem that's needed!
Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
The guys in Antwerp have probably got themselves the greater number crunching power, but reconstruction of tomographic images has been done using similar multi-core hardware. See the following (pdf alert) from the University of Erlangen, which uses a cluster of PS3s for a great use of commodity consumer hardware http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.imp.uni-erlangen.de%2FIEEE%2520MIC2007%2FKnaup_Poster_M19-291.pdf&ei=t_FBSKnZKoie1gbh2Y23Bg&usg=AFQjCNG7vNGmMM2hBrYdVKbwZAJZL0oS3Q&sig2=sEdlnPROC77CZ_KJ5OOgrg .
What this makes me think about is the marketing campaign that 3DFx used back when they launched their Voodoo 5 series ("So powerful it's kind of ridiculous").
Most of the TV spots started by explaining how scientists could save humanity with GFLOPS-grade chips. But then, humorously, the TV spot announces that they decided to play games instead (often with hilarious effects on the various "dreams of a better humanity" that the first half of the spot showed).
In a funny twist, it's the exact opposite that actually happened: nVidia and ATI were aiming for the highest possible frame rate in games, but then they decided to do science and did CUDA & Brook.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
An ATI or nVidia GPU as main processor is completely out of the question. These things aren't even able to *call* a function (everything is inlined at compile time), not to mention that everything is run by a SIMD engine behind the scenes, which means that all the processing units must actually run the same code at the same time.
Intel on the other hand are touting their future Larrabee as being completely compatible with the x86 instruction set. The whole thing, according to them, should behave like a big many-core in-order CPU - think of Tilera's Tile64, but with overclocked Pentium1 + SSE (hum... probably the Larrabee is going to be a many-core Silverthorne?)
They have mentioned the possibility of producing Larrabees that use QuickPath to communicate with the outside and fit into a server socket. They could even be used alone and boot the OS by themselves, thanks to the compatibility with the original x86 instruction set.
Of course, that's only marketing, there are no Larrabees out yet. We shall see...
Impressive performance yes... but a supercomputer? Hardly!
Next article will be "hey, I overclocked my Q6600 and now it's a supercomputer."
Realistically this is a rather normal computer with a rather fast coprocessor.
The major difference between CPUs and GPUs is that CPUs have to handle the effects of different branch conditions. One of the optimizations that CPUs have been designed for is the combination of pipelining and dual path evaluation. Because the CPU is pipelining instructions, it has to evaluate both outcomes of each conditional instruction in parallel with the actual condition, and then select the actual outcome once it is known. Calculating the condition first, followed by the outcome, would cut performance in half.
As GPUs are dedicated to floating point data, they don't have the space to do this kind of logic.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
On a more serious note, they shouldn't have elevated their monitor. Generally, the top of computer monitors should be at the eye-level of the user. From the picture, it doesn't seem so. This would cause more strain on the user's neck as he/she needs to look up more than looking down.
The way our neck bones are structured, makes looking up more strenuous than looking down. Hence, it is more comfortable to look downwards than upwards.
w00t
have you built many computers? with all of the systems i've put together, bundling wires is the most reasonable way to make the hardware manageable. additionally, the use of aftermarket cooling devices is often essential when you are planning to run your hardware at maximum capacity 24/7. if the plan for the hardware is something that the manufacturer never envisioned (which is clearly the case), then you have to find appropriate solutions. in the future, you may want to look a little more closely before jumping straight into harsh criticism.
personally, i think their heat issue could be solved with 1/2" tubes and water.
> Check out my blog post Nightmare on Core Street ....
There, fixed that (IMNSHO) for you...
Oh I understand perfectly why there is a 'stir', and I also understand why they aren't anywhere close to replacing general purpose CPU's yet. Not even close. I never said it wasn't interesting use for the technology.
For tomography reconstructions one typically needs an SVD (Singular Value Decomposition) of matrices whose side length is n^2 for an n x n image. E.g. consider a reconstructable image of 128 x 128 = 16384 pixels total; this needs (temporary) processing matrices of size 16384 x 16384, which, as you can see, grows rather fast even with moderate reconstruction grids.
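To put numbers on how fast that grows, here is a back-of-the-envelope sketch in Python. The function name is hypothetical, and it assumes the matrix is stored densely at 4 bytes (single precision) or 8 bytes (double precision) per entry; a real reconstruction code may well use sparse or iterative methods instead.

```python
def dense_matrix_bytes(grid_side, bytes_per_value):
    # An N x N image has N*N pixels; the working matrix described above
    # is (N*N) x (N*N), so its entry count grows as N**4.
    pixels = grid_side * grid_side
    entries = pixels * pixels
    return entries * bytes_per_value

GIB = 1024 ** 3

single_128 = dense_matrix_bytes(128, 4) / GIB   # 1.0 GiB in single precision
double_128 = dense_matrix_bytes(128, 8) / GIB   # 2.0 GiB in double precision
single_256 = dense_matrix_bytes(256, 4) / GIB   # 16.0 GiB: doubling the side costs 16x
```

Under those assumptions a single 128 x 128 grid already needs about 1 GiB per dense matrix in single precision, and a 256 x 256 grid needs sixteen times that.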
I'm no great coder, nor mathematician, but wouldn't a GPU be perfect for simulations such as this one?
I imagine there is a lot of vector calculations going on, which GPUs are very good at. Maybe the communication between memory and GPU would be too slow.
Someone with more knowledge, please share your insights!
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors!
Looking at the last sequence of three posts in this thread, it seems obvious whose blood is boiling. LOL
Most supercomputers are optimized to perform 32 or 64 bit floating point operations, and are currently broaching the petaflop boundary. The reason is that most established uses of supercomputers rely on mathematical computation: Fourier Transforms, Linear Algebra, etc., which utilize floating point operations. Visualization and imaging have been done on GP supercomputers, but not very economically. There's constant buzz in the HPC community about FPGAs and GPUs, but CPUs remain the most flexible hardware to do the core job of HPC, which is floating point computation.
I wouldn't be surprised to see HPC centers adding (as budgets permit) GPU based solutions for visualization, imaging, ray tracing, etc. FPGA solutions appear, IMHO, to be a bit farther away from use. The biggest holdup for both will be the programming interfaces. HPC programming interfaces are just emerging for GPU solutions. It remains to be seen which will succeed.
Invenio via vel creo
Actually, that's PCIe 2.0. They should be getting double that bandwidth.
A lot of graphics work is considered to be "embarrassingly parallel", such that gigabit ethernet is more than adequate as an interconnect. This is what movie studios use for their render farms. Given that they've managed to parallelize this to the array of stream processors on the GPUs, it sounds like they're at least approaching the "embarrassingly parallel" level. Whether or not they're so parallel they can use commodity ethernet as an interconnect... that's the question. From a programming perspective, running on a single system is certainly easier, but what if someone wanted to scale this up by a factor of 1000? Would this problem scale like that, or would they be limited to a smaller, less interesting set of problems?
...But will it run Vista?
I do some intense video editing and the 'rendering' phase takes forever. I have a dual core AMD, fast FSB...etc.. still slow. But wouldn't it make sense to have a GPU (or 2 or ..) just for those sort of number crunches ?? I would imagine rendering can be parallel processed..?
I would think there would be a market for an add-in board that can crunch numbers like crazy and would work with the common video software (Vegas, Premiere, etc.) Maybe even have a direct socket for SATA drives so as to avoid the bus.
Home video editing has come down in price so much that I see a huge future market there.
Comparing something to an old supercomputer is stupid. At any given time a supercomputer is orders of magnitude faster than ordinary machines (60 to 600 teraflops in mid-2008).
Emacs used to be really big. Now it's pretty small, even with the graphics interfaces and kitchen sink written in ELisp. My browser's currently burning 300MB (the 3 Beta 5 version used much less RAM than 2.x, and I bought more RAM about the same time the RC version came out, so I don't know how much of the change is which :-) Of course, any time you take a 5-year-old application and run it on a current computer, it turns out to be small and blazingly fast, unless it has to play games with hardware emulation that can make it as slow as the original.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
If you look at most of the Top500 supercomputers these days, they're not Fluorinert-cooled Cray-2s any more - they're almost all lots of processors with some kind of interconnection network to get data between processors, and they're running lots of SIMD to make LINPACK go fast. This machine is using the CPU interconnects because they benchmarked it and found they didn't need to mess with SLI, and it may not have quite the I/O scalability that the Earth Simulator or its faster competitors have, but it's in the same kind of space. It's not near the top, but it's certainly respectable.