FASTRA II Puts 13 GPUs In a Desktop Supercomputer
An anonymous reader writes "Last year tomography researchers of the ASTRA group at the University of Antwerp developed a desktop supercomputer with four NVIDIA GeForce 9800 GX2 graphics cards. The performance of the FASTRA GPGPU system was amazing; it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR. Today the researchers announce FASTRA II, a new 6000EUR GPGPU computing beast with six dual-GPU NVIDIA GeForce GTX 295 graphics cards and one GeForce GTX 275. The development of the new system was more complicated and there are still some stability issues, but tests reveal the 13 GPUs deliver 3.75x more performance than the old system. For the tomography reconstruction calculations these researchers need to do, the compact FASTRA II is four times faster than the university's supercomputer cluster, while being roughly 300 times more energy efficient."
Almost meets the minimum requirements for Crysis 2
This was post #2 and already modded -1, Redundant.
Um...read the article?
The motherboard is an ASUS P6T7 WS Supercomputer.
Blazing Fast Pron Machine running Windows Vista. Don't forget to pick up a copy of the latest memory-intensive Anti-Virus, as this machine will handle it just fine.
You must be new here... ;)
I currently have no clever signature witticism to add here.
"the compact FASTRA II is four times faster than the university's supercomputer cluster, while consuming 300 times less power" And the original supercomputer was how fast? 512 cores doesn't say THAT much. I could compare my computer to supercomputers from the past and they'd say the performance of my system was amazing too.
...consuming 300 times less power.
*sigh*
Presently the G200 GPUs in this machine support double precision, but at about 1/8 the peak rate of single precision. In practice, since most codes tend to be bandwidth limited, and pointer arithmetic is the same for single and double precision, double-precision performance is usually closer to 1/2 that of single-precision performance, but not always. With the Fermi GPUs to be released early next year, double-precision peak FLOPS will be 1/2 of single-precision peak, just like on present x86 processors. Also note that many scientific research groups, such as my own, have found that contrary to dogma, single precision is good enough for most of the computation, and that a judicious mix of single- and double-precision arithmetic gives high performance with sufficient accuracy. This is true for some, but not all, computational methods.
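To make the "judicious mix" point concrete, here is a minimal sketch in plain Python (an illustration only, not the poster's code). It assumes one common pattern: keep the data in single precision but carry the running sum in double precision. The helper `f32` and the two `sum_*` functions are hypothetical names; `f32` emulates single precision by rounding through `struct`.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Pure single precision: every partial sum is rounded back to float32."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def sum_mixed(values):
    """Mixed precision: float32 data, but the accumulator stays in double."""
    total = 0.0
    for v in values:
        total += v
    return total

# 100,000 single-precision copies of 0.1 (an arbitrary test input).
values = [f32(0.1)] * 100_000
reference = math.fsum(values)  # correctly rounded double-precision sum

err_single = abs(sum_single(values) - reference)
err_mixed = abs(sum_mixed(values) - reference)
print("single-precision accumulator error:", err_single)
print("double-precision accumulator error:", err_mixed)
```

The data never leaves single precision, so memory traffic is the same in both versions; only the accumulator changes. Whether this is enough accuracy depends on the method, as the comment above says.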
I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.
For those not familiar with the 9800gx2 cards, it essentially is two 8800gts video cards linked together to act as a single card - something called SLI on the NVidia side of marketing. SLI typically required a mainboard/chipset that would allow you to plug in two cards and link them together. This model allowed any mainboard to have two 'internal' cards linked together, with the option of linking another 9800gx2 if your board actually supported SLI.
The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.
+++ UGUCAUCGUAUUUCU
Duh! Look at the number of GPUs...13...try 12 or 14 and your luck will change.
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
First, a gaming card is going to get fast firmware. A workstation card is going to get accurate firmware. I imagine that supercomputer cards would get specialized firmware. (I only skimmed the summary.)
GPUs are excellent at solving certain types of problems and excel at solving matrices. (That's what your video card is doing while it's rendering.) The best part of that is that most, if not all, mathematical problems can be expressed as a matrix, meaning that your super-fast GPU can solve most math problems super-fast.
Next, GPUs love working together since they don't care about what the OS is doing. All they do is take raw data and respond with an answer. Usually we're putting that answer onto the display, since otherwise wtf are we doing with a GPU? In this case, the results are returned instead of using the flashy display. So what you end up with is a set of really fast, specialized, parallel engines solving broken down matrices.
They're also not subject to the marketing whims of Moore's Law, so you can often get faster cards sooner than faster CPUs. To break down a supercomputer so that you get this kind of performance for 4000 EURO is a fantastic achievement. It's almost, but not quite, hobby range. (I'd still put money on someone trying to evolve this into a gaming rig...)
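As a rough illustration of the matrix work described in this comment, here is a small Python sketch (an assumption-laden toy, not how any real driver does it). Each vertex is multiplied by the same 2x2 rotation matrix, and no vertex depends on any other, which is the kind of independent work a GPU can spread across its processors. The names `rotation_matrix` and `transform` are made up for the example.

```python
import math

def rotation_matrix(theta):
    """2x2 matrix that rotates a point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transform(matrix, vertex):
    """Multiply a 2x2 matrix by a 2D vertex given as an (x, y) tuple."""
    return tuple(sum(m * v for m, v in zip(row, vertex)) for row in matrix)

# Four corners of a diamond; each one is transformed independently.
square = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
quarter_turn = rotation_matrix(math.pi / 2)
rotated = [transform(quarter_turn, v) for v in square]
print(rotated)
```

On a GPU the list comprehension would become one thread per vertex; the arithmetic per vertex stays the same.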
---
ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
The difference between GeForce and Quadro cards is almost always completely driver based: it is the exact same hw with different sw.
This is basically a roll-your-own Tesla, and considering the Teslas connect to the host system via an 8x or 16x PCI-e add-in card, I'm gonna say you are wrong when it comes to the bandwidth issue as well...
Because it only applies to the kind of problems that CUDA is good at solving. Now while there are plenty of those, there are plenty that it isn't good for. Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.
That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things, wonderful, please use GPUs. However, don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single-precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However, not all problems are, hence GPUs are NOT a general replacement for a supercomputer.
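The "doesn't branch much" condition can be sketched with a toy cost model in Python. It rests on a simplifying assumption: the threads of a warp (32 on NVIDIA hardware) run in lock-step, so when they disagree on a branch the warp runs both paths one after the other. The function `warp_cycles` and the cycle counts are hypothetical, chosen only to show the effect.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def warp_cycles(branch_taken, cost_if, cost_else):
    """Toy model: a warp pays for every path that at least one thread takes."""
    cycles = 0
    if any(branch_taken):
        cycles += cost_if    # some thread needs the 'if' side
    if not all(branch_taken):
        cycles += cost_else  # some thread needs the 'else' side
    return cycles

# All 32 threads agree: only one path is executed.
uniform = [True] * WARP_SIZE
# Even and odd threads disagree: both paths are executed back to back.
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]

print(warp_cycles(uniform, 100, 100))    # 100
print(warp_cycles(divergent, 100, 100))  # 200
```

Real hardware is more subtle than this, but the sketch shows why code that branches differently per data element loses much of the GPU's advantage.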
Does that come in a picoATX version?
---- Booth was a patriot ----
The hardware is the same, but the quality control is different. Teslas and Quadros are held to rigorous standards. GeForces have an acceptable error rate. That's fine for gaming, but falls flat in scientific computing.
That's a brilliant idea, now people can make snacks without ever leaving the computer.
A game has objectives and is competitive, anything else is just play
It's not silly: (1) this is a research project, not production medical equipment, meaning that the funds to buy Tesla cards were probably not available, and they aren't particularly worried about occasional bit errors. (2) Their particular application doesn't need much inter-GPU communication, if any, so that bandwidth is not an issue. They just need for each GPU to load datasets, chew on them, and spit out the results.
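Point (2) can be sketched as a simple work split in Python. This is only an illustration under assumptions: the datasets are taken to be independent slices, the count of 100 slices is invented, and `assign_slices` is a hypothetical helper rather than anything from the FASTRA software.

```python
def assign_slices(num_slices, num_gpus):
    """Round-robin split: slice i goes to GPU i % num_gpus.

    Returns one list of slice indices per GPU. No GPU needs data from
    another, so nothing has to be exchanged between the cards.
    """
    return [list(range(gpu, num_slices, num_gpus)) for gpu in range(num_gpus)]

# 13 GPUs as in FASTRA II; 100 slices is an arbitrary example size.
plan = assign_slices(num_slices=100, num_gpus=13)
for gpu, slices in enumerate(plan):
    print(f"GPU {gpu:2d}: {len(slices)} slices, first few {slices[:3]}")
```

Each GPU would then load its own slices, process them, and write out results on its own, which is why bandwidth between the GPUs matters little for this kind of workload.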
How much does your proposed GPU supercomputer cost for 13 GPUs?
Apparently, the regular BIOS can't boot with more than 5(?) graphics cards installed due to the amount of resources (memory & I/O space) that each one requires. So the researchers asked ASUS to make a special BIOS for them which doesn't set up the graphics card resources. However, the BIOS still needs to initialize at least one video card, so they agreed that the boot video card would be the one with only a single GPU. Presumably, they could have also chosen a dual-GPU card that happened to be different from the others in some way.
Question: Since you seem to be pretty knowledgeable on the subject, have you or any of your colleagues used or tried the AMD Stream SDK? Because those ATi 5870s look to be pretty scary as far as raw power, and since the AMD SDK supports OpenCL on both the CPU and GPU, and AMD has opened up their code as well as supporting both Windows and Linux 32/64 bit I was just curious if you or anyone else here has tried it?
ACs don't waste your time replying, your posts are never seen by me.
That was always true of supercomputers. In fact the stuff that runs well on CUDA now is almost precisely the same stuff that ran well on Cray vector machines - the classic stereotype of "Supercomputer"! Thus I do not see your point. The best computer for any particular task will always be one specialized for that task, and thus compromised for other tasks.
BTW, newer GPUs support double precision.
I have not tried it for two reasons. First, to my knowledge there are no large public machines in the US being planned using AMD GPUs, so there is relatively little incentive to port the code to OpenCL. We run on large clusters and it appears for the moment that NVIDIA has the HPC cluster market tied up. Second, while OpenCL is quite similar to CUDA in many respects, it's also significantly less convenient from a coding perspective. NVIDIA added a few language extensions that make launching kernels nearly as simple as a function call. As a pure C library, OpenCL requires much more setup code for each kernel invocation. If there were a strong incentive, such as the construction of a large NSF or DOE machine with AMD GPUs, I'd probably port it anyway, but without such a machine, it's not worth the time and effort. It's important to note that on GPUs, peak performance data often doesn't translate into actual performance numbers. The 4870 had a higher peak floating-point rate than the G200, but in graphics and some other benchmarks, the G200 usually came out ahead. I don't know if this will also be the case with Fermi vs. 5870s. Finally, another large consideration is that AMD is pretty far behind on the software end. Besides mature compilers for both CUDA and OpenCL, NVIDIA provides profilers and debuggers that can debug GPU execution in hardware, and there is a growing ecosystem of CUDA libraries. For the sake of competition, I hope AMD adoption grows, but I've gotten the impression they are just not investing that much in general-purpose GPU computing.