FASTRA II Puts 13 GPUs In a Desktop Supercomputer

Posted by timothy on Wednesday December 16, 2009 @11:07AM from the lucky-number dept.

An anonymous reader writes "Last year tomography researchers of the ASTRA group at the University of Antwerp developed a desktop supercomputer with four NVIDIA GeForce 9800 GX2 graphics cards. The performance of the FASTRA GPGPU system was amazing; it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR. Today the researchers announce FASTRA II, a new 6000EUR GPGPU computing beast with six dual-GPU NVIDIA GeForce GTX 295 graphics cards and one GeForce GTX 275. The development of the new system was more complicated and there are still some stability issues, but tests reveal the 13 GPUs deliver 3.75x more performance than the old system. For the tomography reconstruction calculations these researchers need to do, the compact FASTRA II is four times faster than the university's supercomputer cluster, while being roughly 300 times more energy efficient."

31 of 127 comments

Awesome by enderjsv · 2009-12-16 11:13 · Score: 5, Funny

Almost meets the minimum requirements for Crysis 2
1. Re:Awesome by sadness203 · 2009-12-16 11:29 · Score: 4, Funny
  
  Only if you imagine a Beowulf cluster of these
  Here goes the redundant and offtopic mod.
More Awesome by copponex · 2009-12-16 11:19 · Score: 3, Funny

This was post #2 and already modded -1, Redundant.
1. Re:More Awesome by joocemann · 2009-12-16 11:33 · Score: 3, Funny
  
  slashdot mods are often, as I observe, sour and pissy skeptics. even if it is humorous to them they will knock it for lack of something else to bash.
2. Re:More Awesome by joocemann · 2009-12-16 12:11 · Score: 2, Funny
  
  slashdot mods are often, as I observe, sour and pissy skeptics. even if it is humorous to them they will knock it for lack of something else to bash.
  -1 troll
  lol. exactly
Re:Easy money to be made? by Chirs · 2009-12-16 11:26 · Score: 4, Informative

Um...read the article?
The motherboard is an ASUS P6T7 WS Supercomputer.
News Flash by RandomUsr · 2009-12-16 11:29 · Score: 2, Funny

Blazing Fast Pron Machine running Windows Vista. Don't forget to pick up a copy of the latest memory-intensive Anti-Virus, as this machine will handle it just fine.
Re:Easy money to be made? by daVinci1980 · 2009-12-16 11:30 · Score: 3, Funny

You must be new here... ;)

--
I currently have no clever signature witticism to add here.
How fast is this really? by Ziekheid · 2009-12-16 11:31 · Score: 3, Insightful

"the compact FASTRA II is four times faster than the university's supercomputer cluster, while consuming 300 times less power" And the original supercomputer was how fast? 512 cores doesn't say THAT much. I could compare my computer to supercomputers from the past and they'd say the performance of my system was amazing too.
1. Re:How fast is this really? by jandrese · 2009-12-16 11:37 · Score: 5, Informative
  
  If you read the article it tells you that the supercomputer has 256 Opteron 250s (2.4 GHz) and was built 3 years ago. If you have a parallelizable problem that can be solved with CUDA, you can get absolutely incredible performance out of off-the-shelf GPUs these days.
  
  --
  
  I read the internet for the articles.
2. Re:How fast is this really? by Ziekheid · 2009-12-16 11:43 · Score: 2, Interesting
  
  I'll admit that, and thanks for the info, though you'd think this was crucial information for the summary too. With everything put in perspective, it will only outperform the cluster on specific calculations, so overall it's not faster, right?
3. Re:How fast is this really? by raftpeople · 2009-12-16 11:55 · Score: 2, Interesting
  
  It's all a continuum and depends on the problem. For problems with enough parallelism that the GPUs are a good choice, they are faster. For a completely serial problem, the current fastest single core is faster than both the supercomputer and the GPUs.
4. Re:How fast is this really? by jstults · 2009-12-16 12:08 · Score: 2, Informative
  
  "you can get absolutely incredible performance out of off-the-shelf GPUs these days."
  I had heard this from folks, but didn't really buy it until I read this paper today. They get a speed-up (wall clock) using the GPU even though they have to go to a worse algorithm (Jacobi instead of SSOR). Pretty amazing.
5. Re:How fast is this really? by cheesybagel · 2009-12-16 13:15 · Score: 2, Informative
  
  Really? Care to share any results that support that? I'm quite sure the peak flops you can achieve on the GPU are much higher than the limited SIMD capability of the CPU.
  IIRC they claim 2.5-3x more performance using a Tesla than using the CPUs in their workstation. Ignoring load time.
  SSE enables a theoretical peak performance enhancement of 4x for SIMD amenable codes (e.g. you can do 4 parallel adds using vector SSE, in the time it takes to make 1 add using scalar SSE). In practice however you usually get like 3x more performance.
  Theoretical SIMD performance for the GPU is very fine and nice, but in practice the small caches in current GPUs limit performance. CPUs also often have out-of-order execution support and other hardware which is too expensive in terms of transistors to implement in a GPU.
  IMO the main problem here is that the programming model for the CPU is too complex since you need to use several different ways to express parallelism (SIMD/Multicore/Cluster) to get top performance.
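The Jacobi-versus-SSOR trade-off raised in this thread can be illustrated with a minimal sketch. It is plain Python on the CPU, not GPU code, and it compares Jacobi with Gauss-Seidel (a simpler relative of SSOR, swapped in here for brevity) rather than reproducing the linked paper. The test matrix, the tolerance, and the function names `make_system`, `jacobi`, and `gauss_seidel` are all assumptions made for illustration.

```python
def make_system(n):
    """Build a small tridiagonal, strictly diagonally dominant system.

    Hypothetical test problem: 4 on the diagonal, -1 next to it, b = 1.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A, [1.0] * n


def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Jacobi sweeps: each new entry reads only the PREVIOUS iterate."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # All n entries are independent of each other, so in principle
        # they could be computed simultaneously (one thread per entry).
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it
        x = x_new
    return x, max_iter


def gauss_seidel(A, b, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel sweeps: entry i uses entries 0..i-1 updated in THIS sweep."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        delta = 0.0
        for i in range(n):  # inherently ordered: each step depends on the last
            new = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < tol:
            return x, it
    return x, max_iter


if __name__ == "__main__":
    A, b = make_system(8)
    _, jacobi_sweeps = jacobi(A, b)
    _, gs_sweeps = gauss_seidel(A, b)
    print("Jacobi sweeps:      ", jacobi_sweeps)
    print("Gauss-Seidel sweeps:", gs_sweeps)
```

On this small system Jacobi needs more sweeps than Gauss-Seidel, yet every entry of a Jacobi sweep depends only on the previous iterate. That independence is what can let a massively parallel device come out ahead on wall-clock time despite the extra iterations.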
times less by Tubal-Cain · 2009-12-16 11:37 · Score: 4, Funny

...consuming 300 times less power.
*sigh*
1. Re:times less by timeOday · 2009-12-16 17:02 · Score: 4, Insightful
  
  Can we please just officially define "n times less" as "1/n" and not feel bad about it anymore?
2. Re:times less by Nalgas+D.+Lemur · 2009-12-17 12:12 · Score: 2, Funny
  
  ...consuming 300 times less power. *sigh*
  Oops. Sorry. 300 times fewer.
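The two phrasings that appear in this story, "300 times more energy efficient" and "consuming 300 times less power", are not the same claim. A small sketch of the arithmetic, assuming "n times less" means 1/n (as proposed above) and assuming energy efficiency means performance per watt, a definition the summary does not state. The function names are made up for illustration.

```python
def times_less(baseline, n):
    """Read "n times less" as baseline / n."""
    return baseline / n


def power_reduction_from_efficiency(speedup, efficiency_gain):
    """Given efficiency = performance / power, return the factor by which
    power draw shrinks when performance rises by `speedup` and efficiency
    rises by `efficiency_gain`."""
    return efficiency_gain / speedup


def efficiency_gain_from_power(speedup, power_reduction):
    """Inverse reading: power shrinks by `power_reduction`, performance
    rises by `speedup`, so performance per watt rises by their product."""
    return speedup * power_reduction


speedup = 4.0  # "four times faster" from the summary

# Reading 1: 300x more efficient -> how many times less power?
print(power_reduction_from_efficiency(speedup, 300.0))  # 75.0

# Reading 2: 300x less power -> how many times more efficient?
print(efficiency_gain_from_power(speedup, 300.0))       # 1200.0
```

Under these assumptions the two readings differ by a factor of 16 (the speedup squared), so which wording is used matters.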
Re:GPU accuracy by kpesler · 2009-12-16 11:48 · Score: 5, Informative

Presently the G200 GPUs in this machine support double precision, but at about 1/8 the peak rate of single precision. In practice, since most codes tend to be bandwidth limited, and pointer arithmetic is the same for single and double precision, double-precision performance is usually closer to 1/2 that of single-precision performance, but not always. With the Fermi GPUs to be released early next year, double-precision peak FLOPS will be 1/2 of single-precision peak, just like on present x86 processors. Also note that many scientific research groups, such as my own, have found that contrary to dogma, single precision is good enough for most of the computation, and that a judicious mix of single- and double-precision arithmetic gives high performance with sufficient accuracy. This is true for some, but not all, computational methods.
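One common form of the "judicious mix" described above is to keep the bulk data in single precision but carry the running sum in double precision. The following is a minimal CPU-side sketch of that idea, not the commenter's actual code and not GPU code. It emulates single precision with the standard `struct` module, and the names `to_f32`, `sum_single`, and `sum_mixed` are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate entirely in single precision: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def sum_mixed(values):
    """Single-precision inputs, double-precision accumulator."""
    acc = 0.0
    for v in values:
        acc += v  # Python floats are doubles, so no extra rounding to float32
    return acc


if __name__ == "__main__":
    n = 100_000
    values = [to_f32(0.1)] * n       # the data itself is stored as float32
    reference = to_f32(0.1) * n      # the sum these stored values should give
    print("single-precision accumulator error:", abs(sum_single(values) - reference))
    print("mixed-precision accumulator error: ", abs(sum_mixed(values) - reference))
```

The inputs are identical in both cases; only the accumulator differs. The single-precision accumulator drifts visibly once the running total dwarfs each addend, while the double-precision accumulator stays close to the reference at the cost of one wider variable.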
Not sure how fast it is, but I know it is hot... by (H)elix1 · 2009-12-16 11:50 · Score: 2, Interesting

I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.
For those not familiar with the 9800gx2 cards, it essentially is two 8800gts video cards linked together to act as a single card - something called SLI on the NVidia side of marketing. SLI typically required a mainboard/chipset that would allow you to plug in two cards and link them together. This model allowed any mainboard to have two 'internal' cards linked together, with the option of linking another 9800gx2 if your board actually supported SLI.
The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.

--
+++ UGUCAUCGUAUUUCU
Stability problem solved... by gsgriffin · 2009-12-16 11:59 · Score: 2, Funny

Duh! Look at the number of GPUs...13...try 12 or 14 and your luck will change.

--
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
Re:GPU accuracy by Beardo+the+Bearded · 2009-12-16 12:00 · Score: 3, Interesting

First, a gaming card is going to get fast firmware. A workstation card is going to get accurate firmware. I imagine that supercomputer cards would get specialized firmware. (I only skimmed the summary.)
GPUs are excellent at solving certain types of problems and excel at solving matrices. (That's what your video card is doing while it's rendering.) The best part of that is that most, if not all, mathematical problems can be expressed as a matrix, meaning that your super-fast GPU can solve most math problems super-fast.
Next, GPUs love working together since they don't care about what the OS is doing. All they do is take raw data and respond with an answer. Usually we're putting that answer onto the display, since otherwise wtf are we doing with a GPU? In this case, the results are returned instead of using the flashy display. So what you end up with is a set of really fast, specialized, parallel engines solving broken down matrices.
They're also not subject to the marketing whims of Moore's Law, so you can often get faster cards sooner than faster CPUs. To break down a supercomputer so that you get this kind of performance for 4000 EURO is a fantastic achievement. It's almost, but not quite, hobby range. (I'd still put money on someone trying to evolve this into a gaming rig...)

--

---
ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
Re:Silly by modemboy · 2009-12-16 12:22 · Score: 2, Informative

The difference between GeForce and Quadro cards is almost always completely driver based, it is the exact same hw, different sw.
This is basically a roll-your-own Tesla, and considering the Teslas connect to the host system via an 8x or 16x PCI-e add-in card, I'm gonna say you are wrong when it comes to the bandwidth issue as well...
That's why I have a problem with the comparisons by Sycraft-fu · 2009-12-16 13:08 · Score: 3, Informative

Because it only applies to the kind of problems that CUDA is good at solving. Now while there are plenty of those, there are plenty that it isn't good for. Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.
That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things, wonderful, please use GPUs. However, don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single-precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However, not all problems are like that, hence GPUs are NOT a general replacement for a supercomputer.
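How much "highly parallel" matters can be estimated with Amdahl's law, which nobody in the thread names but which fits the argument. This is a rough sketch only: it ignores memory limits, branching, and precision, and it treats each worker (an SSE lane, a GPU) as a single unit. The function names and the sample fractions are assumptions for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def implied_parallel_fraction(observed_speedup, workers):
    """Invert Amdahl's law: which p would explain an observed speedup on n workers?"""
    if workers <= 1 or observed_speedup <= 0.0:
        raise ValueError("need workers > 1 and a positive speedup")
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Hypothetical fractions, with 13 workers standing in for 13 GPUs.
    for p in (0.0, 0.5, 0.9, 0.99):
        print(f"p = {p:.2f}: speedup on 13 workers = {amdahl_speedup(p, 13):.2f}")

    # The 4-wide SSE case quoted earlier in the thread: about 3x in practice.
    print(f"implied parallel fraction: {implied_parallel_fraction(3.0, 4):.3f}")
```

A fully serial problem gains nothing from extra workers under this model, and a speedup of about 3x on 4 lanes would correspond to roughly 8/9 of the work being parallel. Real workloads deviate from the model, so treat the numbers as an intuition aid rather than a prediction.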
Re:Easy money to be made? by nurb432 · 2009-12-16 13:10 · Score: 2, Funny

Does that come in a picoATX version?

--
---- Booth was a patriot ----
Re:Silly by jpmorgan · 2009-12-16 13:16 · Score: 2, Informative

The hardware is the same, but the quality control is different. Teslas and Quadros are held to rigorous standards. GeForces have an acceptable error rate. That's fine for gaming, but falls flat in scientific computing.
Re:Not sure how fast it is, but I know it is hot.. by thedarknite · 2009-12-16 13:21 · Score: 2, Funny

I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.
That's a brilliant idea, now people can make snacks without ever leaving the computer.

--
A game has objectives and is competitive, anything else is just play
Re:Silly by CityZen · 2009-12-16 13:36 · Score: 2, Insightful

It's not silly: (1) this is a research project, not production medical equipment, meaning that the funds to buy Tesla cards were probably not available, and they aren't particularly worried about occasional bit errors. (2) Their particular application doesn't need much inter-GPU communication, if any, so that bandwidth is not an issue. They just need for each GPU to load datasets, chew on them, and spit out the results.
How much does your proposed GPU supercomputer cost for 13 GPUs?
Why it's 13, not 14 GPUs by CityZen · 2009-12-16 13:49 · Score: 2, Interesting

Apparently, the regular BIOS can't boot with more than 5? graphics cards installed due to the amount of resources (memory & I/O space) that each one requires. So the researchers asked ASUS to make a special BIOS for them which doesn't set up the graphics card resources. However, the BIOS still needs to initialize at least one video card, so they agreed that the boot video card would be the one with only a single GPU. Presumably, they could have also chosen a dual GPU card that happened to be different from the others in some way.
Re:GPU accuracy by hairyfeet · 2009-12-16 16:40 · Score: 2, Interesting

Question: Since you seem to be pretty knowledgeable on the subject, have you or any of your colleagues used or tried the AMD Stream SDK? Because those ATi 5870s look to be pretty scary as far as raw power, and since the AMD SDK supports OpenCL on both the CPU and GPU, and AMD has opened up their code as well as supporting both Windows and Linux 32/64 bit I was just curious if you or anyone else here has tried it?

--
ACs don't waste your time replying, your posts are never seen by me.
Re:That's why I have a problem with the comparison by timeOday · 2009-12-16 16:59 · Score: 4, Insightful

Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.
That was always true of supercomputers. In fact the stuff that runs well on CUDA now is almost precisely the same stuff that ran well on Cray vector machines - the classic stereotype of "Supercomputer"! Thus I do not see your point. The best computer for any particular task will always be one specialized for that task, and thus compromised for other tasks.
BTW, newer GPUs support double precision.
Re:GPU accuracy by kpesler · 2009-12-17 06:39 · Score: 2, Interesting

I have not tried it for two reasons. First, to my knowledge there are no large public machines in the US being planned using AMD GPUs, so there is relatively little incentive to port the code to OpenCL. We run on large clusters and it appears for the moment that NVIDIA has the HPC cluster market tied up. Second, while OpenCL is quite similar to CUDA in many respects, it's also significantly less convenient from a coding perspective. NVIDIA added a few language extensions that make launching kernels nearly as simple as a function call. As a pure C library, OpenCL requires much more setup code for each kernel invocation. If there were a strong incentive, such as the construction of a large NSF or DOE machine with AMD GPUs, I'd probably port it anyway, but without such a machine, it's not worth the time and effort.
It's important to note that on GPUs, peak performance data often doesn't translate into actual performance numbers. The 4870 had a higher peak floating point rate than the G200, but in graphics and some other benchmarks, the G200 usually came out ahead. I don't know if this will also be the case with Fermi vs. 5870s.
Finally, another large consideration is that AMD is pretty far behind on the software end. Besides mature compilers for both CUDA and OpenCL, NVIDIA provides profilers and debuggers that can debug GPU execution in hardware, and there is a growing ecosystem of CUDA libraries. For the sake of competition, I hope AMD adoption grows, but I've gotten the impression they are just not investing that much in general-purpose GPU computing.