AMD Demonstrates "Teraflop In a Box"

Posted by kdawson on Thursday March 1, 2007 @04:10AM from the speedy-silicon dept.

UncleFluffy writes "AMD gave a sneak preview of their upcoming R600 GPU. The demo system was a single PC with two R600 cards running streaming computing tasks at just over 1 Teraflop. Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years." Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

It isn't that they are hard to use for more... by Assmasher · 2007-03-01 04:35 · Score: 3, Informative

...generic purposes, it is that they're (GPUs) suited better for certain types of operations. Image processing, as an example, is very well suited to working on a GPU because the GPU excels at addressing and operating on elements of arrays (textures basically.) I've used it as a proof of concept at work for processing large numbers of video feeds simultaneously for things like photometric normalization, image stabilization, et cetera, and the things are awesome. They work well in this scenario because the problem I'm trying to solve fits the caveats of using the GPU well. Slow upload of data, miraculously fast action upon that data, slow download of the data. Now, slow is relative and getting more and more relative as new chipsets are released.
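The "slow upload, fast compute, slow download" trade-off can be put into a rough cost model. This is only a sketch: the function names and every throughput figure in it are hypothetical placeholders, not measurements of any real card or bus.

```python
def offload_time(n_bytes, n_ops, bus_bytes_per_s, gpu_ops_per_s):
    """Seconds for a GPU round trip: upload, compute, download."""
    # Assumption: the result is the same size as the input, so the
    # data crosses the bus twice (once up, once back down).
    transfer = 2 * n_bytes / bus_bytes_per_s
    compute = n_ops / gpu_ops_per_s
    return transfer + compute


def cpu_time(n_ops, cpu_ops_per_s):
    """Seconds to do the same work on the CPU, with no transfer cost."""
    return n_ops / cpu_ops_per_s


def worth_offloading(n_bytes, n_ops, bus_bytes_per_s, gpu_ops_per_s, cpu_ops_per_s):
    """True when the GPU round trip beats staying on the CPU."""
    gpu = offload_time(n_bytes, n_ops, bus_bytes_per_s, gpu_ops_per_s)
    return gpu < cpu_time(n_ops, cpu_ops_per_s)


# Hypothetical figures: 1 GB/s bus, GPU 100x faster than the CPU.
BUS, GPU, CPU = 1e9, 1e11, 1e9

# Heavy work per byte (1000 ops per byte): the transfer cost is amortized.
heavy = worth_offloading(1e6, 1e9, BUS, GPU, CPU)

# Light work per byte (1 op per byte): the transfer cost dominates.
light = worth_offloading(1e6, 1e6, BUS, GPU, CPU)
```

Under these made-up numbers the heavy case favors the GPU and the light case does not, which is the shape of the caveat described above: the more arithmetic done per byte moved, the better the fit.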

The actual framework for doing this is relatively simple, although it certainly did help that I've a background in OpenGL and DirectXGraphics (so I've done shader work before); however, again, progress is removing those caveats as well. Generic GPU programming toolsets are imminent; the only problem is that ATI has no interest in their toolsets working with nVidia hardware, and nVidia has even less interest in their toolset(s) running on ATI hardware. Something we'll just have to learn to deal with.

BTW, DirectX10 will make this a little easier as well with changes to how you have to pipeline data in order to operate on it in a particular fashion.

Notpick by 91degrees · 2007-03-01 04:36 · Score: 4, Informative

That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.
Re:Never thought of that by Anonymous Coward · 2007-03-01 04:44 · Score: 3, Informative

Check out this web site: http://www.gpgpu.org/

It is up to date and contains a lot of related information.

General Purpose Programmers by Doc+Ruby · 2007-03-01 04:47 · Score: 3, Informative

it's hard to program such GPUs for anything other than graphics applications.

"Anything other" is "general purpose", which they cover at GPGPU.org. But the general community of global developers hasn't gotten hooked on the cheap performance yet. Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.

--
make install -not war
No, Ars didn't say why. Here's why. by Animats · 2007-03-01 04:59 · Score: 4, Informative

Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.
No, Ars has an article blithering that it's hard to program such GPUs for anything other than graphics applications. It doesn't say anything constructive about why.
Here's a reasonably readable tutorial on doing number-crunching in a GPU. The basic concepts are that "Arrays = textures", "Kernels = shaders", and "Computing = drawing". Yes, you do number-crunching by building "textures" and running shaders on them. If your problem can be expressed as parallel multiply-accumulate operations, which covers much classic supercomputer work, there's a good chance it can be done fast on a GPU. There's a broad class of problems that work well on a GPU, but they're generally limited to problems where the outputs from a step have little or no dependency on each other, allowing full parallelism of the computations of a single step. If your problem doesn't map well to that model, don't expect much.
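To make the "outputs have no dependency on each other" condition concrete, here is a small CPU-only sketch in plain Python. It is not GPU code; `run_kernel`, `mac_kernel`, and `prefix_sum` are hypothetical names invented for illustration. The "kernel" computes one output element from the inputs alone, so the elements can be evaluated in any order (or all at once) with the same result.

```python
def mac_kernel(i, a, x, y):
    """Multiply-accumulate for output element i; reads inputs only."""
    return a * x[i] + y[i]


def run_kernel(kernel, n, *args, order=None):
    """Apply kernel to every index, in whatever order is requested.

    A GPU would evaluate many indices at once; looping in an arbitrary
    order is a stand-in for that here.
    """
    out = [None] * n
    for i in (order if order is not None else range(n)):
        out[i] = kernel(i, *args)
    return out


def prefix_sum(x):
    """Counterexample: each output depends on the previous output."""
    out = []
    total = 0.0
    for v in x:
        total += v          # sequential dependency on earlier results
        out.append(total)
    return out


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

forward = run_kernel(mac_kernel, 3, 2.0, x, y)
backward = run_kernel(mac_kernel, 3, 2.0, x, y, order=[2, 1, 0])
running = prefix_sum(x)
```

`forward` and `backward` are identical because no output reads another output. The naive `prefix_sum` loop has no such freedom, which is why problems written that way need to be restructured before they map onto this model.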
1. Re:No, Ars didn't say why. Here's why. by Chris+Ashton+84 · 2007-03-01 07:05 · Score: 3, Informative
  
  Yes, you used to have to do everything in a graphical environment, but not any more. With nVidia's CUDA you program in C/C++, have a general memory model (you can access texture memory if it's efficient for what you need, but you also have general device memory and several other types of memory to choose from) and run on fully capable stream processors. As far as the programmer is concerned, the gpu is just a stream processor add-in card. You do have to manually transfer to and from device memory, but once you have your data on the gpu you're free to access it however you want (arrays, textures, linear memory, whatever). It's not a difficult system to understand, though tuning your program for performance will be challenging. Check out http://developer.nvidia.com/object/cuda.html for more info.
Re:Never thought of that by theantipop · 2007-03-01 05:00 · Score: 4, Informative

http://folding.stanford.edu/FAQ-ATI.html

It's still in beta AFAIK, but it has been in development for quite some time.
Re:The first rule of teraflop club... by dlapine · 2007-03-01 05:02 · Score: 5, Informative

LOL- you're complaining about wattage for 1 TF when they did it on a pair of friggin' video cards?? That's gotta be what, 500 watts total for the whole PC?

We've run several PC clusters and IBM mainframes that didn't have 1 TF of capacity. You don't want to know how much power went into them. Yes, our modern blade-based clusters are more condensed, but they're still power hogs for dual and quad core systems.
Blue gene is considered to be a power efficient cluster and the fastest, but it still draws 7kw per rack of 1024 cpus. At 4.71 TF per rack, even Blue Gene pulls 1.5kw per teraflop.
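The per-teraflop figure is just rack power divided by rack throughput. A quick check of that arithmetic follows, with the demo PC added for comparison; the 500 W figure is only the guess made earlier in this comment, not a measured number, and `kw_per_teraflop` is a name made up for this sketch.

```python
def kw_per_teraflop(kilowatts, teraflops):
    """Power cost of one teraflop of capacity, in kW."""
    return kilowatts / teraflops


# Blue Gene rack figures as quoted above: 7 kW and 4.71 TF per rack.
blue_gene = kw_per_teraflop(7.0, 4.71)

# Demo PC: 1 TF at a guessed 500 W (0.5 kW) for the whole machine.
demo_pc = kw_per_teraflop(0.5, 1.0)

# How many times more power per teraflop the rack draws under that guess.
ratio = blue_gene / demo_pc
```

The rack works out to just under 1.5 kW per teraflop, matching the figure above, and roughly three times the demo PC's draw if the 500 W guess holds.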
Yes, it's a pair of video cards, and not a general purpose CPU, but your average user doesn't have the ability to program and use a Blue Gene style solution either. They just might get some real use out of this with a game physics engine that taps into this computing power.
This is cool.

--
The Internet has no garbage collection
Re:Step 1 by Anonymous Coward · 2007-03-01 05:47 · Score: 3, Informative

Step 1: Put your chip in the box. Dude. You have to cut a hole in the box first, otherwise you will pinch your junk...err...your chip under the lid.
Re:Compatibility by UncleFluffy · 2007-03-01 05:55 · Score: 4, Informative

Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

From what I saw at the demo, the AMD stuff was running under Brook. As far as I've been able to make out from nVidia's documentation, CUDA is basically a derivative of Brook that has had a few syntax tweaks and some vendor-specific shiny things added to lock you in to nVidia hardware.

--
What would Lemmy do?
Re:OOOoooo by End+Program · 2007-03-01 06:05 · Score: 5, Informative

Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous.

Maybe there aren't any DSP available and low cost, if you aren't a hardware designer:

400 MHz DSP $10.00 http://www.analog.com/en/epProd/0,,ADSP-BF532,00.html
14-bit, 65 MSPS ADC $30.00 http://www.analog.com/en/prod/0,,AD6644,00.html
Catching non-designers talking smack ...priceless