AMD Demonstrates "Teraflop In a Box"
UncleFluffy writes "AMD gave a sneak preview of their upcoming R600 GPU. The demo system was a single PC with two R600 cards running streaming computing tasks at just over 1 Teraflop. Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years." Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.
...generic purposes, it is that they (GPUs) are better suited to certain types of operations. Image processing, for example, is very well suited to a GPU because the GPU excels at addressing and operating on elements of arrays (textures, basically). I've used it as a proof of concept at work for processing large numbers of video feeds simultaneously for things like photometric normalization, image stabilization, et cetera, and the things are awesome. They work well in this scenario because the problem I'm trying to solve fits the caveats of using the GPU well: slow upload of data, miraculously fast action upon that data, slow download of the data. Now, slow is relative, and getting more and more relative as new chipsets are released.
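A rough way to see when the "slow upload, fast compute, slow download" trade-off pays off is a back-of-the-envelope cost model. The sketch below is only an illustration: the function names and every number in it are hypothetical, not measurements from any real card.

```python
# Back-of-the-envelope model of offloading work to a GPU.
# All timings and the speedup factor are hypothetical placeholders.

def gpu_time(upload_s, cpu_compute_s, speedup, download_s):
    """Total time on the GPU path: upload + accelerated compute + download."""
    return upload_s + cpu_compute_s / speedup + download_s

def offload_pays_off(upload_s, cpu_compute_s, speedup, download_s):
    """True when the GPU path beats simply computing on the CPU."""
    return gpu_time(upload_s, cpu_compute_s, speedup, download_s) < cpu_compute_s

# Heavy per-frame work: transfers are small next to the compute saved.
heavy = offload_pays_off(0.010, 1.0, 20.0, 0.010)

# Light per-frame work: the two transfers alone cost more than the CPU run.
light = offload_pays_off(0.010, 0.015, 20.0, 0.010)

print(heavy, light)
```

The point of the model is only that the transfers are a fixed tax, so the work done per byte moved has to be large for the GPU to win.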
The actual framework for doing this is relatively simple, although it certainly did help that I have a background in OpenGL and DirectX Graphics (so I've done shader work before); however, again, progress is removing those caveats as well. Generic GPU programming toolsets are imminent; the only problem is that ATI has no interest in their toolsets working with nVidia hardware, and nVidia has even less interest in their toolset(s) running on ATI hardware. Something we'll just have to learn to deal with.
BTW, DirectX10 will make this a little easier as well with changes to how you have to pipeline data in order to operate on it in a particular fashion.
That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.
Check out this web site: http://www.gpgpu.org/
It is up to date and contains a lot of related information.
WP
"Anything other" is "general purpose", which they cover at GPGPU.org. But the general community of global developers hasn't gotten hooked on the cheap performance yet. Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.
--
make install -not war
Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.
No, Ars has an article blithering that it's hard to program such GPUs for anything other than graphics applications. It doesn't say anything constructive about why.
Here's a reasonably readable tutorial on doing number-crunching in a GPU. The basic concepts are that "Arrays = textures", "Kernels = shaders", and "Computing = drawing". Yes, you do number-crunching by building "textures" and running shaders on them. If your problem can be expressed as parallel multiply-accumulate operations, which covers much classic supercomputer work, there's a good chance it can be done fast on a GPU. There's a broad class of problems that work well on a GPU, but they're generally limited to problems where the outputs from a step have little or no dependency on each other, allowing full parallelism of the computations of a single step. If your problem doesn't map well to that model, don't expect much.
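To make the "Arrays = textures", "Kernels = shaders", "Computing = drawing" mapping concrete, here is a minimal sketch in plain Python. It runs on the CPU and uses no GPU API at all; `saxpy_kernel` and `run_kernel` are names made up for this illustration.

```python
# Plain-Python sketch of the stream model: lists stand in for textures,
# a per-element function stands in for a shader, and running it over
# every output element stands in for drawing.

def saxpy_kernel(i, a, x, y):
    """Compute one output element (a multiply-accumulate).
    It reads the inputs but never touches any other output element."""
    return a * x[i] + y[i]

def run_kernel(kernel, size, *inputs):
    """The 'draw' pass: evaluate the kernel once per output element.
    Every call is independent, so a GPU could run them all in parallel."""
    return [kernel(i, *inputs) for i in range(size)]

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = run_kernel(saxpy_kernel, len(x), 2.0, x, y)
print(result)  # [12.0, 24.0, 36.0, 48.0]
```

If one output needed the value of another output from the same pass, this structure would break down, which is the dependency limit described above.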
http://folding.stanford.edu/FAQ-ATI.html
It's still in beta AFAIK, but it has been in development for quite some time.
We've run several PC clusters and IBM mainframes that didn't have 1 TF of capacity. You don't want to know how much power went into them. Yes, our modern blade-based clusters are more condensed, but they're still power hogs for dual and quad core systems.
Blue Gene is considered to be a power-efficient cluster and the fastest, but it still draws 7 kW per rack of 1024 CPUs. At 4.71 TF per rack, even Blue Gene pulls 1.5 kW per teraflop.
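For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above (7 kW, 1024 CPUs and 4.71 TF per rack); the variable names are just for illustration.

```python
# Power-per-teraflop check using the per-rack figures quoted in the comment.
rack_power_kw = 7.0    # quoted draw per rack
rack_tflops = 4.71     # quoted throughput per rack
cpus_per_rack = 1024   # quoted CPU count per rack

kw_per_tflop = rack_power_kw / rack_tflops
watts_per_cpu = rack_power_kw * 1000 / cpus_per_rack

print(f"{kw_per_tflop:.2f} kW per teraflop")  # 1.49 kW per teraflop
print(f"{watts_per_cpu:.1f} W per CPU")       # 6.8 W per CPU
```

So the quoted 1.5 kW per teraflop is the rounded value of roughly 1.49.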
Yes, it's a pair of video cards, and not a general-purpose CPU, but your average user doesn't have the ability to program and use a Blue Gene style solution either. They just might get some real use out of this with a game physics engine that taps into this computing power.
This is cool.
The Internet has no garbage collection
The PlayStation 3 is reported to harness 2 TFLOPS. But "only" 204 GFLOPS of that, about 10%, runs on the Cell CPU. The other 1.8 TFLOPS runs on the nVidia G70 GPU. But the G70 runs shaders, which have very limited application to anything but actually rendering graphics.
The Cell itself is notoriously hard to code for. If just some extra effort can target the nVidia, that's TWO TeraFLOPS in a $500 box. A huge leap past both AMD and Intel.
--
make install -not war
That must be why nVidia has decided to enter the x86 chip market and Intel has significantly improved their GPU offerings, as well as indicating they may include vector units in future chips: because these companies plan to work together in the future! It's so obvious! I wish I hadn't paid attention these past 6 months, as it's clearly confused me!
Sarcasm suits you well.
While Intel and nVidia may both be independently reinventing the wheel right now, neither seems to be getting very far very fast. Intel's video offerings have been poor at best and no one has seen an nVidia x86 processor. AMD has already demo'd a prototype, which means they are further along with this Fusion than both Intel and nVidia combined. I don't think it will take long for the decision makers at both of these companies to realize that the other has the missing component.
Of course, you could be right. This is pure speculation on my part and I am pretty much talking from my ass. Still, the idea makes perfect sense to me.
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.
From what I saw at the demo, the AMD stuff was running under Brook. As far as I've been able to make out from nVidia's documentation, CUDA is basically a derivative of Brook that has had a few syntax tweaks and some vendor-specific shiny things added to lock you in to nVidia hardware.
What would Lemmy do?
Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous.
...priceless
Maybe there aren't any low-cost DSPs available, if you aren't a hardware designer:
400 MHz DSP $10.00 http://www.analog.com/en/epProd/0,,ADSP-BF532,00.
14-bit, 65 MSPS ADC $30.00 http://www.analog.com/en/prod/0,,AD6644,00.html
Catching non-designers talking smack
Count real, usable FLOPS. GPUs don't win.
But for ~$500, it's what's going to be used.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
CUDA isn't a derivative of Brook; it's a more general programming model. Whereas Brook is a streaming architecture, meaning that each iteration of the kernel writes one value at the end, the threads in CUDA are able to write many values, as well as perform some communication during the processing.
This new capability will let CUDA support more general algorithms.
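The one-value-per-kernel versus many-writes-per-thread distinction can be sketched in plain Python. This is neither Brook nor CUDA code: `stream_pass`, `thread_pass` and the two example functions are hypothetical names used only to show the shape of each model, and the inter-thread communication part is not modeled.

```python
# Plain-Python contrast between the two models described above.

def stream_pass(kernel, inputs):
    """Streaming style: the framework owns the output, and each
    kernel invocation returns exactly one value for its own slot."""
    return [kernel(v) for v in inputs]

def thread_pass(thread_fn, inputs, out_size):
    """Thread style: each thread is handed the whole output buffer
    and may write to several locations of its choosing."""
    out = [0] * out_size
    for tid, v in enumerate(inputs):
        thread_fn(tid, v, out)
    return out

def square(v):
    return v * v

def split_digits(tid, v, out):
    # One thread writes two results: the tens digit and the ones digit.
    out[2 * tid] = v // 10
    out[2 * tid + 1] = v % 10

data = [12, 34, 56]
squares = stream_pass(square, data)
digits = thread_pass(split_digits, data, 2 * len(data))
print(squares)  # [144, 1156, 3136]
print(digits)   # [1, 2, 3, 4, 5, 6]
```

In the streaming version `split_digits` would need two separate passes, one per output value, which is the kind of restriction the more general model removes.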