Slashdot Mirror


AMD Demonstrates "Teraflop In a Box"

UncleFluffy writes "AMD gave a sneak preview of their upcoming R600 GPU. The demo system was a single PC with two R600 cards running streaming computing tasks at just over 1 Teraflop. Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years." Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

20 of 182 comments (clear)

  1. ubiquitous by Speare · · Score: 4, Insightful

    Look up 'ubiquitous' before you whine about how far behind Intel might seem to be.

    Though having one demonstration will help spur the demand, and the demand will spur production, I still think it'll be five years before everybody's grandmother will have a Tf lying around on their checkbook-balancing credenza, and every PHB will have one under their desk warming their feet during long conference calls.

    --
    [ .sig file not found ]
  2. Re:Not misleading at all by sumdumass · · Score: 4, Interesting

    Isn' the reason this is so interestiong because you cannot have a Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a tflop at this present time?

    Maybe soon but I thought it isn't _now_!

  3. Re:1 Teraflop you say? by minginqunt · · Score: 5, Funny

    How much is that in BogoMIPS?

    That's TWELFTY BAJILLION BogoMIPS. Per fortnight.

  4. OOOoooo by fyngyrz · · Score: 5, Interesting
    it's hard to program such GPUs for anything other than graphics applications

    It might be hard, but then again, it might be worthwhile. For instance (I'm a ham radio operator) I ran into a sampling shortwave radio receiver the other day. Thing samples from the antenna at 60+ MHz, thereby producing a stream of 14-bit data that can resolve everything happening below 30 MHz, or in other words, the entire shortwave spectrum and longwave and so on basically down to DC.

    Now, a radio like this requires that the signal be processed; first you separate it from the rest, then you demodulate it, then you apply things like notch filters (or you can do that prior to demodulation, that's very nice) you build an automatic gain control to handle amplitude swings, provide a way to vary the bandwidth and move the filter skirts (low and high) independently... you might like to produce a "panadapter" display of the spectrum around the signal of interest where the is a graph that lays out signal strengths for a defined distance up and down spectrum... you might want to demodulate more than one signal at once (say, a FAX transmission into a map on the one hand, and a voice transmission of the weather on the other.) And so on - I could really go on for a while.

    The thing is, as with all signal processing, the more you try to do with a real-time signal, the more resources you have to dedicate. And this isn't audio, or at least, not at the early stages; a 60+ MHz stream of data requires quite a bit more in terms of how fast you have to do things to it than does an audio stream at, say, 44 KHz.

    Bit signal processing typically uses fairly simple math; a lot of it, but you can do a lot without having to resort to real craziness. A teraflop of processing that isn't even happening on the CPU is pretty attractive. You'd have to get the data to it, and I'm thinking that would be pretty resource intensive, but between the main CPU and the GPU you should have enough "ooomph" left over to make a beautiful and functional radio interface.

    There is an interesting set of tasks in the signal processing space; forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging) requires lots and lots of signal processing. Be a kick to have it in a relatively standard box, with easily replaceable components. Maybe you could do the same thing above-ground; after all, it's still sound and there are still reflections that can tell you a lot (just observe a bat.)

    The cool thing about signal processing is that a lot of it is like graphics, in a way; generally, you set up some horrible sequence of things to do to your data, and then thrash each sample just like you did the last one.

    Anyway, it just struck me that no matter how hard it is to program, it could certainly be useful for some of these really resource intensive tasks.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:OOOoooo by sitturat · · Score: 4, Insightful

      Or you could just use the correct tool for the job - a DSP. I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

    2. Re:OOOoooo by fyngyrz · · Score: 4, Insightful
      I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

      Simple: they aren't available. PC's don't typically come with DSPs. But they do come with graphics, and if you can use the GPU for things like this, it's a nice dovetail. For someone like that radio manufacturer, no need to force the consumer to buy more hardware. It's already there.

      --
      I've fallen off your lawn, and I can't get up.
    3. Re:OOOoooo by maird · · Score: 4, Insightful

      A DSP probably is more efficient for that task but you can't go down to your local WalMart and buy one. Besides, even if you could, the IC isn't much use to anyone. Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous. Add to that the cost of getting any code written for it and the idea becomes suitable for military application only. OTOH, the PC has a huge and varied user base so it has the price consistent with being a mere commodity. It is general purpose and can be adapted to a large variety of tasks. It is relatively cheap to write code for and has a huge base of capable special interest programmers. If there is a 60+MHz ADC out there somewhere for a reasonable price then it isn't just a matter of whether a DSP is a better tool, a PC is a trivially cheap tool by comparison. You'd still need a decent UI to use an all-band direct sampling HF receiver. A PC would be good for that too, so keep it all in the same box. You can buy non-direct sampling receivers with DSPs in them at prices ranging from $1000 to exceeding $10000. The DSP is probably no faster than about 100kHz so the signal has to be passed through one or more analogue IF stages to get the signal you want into the 50kHz that can be decoded. You can probably buy a PC from with greater digital signal processing potential for less than $500. A 30MHz direct sampling receiver will receive and service 30MHz worth of bandwidth simultaneously. Not long after general availability, the graphics card configuration in question will probably cost less than $1000. With the processing capabilities it has you (the human) will probably run out of ability to interpret simultaneously decoded signals before the PC runs out of ability to decode more (it's really hard to listen to two conversations at the same time on an HF radio).

    4. Re:OOOoooo by End+Program · · Score: 5, Informative

      Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous.

      Maybe there aren't any DSP available and low cost, if you aren't a hardware designer:

      400 MHz DSP $10.00 http://www.analog.com/en/epProd/0,,ADSP-BF532,00.h tml
      14-bit, 65 MSPS ADC $30.00 http://www.analog.com/en/prod/0,,AD6644,00.html
      Catching non-designers talking smack ...priceless

  5. Re:Compatibility by level_headed_midwest · · Score: 4, Interesting

    The chips are a much different ISA, so there's no way that binaries that will run on G80 hardware will run on an R600. Heck, even the ATi R400 series (x700, x8x0) is not binary-compatible with the current R500 x1000 units.Maybe ATi will make a CUDA compiler, but I am guessing that since folks have already gotten going using the R500 hardware (see: http://folding.stanford.edu/ I doubt that AMD/ATi will make a big effort to use a competitor's technology. Please correct me if I am incorrect, but I am not aware of any groups or programs that use NVIDIA hardware as number-crunchers yet.

    --
    Just "gittin-r-done," day after day.
  6. The first rule of teraflop club... by Duncan3 · · Score: 4, Insightful

    Don't mention the wattage...

    And the second rule of teraflop club...

    Don't mention the wattage...

    Back here in the real world where we PAY FOR ELECTRICITY, we're waiting for some nice FLOPS/Watt, keep trying guys.

    And they announced this some time ago didn't they?

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:The first rule of teraflop club... by dlapine · · Score: 5, Informative
      LOL- you're complaining about wattage for 1 TF when they did it on a pair of friggin' video cards?? That's gotta be what, 500 watts total for whole PC?


      We've run several PC clusters and IBM mainframes that didn't have a 1TF of capacity. You don't want know much power went into them. Yes, our modern blade-based clusters are more condensed, but they're still power hogs for dual and quad core systems.

      Blue gene is considered to be a power efficient cluster and the fastest, but it still draws 7kw per rack of 1024 cpus. At 4.71 TF per rack, even Blue Gene pulls 1.5kw per teraflop.

      Yes, it's a pair of video cards, and not a general purpose cpu, but your average user doesn't have ability to program and use a Blue Gene style solution either. They just might get some real use out of this with a game Physics Engine that taps into this computing power.

      This is cool.

      --
      The Internet has no garbage collection
  7. Notpick by 91degrees · · Score: 4, Informative

    That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.

    1. Re:Notpick by 91degrees · · Score: 4, Funny

      Yup. It's the law. Any post pointing out an error must have at least one error itself.

  8. No, Ars didn't say why. Here's why. by Animats · · Score: 4, Informative

    Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

    No, Ars has an article blithering that it's hard to program such GPUs for anything other than graphics applications. It doesn't say anything constructive about why.

    Here's an reasonably readable tutorial on doing number-crunching in a GPU. The basic concepts are that "Arrays = textures", "Kernels = shaders", and "Computing = drawing". Yes, you do number-crunching by building "textures" and running shaders on them. If your problem can be expressed as parallel multiply-accumulate operations, which covers much classic supercomputer work, there's a good chance it can be done fast on a GPU. There's a broad class of problems that work well on a GPU, but they're generally limited to problems where the outputs from a step have little or no dependency on each other, allowing full parallelism of the computations of a single step. If your problem doesn't map well to that model, don't expect much.

  9. Step 2 by Saikik · · Score: 5, Funny

    Step 2: Don't leave your box in Boston.

  10. Re:Never thought of that by theantipop · · Score: 4, Informative

    http://folding.stanford.edu/FAQ-ATI.html

    It's still in beta AFAIK, but it has been in development for quite some time.

  11. Chip in a Box by natrius · · Score: 5, Funny

    To all the fellas out there with geek friends to impress
    It's easy to do, just follow these steps:
    One: Cut a hole in a box
    Two: Stick your chip in that box
    Three: Make her open the box
    And that's the way you do it
    It's my chip in a box

  12. Well...duh by Anonymous Coward · · Score: 5, Insightful

    GPGPU is hard because we're still in the very early days of this particular revolution. As I think about it, and from what we know of AMD's plans in particular, I think this is kind of like the evolution of FPU.

    See, in the early days FPU was a seperate chip (anyone remember buying an 80387 to plug into their mobo?). Writing code to use FPU was also a complete pain in the ass, because you had to use assembly, with all the memory management and interrupt handling headaches inherent. FPUs from different vendors weren't guaranteed to have completely compatible instruction sets. Because it was such a pain in the ass, only highly special purpose applications made use of FPU code. (And, it's not that computer scientists hadn't thought up appropriate abstractions to make writing floating point easy. Compilers just weren't spitting out FPU code).

    Then, things began to improve. The FPU was brought on die, but as an optional component (think 486SX vs 486DX). Languages evolved to support FPUs, hiding all the difficulty under suitible abstractions so programmer could write code that just worked. More applications began to make use of floating point capabilities, but very few required a FPU to work.

    Finally, FPU was brought on die as a bog standard part of the CPU. At that point, FPU capabilities could be taken for granted and an explosion of applications requiring an FPU to achieve decent performance ensued (see, for istance, most games). And writing FPU code is now no longer any more difficult than declaring type float. The compiler handles all the tricky parts.

    I think GPGPU will follow a similar trajectory. Right now, we're in phase one. Use a GPU for general purpose computation is such an incredible pain that only the most specialized applications are going to use GPGPU capabilities. High level languages haven't really evolved to take advantage of these capabilities yet. And yes, it's not as though computer scientists don't have appropriate abstractions that would make coding for GPGPU vastly easier. Eventually, GPGPU will become an optional part of the CPU. Eventually high level languages (in addition to the C family, perhaps FORTRAN or Matlab or other languages used in scientific computing) will be extended to use GPGPU capabilities. Standards will emerge, or where hardware manufacturers fail to standardize, high level abstraction will sweep the details under the rug. When this happens, many more applications will begin to take advantage of GPGPU capabilities. Even further down the road, GPGPU capabilities will become bog standard, at which point will see an explosion of applications that need these capabilities for decent performance.

    Granted, the curve for GPGPU is steeper because this isn't just a matter of different instructions, but a change in memory management as well. But I think this kind of transition can and will eventually happen.

  13. Re:Compatibility by UncleFluffy · · Score: 4, Informative

    Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

    From what I saw at the demo, the AMD stuff was running under Brook. As far as I've been able to make out from nVidia's documentation, CUDA is basically a derivative of Brook that has had a few syntax tweaks and some vendor-specific shiny things added to lock you in to nVidia hardware.

    --

    What would Lemmy do?

  14. Future plans by UnknowingFool · · Score: 4, Funny

    Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years."

    So I take it that AMD will be ready for Vista's successor?

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.