Slashdot Mirror


AMD Demonstrates "Teraflop In a Box"

UncleFluffy writes "AMD gave a sneak preview of their upcoming R600 GPU. The demo system was a single PC with two R600 cards running streaming computing tasks at just over 1 Teraflop. Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years." Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

182 comments

  1. well, it shouldn't be by jimstapleton · · Score: 3, Funny

    It shouldn't be a TERAble FLOP at the stores anyway. Nice performance...

    OK, yes, bad pun, bad spelling, you can "-1 get a real sense of humor" me now.

    --
    34486853790
    Connection too slow for X forwarding? Try "ssh -CX user@host"
    1. Re:well, it shouldn't be by Sneakernets · · Score: 1

      What about a Microflop in a box? http://www.youtube.com/watch?v=1dmVU08zVpA

      --
      "No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
    2. Re:well, it shouldn't be by TeknoHog · · Score: 0

      "-1, bad spelling" for your "-1, innane"

      --
      Escher was the first MC and Giger invented the HR department.
    3. Re:well, it shouldn't be by ebers · · Score: 1

      > "-1, bad spelling" for your "-1, innane"
      "-1 nitpicking".

  2. Compatibility by mrchaotica · · Score: 1

    Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

    --

    "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    1. Re:Compatibility by level_headed_midwest · · Score: 4, Interesting

      The chips use a very different ISA, so there's no way that binaries that will run on G80 hardware will run on an R600. Heck, even the ATi R400 series (x700, x8x0) is not binary-compatible with the current R500 x1000 units. Maybe ATi will make a CUDA compiler, but since folks have already gotten going using the R500 hardware (see: http://folding.stanford.edu/), I doubt that AMD/ATi will make a big effort to use a competitor's technology. Please correct me if I am incorrect, but I am not aware of any groups or programs that use NVIDIA hardware as number-crunchers yet.

      --
      Just "gittin-r-done," day after day.
    2. Re:Compatibility by MrHanky · · Score: 1

      That seems likely, but it should be possible to make an API like OpenGL for more general processing as well, shouldn't it? Then all you need is a driver, and your code won't be obsolete every time a new generation GPU comes out.

    3. Re:Compatibility by UncleFluffy · · Score: 4, Informative

      Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

      From what I saw at the demo, the AMD stuff was running under Brook. As far as I've been able to make out from nVidia's documentation, CUDA is basically a derivative of Brook that has had a few syntax tweaks and some vendor-specific shiny things added to lock you in to nVidia hardware.

      --

      What would Lemmy do?

    4. Re:Compatibility by mrchaotica · · Score: 1

      Hey, thanks -- I was wondering if something like that existed! I'm actually about to start working on a computer vision-related research project that might be well-suited to running on a GPU, and was trying to figure out what technology to use to write it. I think Brook might be it.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    5. Re:Compatibility by UncleFluffy · · Score: 1

      No problem. I'd advise you to grab the latest CVS and look in the forums for any required build tweaks - the tarball on sourceforge often lags by quite a bit.

      --

      What would Lemmy do?

    6. Re:Compatibility by Anonymous Coward · · Score: 1, Informative

      CUDA isn't a derivative of Brook, it's a more general programming model. Whereas Brook is a streaming architecture, meaning that each iteration of the kernel writes one value at the end, the threads in CUDA are able to write many values, as well as perform some communication during the processing.

      This new capability will let CUDA support more general algorithms.
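
      To make the distinction above concrete, here is a minimal sketch in plain Python. It is not real Brook or CUDA code; `stream_map` and `scatter_histogram` are hypothetical names, and the loop only stands in for parallel threads. The first function mimics the one-output-per-kernel-call streaming style, the second a scatter where each "thread" picks its own write location.

```python
# Hypothetical illustration only; neither function is a Brook or CUDA API.

def stream_map(kernel, stream):
    """Streaming style: one kernel call per element, exactly one output each."""
    return [kernel(x) for x in stream]


def scatter_histogram(values, num_bins):
    """Scatter style: each "thread" decides which output slot it writes to.

    This runs sequentially. On real parallel hardware, concurrent writes to
    the same bin would need atomic updates or a per-block reduction.
    """
    bins = [0] * num_bins
    for v in values:  # each iteration stands in for one thread
        bins[v % num_bins] += 1
    return bins


squares = stream_map(lambda x: x * x, [1, 2, 3, 4])
hist = scatter_histogram([0, 1, 1, 3, 5, 5, 5], 4)
print(squares, hist)
```

      A histogram is awkward in the pure streaming form because the output position depends on the data, which is the kind of algorithm the comment suggests scatter writes open up.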

  3. ubiquitous by Speare · · Score: 4, Insightful

    Look up 'ubiquitous' before you whine about how far behind Intel might seem to be.

    Though having one demonstration will help spur the demand, and the demand will spur production, I still think it'll be five years before everybody's grandmother will have a Tf lying around on their checkbook-balancing credenza, and every PHB will have one under their desk warming their feet during long conference calls.

    --
    [ .sig file not found ]
    1. Re:ubiquitous by UncleFluffy · · Score: 1

      Look up 'ubiquitous' before you whine about how far behind Intel might seem to be.

      Sorry, late night submission. I'll claim an error of verb tense rather than adjective usage: "this will beat" rather than "this beats". This silicon is shipping high-end in a couple of weeks, so it'll be mid-range this time next year and integrated on the motherboard the year after that (or thereabouts). Another year or two for the regular PC replacement cycle to churn that through, and it should be widespread by the time Intel predicted for shipping of their 80-core prototype.

      --

      What would Lemmy do?

    2. Re:ubiquitous by asadodetira · · Score: 1

      A working prototype is nice, but it's only viable if they can manufacture the chips with a high yield. New processes have low yield, meaning a high percentage of the chips don't work.

  4. Not misleading at all by minginqunt · · Score: 1

    Oh no.

    I mean, the PS3 does 2 Teraflops! OMG, they're like 20 years ahead of Intel, who are so RUBBISH.

    And what would be the theoretical floppage of, say, a Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA? I'm willing to bet it would be somewhat higher than this setup.

    1. Re:Not misleading at all by sumdumass · · Score: 4, Interesting

      Isn' the reason this is so interestiong because you cannot have a Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a tflop at this present time?

      Maybe soon but I thought it isn't _now_!

    2. Re:Not misleading at all by ArcherB · · Score: 1

      Isn' the reason this is so interestiong because you cannot have a Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a tflop at this present time?

      Excellent point! Expect to see a nVidia/Intel partnership in 5, 4, 3, 2...

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    3. Re:Not misleading at all by BobPaul · · Score: 3, Insightful

      Excellent point! Expect to see a nVidia/Intel partnership in 5, 4, 3, 2...

      Good call! That must be why nVidia has decided to enter the x86 chip market and Intel has significantly improved their GPU offerings, as well as indicate they may include vector units in future chips, because these companies plan to work together in the future! It's so obvious! I wish I hadn't paid attention these past 6 months, as it's clearly confused me!
    4. Re:Not misleading at all by SP33doh · · Score: 1

      a dual 8800gtx configuration can indeed pull this off.

    5. Re:Not misleading at all by Anonymous Coward · · Score: 0

      Isn' the reason this is so interestiong because you cannot have a Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a tflop at this present time?

      Who says cannot? You don't have such a dastardly box because you chose not to have one.

      Maybe soon but I thought it isn't _now_!

      If you really wanted it, you could have it right now. Stop whining. It's your own decision if you don't have one now!
    6. Re:Not misleading at all by HappySqurriel · · Score: 2, Insightful

      Well, as I see it, advertising "[some amazing benchmark] in a box" is reasonably foolish because I could produce a system with amazing theoretical performance that doesn't really perform that much better than a system that is a fraction of the cost ... It wasn't that long ago that you could (easily) buy motherboards that supported 2 or 4 separate processors, and people have built Quad-SLI setups; what this means is you could create a 4 processor Core 2 Duo system with a Quad-SLI GeForce 8800 GTX which (in most applications) would not perform much better than a single processor Core 2 Duo system with a single GeForce 8800 GTX.

    7. Re:Not misleading at all by Dread+Pirate+Skippy · · Score: 0, Offtopic

      Oh snap! =O

    8. Re:Not misleading at all by neverpsyked · · Score: 0

      I would have rated you "+1 Insightful" if it weren't for the complete lack of anything resembling proper English.

      --
      What if this weren't a hypothetical question?
    9. Re:Not misleading at all by ArcherB · · Score: 2, Informative

      That must be why nVidia has decided to enter the x86 chip market and Intel has significantly improved their GPU offerings, as well as indicate they may include vector units in future chips, because these companies plan to work together in the future! It's so obvious! I wish I hadn't paid attention these past 6 months, as it's clearly confused me!

      Sarcasm suits you well.

      While Intel and nVidia may both be independently reinventing the wheel right now, neither seems to be getting very far very fast. Intel's video offerings have been poor at best and no one has seen an nVidia x86 processor. AMD has already demo'd a prototype, which means they are further along with this Fusion than both Intel and nVidia combined. I don't think it will take long for the decision makers at both of these companies to realize that the other has the missing component.

      Of course, you could be right. This is pure speculation on my part and I am pretty much talking from my ass. Still, the idea makes perfect sense to me.

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    10. Re:Not misleading at all by Anonymous Coward · · Score: 0

      lol.. I'm not the one wanting either. The first to do what can already be done means they were the first to do it. I dunno why this is so difficult. I mean if it was so easy and obvious, then everyone would already be doing it, right?

      Give some credit where credit is due. It may very well be that an Intel Nvidia system could outdo this. But nobody had tried it until this thing was shown off.

    11. Re:Not misleading at all by Anonymous Coward · · Score: 0

      I have a dual 8800 GTS SLI'd with a Pentium Core 2 Duo (2.4GHz). I'm at work now, but when I get home I'm-a breakin' out the benchmark programs!

    12. Re:Not misleading at all by Anonymous Coward · · Score: 0

      GP poster's English was proper. What are you talking about?

  5. Step 1 by Anonymous Coward · · Score: 3, Funny

    Step 1: Put your chip in the box.

    1. Re:Step 1 by Veetox · · Score: 1

      ...Make Ars open the box, and that's the way you do it, BABY!

    2. Re:Step 1 by Anonymous Coward · · Score: 3, Informative

      Step 1: Put your chip in the box.

      Dude. You have to cut a hole in the box first, otherwise you will pinch your junk...err...your chip under the lid.

    3. Re:Step 1 by allacds · · Score: 1

      offtopic

      Haha I love how this is modded "informative"... /offtopic

    4. Re:Step 1 by Anonymous Coward · · Score: 0

      Since when is replying to a joke offtopic? If anything you should have said offtopic to the parent, but that would also be wrong because it was just a joke involving the hardware in question ;)

      Technically it was informative because it was correcting the parent's wrong order.

  6. 1 Teraflop you say? by TheCreeep · · Score: 3, Funny

    How much is that in BogoMIPS?

    1. Re:1 Teraflop you say? by solevita · · Score: 1

      Or football pitch lengths?

    2. Re:1 Teraflop you say? by minginqunt · · Score: 5, Funny

      How much is that in BogoMIPS?

      That's TWELFTY BAJILLION BogoMIPS. Per fortnight.

    3. Re:1 Teraflop you say? by garcia · · Score: 1

      And it had to go uphill both ways! Fuck that's fast.

    4. Re:1 Teraflop you say? by clickclickdrone · · Score: 1

      >BogoMIPS?
      Does dual-core give you BOGOFMIPS?

      --
      I want a list of atrocities done in your name - Recoil
    5. Re:1 Teraflop you say? by hey! · · Score: 1

      That's TWELFTY BAJILLION BogoMIPS. Per fortnight.


      So "teraflop" is a unit of computational acceleration? Cool.
      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    6. Re:1 Teraflop you say? by Lemmeoutada+Collecti · · Score: 1

      Not only that, but being BogoMIPS instead of MIPS, it is a unit of acceleration on the imaginary plane (i.e. 90 degrees relative to reality), and is freely convertible to FPFPF (furlongs per fortnight per fortnight) but only on Pemtiums which have been treated for floating point bugs.

      --

      You can have it fast, accurate, or pretty. Pick any 2.
  7. Never thought of that by arlo5724 · · Score: 3, Interesting

    I might be (read: am mostly) retarded but I never thought of using a graphics processor for anything else, but with the super cards around the corner it makes sense that some normal processing jobs could be farmed out to the GPU when it's not being occupied with graphics duties. Does anyone know where I can find some extra info on this, or to what extent this is being implemented? My curiosity is piqued!

    1. Re:Never thought of that by Anonymous Coward · · Score: 3, Informative

      Check out this web site: http://www.gpgpu.org/

      It is up to date and contains a lot of related information.

      WP

    2. Re:Never thought of that by clickclickdrone · · Score: 1

      Isn't there a version of Folding@Home that uses the GFX cores?

      --
      I want a list of atrocities done in your name - Recoil
    3. Re:Never thought of that by theantipop · · Score: 4, Informative

      http://folding.stanford.edu/FAQ-ATI.html

      It's still in beta AFAIK, but it has been in development for quite some time.

    4. Re:Never thought of that by Frozen+Void · · Score: 1
    5. Re:Never thought of that by Anonymous Coward · · Score: 0

      I think this is the first time I've seen 'piqued' spelled correctly on slashdot, which makes you less retarded than most here.

  8. Wow... by Howard+Beale · · Score: 0

    imagine a Beow....ah, screw it.

    1. Re:Wow... by tttonyyy · · Score: 2, Funny

      imagine a Beow....ah, screw it.
      Imagine a Beowulf cluster of organically connected people imagining Beowulf clusters - I'd have Quake running on you at a squillian FPS in no time!
      --
      biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!
    2. Re:Wow... by jimstapleton · · Score: 1

      Yes, but will it run Lin...

      Screw it, I prefer BSD anyway.

      --
      34486853790
      Connection too slow for X forwarding? Try "ssh -CX user@host"
  9. Two words by paintballer1087 · · Score: 0, Redundant

    Beowolf cluster.... I think that's all that needs said

  10. OOOoooo by fyngyrz · · Score: 5, Interesting
    it's hard to program such GPUs for anything other than graphics applications

    It might be hard, but then again, it might be worthwhile. For instance (I'm a ham radio operator) I ran into a sampling shortwave radio receiver the other day. Thing samples from the antenna at 60+ MHz, thereby producing a stream of 14-bit data that can resolve everything happening below 30 MHz, or in other words, the entire shortwave spectrum and longwave and so on basically down to DC.

    Now, a radio like this requires that the signal be processed; first you separate it from the rest, then you demodulate it, then you apply things like notch filters (or you can do that prior to demodulation, that's very nice), you build an automatic gain control to handle amplitude swings, provide a way to vary the bandwidth and move the filter skirts (low and high) independently... you might like to produce a "panadapter" display of the spectrum around the signal of interest, where there is a graph that lays out signal strengths for a defined distance up and down spectrum... you might want to demodulate more than one signal at once (say, a FAX transmission into a map on the one hand, and a voice transmission of the weather on the other.) And so on - I could really go on for a while.

    The thing is, as with all signal processing, the more you try to do with a real-time signal, the more resources you have to dedicate. And this isn't audio, or at least, not at the early stages; a 60+ MHz stream of data requires quite a bit more in terms of how fast you have to do things to it than does an audio stream at, say, 44 KHz.

    But signal processing typically uses fairly simple math; a lot of it, but you can do a lot without having to resort to real craziness. A teraflop of processing that isn't even happening on the CPU is pretty attractive. You'd have to get the data to it, and I'm thinking that would be pretty resource intensive, but between the main CPU and the GPU you should have enough "ooomph" left over to make a beautiful and functional radio interface.

    There is an interesting set of tasks in the signal processing space; forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging) requires lots and lots of signal processing. Be a kick to have it in a relatively standard box, with easily replaceable components. Maybe you could do the same thing above-ground; after all, it's still sound and there are still reflections that can tell you a lot (just observe a bat.)

    The cool thing about signal processing is that a lot of it is like graphics, in a way; generally, you set up some horrible sequence of things to do to your data, and then thrash each sample just like you did the last one.

    Anyway, it just struck me that no matter how hard it is to program, it could certainly be useful for some of these really resource intensive tasks.
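
    As a rough sketch of the chain described above (select the signal, demodulate, gain control), here is a toy version in plain Python. Everything in it is an assumption for illustration: the function names are made up, the "filter" is a crude block average rather than a real FIR, the AGC is a single block-wide gain, and the sample rate is scaled down to 100 kHz so it runs instantly.

```python
import cmath
import math


def mix_to_baseband(samples, sample_rate, carrier_hz):
    """Multiply by a complex oscillator so the wanted carrier lands at 0 Hz."""
    step = -2j * math.pi * carrier_hz / sample_rate
    return [s * cmath.exp(step * n) for n, s in enumerate(samples)]


def lowpass_decimate(samples, factor):
    """Crude low-pass: average each block of `factor` samples into one output."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def am_demodulate(baseband):
    """The AM envelope is the magnitude of the complex baseband signal."""
    return [abs(s) for s in baseband]


def agc(envelope, target=1.0):
    """Block-wise gain control: scale so the peak equals `target`."""
    peak = max(envelope) or 1.0
    return [target * e / peak for e in envelope]


# Toy input: a 10 kHz carrier sampled at 100 kHz whose amplitude steps
# from 0.5 to 1.0 halfway through (the "modulation").
RATE, CARRIER = 100_000, 10_000
signal = [(0.5 if n < 500 else 1.0) * math.cos(2 * math.pi * CARRIER * n / RATE)
          for n in range(1000)]

baseband = lowpass_decimate(mix_to_baseband(signal, RATE, CARRIER), 100)
audio = agc(am_demodulate(baseband))
print([round(a, 3) for a in audio])
```

    Every stage does the same small amount of arithmetic on every sample, independent of its neighbours apart from the averaging window, which is why this kind of work maps onto hardware built to run one small program over millions of pixels.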

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:OOOoooo by sitturat · · Score: 4, Insightful

      Or you could just use the correct tool for the job - a DSP. I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

    2. Re:OOOoooo by fyngyrz · · Score: 4, Insightful
      I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

      Simple: they aren't available. PC's don't typically come with DSPs. But they do come with graphics, and if you can use the GPU for things like this, it's a nice dovetail. For someone like that radio manufacturer, no need to force the consumer to buy more hardware. It's already there.

      --
      I've fallen off your lawn, and I can't get up.
    3. Re:OOOoooo by Jeff+DeMaagd · · Score: 1

      The graphics processor is basically a DSP now.

      We use computers to do things that it really isn't the best at doing, but we use the computer because it is so flexible at doing so many things and cheaply, wheras a DSP in a specialized box may be better for a specific single task, the economies of scale come into play.

    4. Re:OOOoooo by maird · · Score: 4, Insightful

      A DSP probably is more efficient for that task but you can't go down to your local WalMart and buy one. Besides, even if you could, the IC isn't much use to anyone. Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous. Add to that the cost of getting any code written for it and the idea becomes suitable for military application only.

      OTOH, the PC has a huge and varied user base, so it has a price consistent with being a mere commodity. It is general purpose and can be adapted to a large variety of tasks. It is relatively cheap to write code for and has a huge base of capable special interest programmers. If there is a 60+MHz ADC out there somewhere for a reasonable price then it isn't just a matter of whether a DSP is a better tool; a PC is a trivially cheap tool by comparison. You'd still need a decent UI to use an all-band direct sampling HF receiver. A PC would be good for that too, so keep it all in the same box.

      You can buy non-direct sampling receivers with DSPs in them at prices ranging from $1000 to more than $10000. The DSP is probably no faster than about 100kHz, so the signal has to be passed through one or more analogue IF stages to get the signal you want into the 50kHz that can be decoded. You can probably buy a PC with greater digital signal processing potential for less than $500. A 30MHz direct sampling receiver will receive and service 30MHz worth of bandwidth simultaneously. Not long after general availability, the graphics card configuration in question will probably cost less than $1000. With the processing capabilities it has, you (the human) will probably run out of ability to interpret simultaneously decoded signals before the PC runs out of ability to decode more (it's really hard to listen to two conversations at the same time on an HF radio).

    5. Re:OOOoooo by maxume · · Score: 1
      --
      Nerd rage is the funniest rage.
    6. Re:OOOoooo by try_anything · · Score: 2, Interesting
      You can buy a decent FPGA development board and turn it into a DSP for the price of a high-end graphics card. It isn't a trivial project to get started with, but it might be easier than using a GPU. Plus, the skills and hardware from this project will take you much farther than GPU skills.

      Get started here and find some example DSP cores here.

    7. Re:OOOoooo by fyngyrz · · Score: 1

      Yes, I have. Great pointers; thanks.

      --
      I've fallen off your lawn, and I can't get up.
    8. Re:OOOoooo by fyngyrz · · Score: 1

      If you were going to go to that kind of trouble, why not buy a chip (or entire board) designed to be a DSP? Why go the FPGA route? Not trying to be nasty, I assume you have a reason for suggesting this, I just don't know what it is.

      --
      I've fallen off your lawn, and I can't get up.
    9. Re:OOOoooo by try_anything · · Score: 1

      The original poster seems to want a lot of control and the possibility of tinkering with different configurations -- "Be a kick to have it in a relatively standard box, with easily replaceable components." Working with FPGAs gives you that software-like ability to create or download new components and rearrange them to fit your needs. A DSP board gives you one fixed layout of components. Plus, you can have fun turning the FPGA into anything else you want.

    10. Re:OOOoooo by compling · · Score: 1

      As others have commented, DSPs are not necessarily the most cost efficient option. At a previous job, I ended up writing two versions of the system we were developing: one for TI's newest, hottest DSP, pre-release version, and another version for the PC. Optimized the hell out of the DSP version, used every trick I knew.

      In the end, I had to conclude that a dual-cpu system, still cheaper than a DSP-based solution, would blow away the DSP in terms of performance. It was a bit of a shock to me at the time.

    11. Re:OOOoooo by End+Program · · Score: 5, Informative

      Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous.

      Maybe there aren't any DSP available and low cost, if you aren't a hardware designer:

      400 MHz DSP $10.00 http://www.analog.com/en/epProd/0,,ADSP-BF532,00.html
      14-bit, 65 MSPS ADC $30.00 http://www.analog.com/en/prod/0,,AD6644,00.html
      Catching non-designers talking smack ...priceless

    12. Re:OOOoooo by nrrd · · Score: 1

      I think underwater exploration is really interesting, but know almost nothing about it. I'm curious what you mean by "forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging)". Do you mean a full-on photographic quality image? Something like side-scan radar? Would you mind posting more? I'm not sure what you mean and I'd like to learn a little about this.

      --
      "Eye halve a spelling chequer, It came with my pea sea, It plainly marques four my revue, Miss steaks eye kin knot sea"
    13. Re:OOOoooo by julesh · · Score: 1

      You can buy a decent FPGA development board and turn it into a DSP for the price of a high-end graphics card. It isn't a trivial project to get started with, but it might be easier than using a GPU. Plus, the skills and hardware from this project will take you much farther than GPU skills.

      Really? I haven't seen PC-insertable FPGA dev boards that are capable of clocking anything like as high as a modern GPU (i.e. typically ~800MHz) for sub-$1000. If you can point me in the direction of a reasonably-priced FPGA dev board that I could implement a 500MHz system on, I'd really appreciate it, but so far I haven't seen one.

    14. Re:OOOoooo by Anonymous Coward · · Score: 0

      that's because they don't need to clock that high....

      anyone who writes any HDL at all will tell you that depending on the application, a 1MHz FPGA may absolutely decimate any uProcessor running up into the hundreds of MHz. This is because you have tons of processes running at the same time (not just spoofing it like an OS does with doing a bit of process A, bit of B, bit of C, in a cycle) and you can also have data buses that are hundreds of bits wide.

      Let me give you a quick example:
      i had a project once that was essentially an arbitrary waveform generator. So...i had the following processes:
      - UART that updates the DAC for the actual wave
      - handing new data to the UART above
      - calculating new data that gets handed to the UART (via lookup table interpolation, etc)
      - polling keypad for new keypresses
      - UART that hands these keypresses into the main core
      - display driver that writes characters to the LCD UART
      - LCD UART itself
      - display adapter that calculates the characters that get written out
      - several main 'core' processes.

      despite the fact that i was running off of a 1 meg clock, i could still have my waveforms come out at nearly 1 meg (minus DAC update time) because of the fact that i had separate processes for everything.

    15. Re:OOOoooo by Frozen+Void · · Score: 1

      The most efficient approach to computing is to build software specific hardware.
      It is, however, the most time-consuming and expensive. That's why it's a niche.
      Not everything justifies the cost. Like "Deep Blue", brute force applied to a problem is not efficient in itself.
      You need to optimize the software first.

    16. Re:OOOoooo by PetiePooo · · Score: 2, Interesting

      I'm curious what you mean by "forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging)".

      I think he's talking about something more along the lines of what they're calling a 3D/4D ultrasound. That doesn't mean much unless you've recently had a child, so here's an example from GE (requires flash). For a non-flash example, just google for 4d ultrasound and try a few of the links...

      The images are not in color, and sometimes you lose detail as an elbow (think whale) gets too close to the transducer. But with more processing power and better transducers, kinks like that should go away...

    17. Re:OOOoooo by RangerElf · · Score: 1

      Damn I wish I had modpoints... -gus

    18. Re:OOOoooo by try_anything · · Score: 1

      The other response said it all, but here's another way of looking at it:

      For a processor, the minimum clock speed required is

      (rate of incoming data) * (# of instructions to process a unit of data) / (average number of instructions per clock cycle, aka IPC)

      For a nicely pipelined hardware design, you could theoretically get away with a clock rate equal to the rate of incoming data, or even less, if you can process more than one unit of data per clock and have a separate, higher-clocked piece capturing the input.
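
      Plugging numbers into the formula above, as a small Python sketch. The 60 MS/s rate is the receiver discussed earlier in the thread; the 20 instructions per sample and the IPC of 2 are made-up figures purely for illustration, and both function names are hypothetical.

```python
def min_cpu_clock_hz(data_rate_hz, instructions_per_sample, ipc):
    """Minimum processor clock: rate * instructions per unit of data / IPC."""
    return data_rate_hz * instructions_per_sample / ipc


def min_pipeline_clock_hz(data_rate_hz, samples_per_clock=1):
    """Fully pipelined hardware accepts `samples_per_clock` units every cycle."""
    return data_rate_hz / samples_per_clock


# Assumed workload: 60 MS/s input, 20 instructions per sample, IPC of 2.
cpu = min_cpu_clock_hz(60e6, 20, 2)
fpga = min_pipeline_clock_hz(60e6)
fpga_wide = min_pipeline_clock_hz(60e6, samples_per_clock=2)

print(f"processor: {cpu / 1e6:.0f} MHz")
print(f"pipeline:  {fpga / 1e6:.0f} MHz")
print(f"2-wide:    {fpga_wide / 1e6:.0f} MHz")
```

      Under those assumptions the processor needs ten times the clock of the pipelined design, which is the gap the comment is pointing at.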

    19. Re:OOOoooo by MrNaz · · Score: 2, Insightful

      NOTE:

      The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive

      Not the cost of the units, but the cost of doing anything useful with them. For a person NOT integrating the parts into mass-produced items, it's only suitable for people doing something simple as a hobby, or for learning. I would *guess* that building anything to solve a problem in practice would take an incredibly large amount of time and skill, both of which are valuable resources even if they are your own. Cost of parts is only the total cost if you consider your time to be worthless. Making a DSP output a nice spectrograph of the airwaves wandering past your house is fine, making one that can perform underwater imaging is a different kettle of fish. Building something that can do that and then writing the code for it would not be a one man job, and it would not be cheap.

      Lunch money for public high school over 10 years: $10,000

      College education: $100,000

      Ability to read: Priceless.

      --
      I hate printers.
    20. Re:OOOoooo by ESRB · · Score: 1

      Teraflops. Singular is still tflops.

    21. Re:OOOoooo by julesh · · Score: 1

      that's because they don't need to clock that high....

      That depends on your application. If we're talking about something that you would consider using a GPU for, then what you're talking about is something that uses a very fast memory interface (typically ~6 Gbyte/s throughput), does minimal processing (c. 8-16 floating point operations) on each word read and writes back to memory. That's what GPUs are designed for, so that's what the FPGA board would have to be competitive with to be better than a GPU.

      Now, with a 64-bit memory interface (i.e., the widest memory interface most FPGAs will support, because you need to have some kind of phase shifter for each byte of the DDR interface, and most FPGAs only have 8 such devices), you can compete with that on an FPGA clocked at above 400MHz. If you know what you're doing, and have good enough memory, and you have a relatively modern and expensive FPGA. 500MHz would make it more comfortable.

      But, the post I was responding to suggested that FPGA boards were a better & cheaper option for this kind of calculation than GPUs. So I ask again, where can I find an affordable FPGA development board that is capable of implementing this kind of specification?

    22. Re:OOOoooo by julesh · · Score: 1

      The other response said it all, but here's another way of looking at it:

      For a processor, the minimum clock speed required is

      (rate of incoming data) * (# of instructions to process a unit of data) / (average number of instructions per clock cycle, aka IPC)

      For a nicely pipelined hardware design, you could theoretically get away with a clock rate equal to the rate of incoming data, or even less, if you can process more than one unit of data per clock and have a separate, higher-clocked piece capturing the input.
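      The formula quoted above can be written out as a quick calculation. This is only a sketch: the function names and the workload numbers (100 million samples per second, 16 instructions per sample, IPC of 2) are made up for illustration and do not come from the thread.

```python
def min_processor_clock_hz(data_rate_units_per_s, instructions_per_unit, ipc):
    """Minimum clock for a processor: rate * instructions per unit / IPC."""
    return data_rate_units_per_s * instructions_per_unit / ipc


def min_pipeline_clock_hz(data_rate_units_per_s, units_per_clock=1):
    """Minimum clock for a fully pipelined design that accepts
    `units_per_clock` data units on every cycle."""
    return data_rate_units_per_s / units_per_clock


# Hypothetical workload, chosen only to make the arithmetic easy to follow.
DATA_RATE = 100e6          # data units arriving per second
INSTR_PER_UNIT = 16        # instructions needed per data unit
IPC = 2.0                  # average instructions retired per clock

cpu_hz = min_processor_clock_hz(DATA_RATE, INSTR_PER_UNIT, IPC)
pipe_hz = min_pipeline_clock_hz(DATA_RATE)

print(f"processor: at least {cpu_hz / 1e6:.0f} MHz")
print(f"pipeline:  at least {pipe_hz / 1e6:.0f} MHz")
```

      With these assumed numbers the processor needs eight times the clock of the pipelined design, which is the point the quoted post is making.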


      Yes, but here's another way of looking at it. The post I was replying to suggested that the FPGA would be competitive with a GPU for some unspecified application. We must assume that it is an application that the GPU would be naturally good at -- that is, a serious number crunching application, because that's the only thing you'd usually consider using a GPU on.

      For this kind of application, the limit is almost always incoming data rate. That's why I asked for a 500MHz FPGA -- I did so because 500MHz is approximately the clock rate where an FPGA (which would typically have a 64-bit wide memory interface) could theoretically approach the I/O bandwidth of a GPU (which typically has a 128-bit wide memory interface running between 250 and 300MHz).

      Yes, I'm well aware of the architectural benefits of an FPGA over a processor, but note that for most real applications, the pipeline capacity of a GPU provides more than enough data processing and the memory interface will be the bottleneck.
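      The bandwidth comparison above can be checked with a few lines of arithmetic. This sketch assumes peak bandwidth is simply bus width times clock times transfers per clock, and that both devices use the same number of transfers per clock, so that factor cancels out of the comparison. The function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Theoretical peak memory bandwidth, ignoring refresh, bank conflicts
    and every other real-world overhead."""
    return bus_width_bits / 8 * clock_hz * transfers_per_clock


# Figures taken from the post: 64-bit FPGA interface at 500 MHz versus a
# 128-bit GPU interface at 250 to 300 MHz.
fpga = peak_bandwidth_bytes_per_s(64, 500e6)
gpu_low = peak_bandwidth_bytes_per_s(128, 250e6)
gpu_high = peak_bandwidth_bytes_per_s(128, 300e6)

print(f"FPGA, 64-bit @ 500 MHz:  {fpga / 1e9:.1f} GB/s")
print(f"GPU, 128-bit @ 250 MHz: {gpu_low / 1e9:.1f} GB/s")
print(f"GPU, 128-bit @ 300 MHz: {gpu_high / 1e9:.1f} GB/s")
```

      Under these assumptions the 500MHz FPGA matches the low end of the GPU range and falls somewhat short of the high end, which fits the word "approach" in the post.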

    23. Re:OOOoooo by Saffaya · · Score: 1

      >PC's don't typically come with DSPs.

      Eh .. Maybe you could have a look at what the wonderful machines of the past had.

      Both the NeXTstation (NeXT) and the ATARI Falcon 030 had a Motorola 56001 DSP, in addition to their main CPU.
      Such an inclusion boosted the range and availability of new kinds of software on each.

    24. Re:OOOoooo by CODiNE · · Score: 1

      There is an interesting set of tasks in the signal processing space; forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging) requires lots and lots of signal processing.

      The lengths some people are willing to go to prove the Loch Ness monster is real!

      --
      Cwm, fjord-bank glyphs vext quiz
    25. Re:OOOoooo by End+Program · · Score: 1

      My comment was directed to his notion that DSPs are slow and expensive; both assumptions are wrong. This was the case in the late 80's and early 90's.

      I would *guess* that building anything to solve a problem in practice would take an incredibly large amount of time and skill, both of which are valuable resources even if they are your own.

      I never said your time was not valuable. I am saying that a group of consumers assumes that because you can't buy something in a retail store, it is not readily available and cheap. This assumption is wrong as well.

      People who work in the electronics field are familiar with what it takes to design and build these types of prototypes, and many of them are hobbyists.

      In addition, suppliers like Analog Devices basically give away development boards ($100 - 200) and software to easily implement FFTs and digital filters in their devices.

      So yes Virginia, DSPs do exist and they are easy to use for people who know what the 'F' they are talking about.

  11. Intel? by PFI_Optix · · Score: 0

    Shouldn't we be talking about nVidia, since this is a GPU?

    --
    120 characters for a sig? That's bloody useless.
    1. Re:Intel? by TheDreadSlashdotterD · · Score: 1

      Say what you want about Intel graphics chipsets, but they make those too.

      --
      I have nothing to say.
    2. Re:Intel? by PFI_Optix · · Score: 1

      Can I buy two Intel GPU-based cards and team them in an attempt to match AMD's (really, ATI's) performance?

      Can I buy a motherboard with this Tflop technology integrated?

      Apples and oranges. I suspect fanboyism.

      --
      120 characters for a sig? That's bloody useless.
    3. Re:Intel? by TheDreadSlashdotterD · · Score: 1

      Funny, I have a laptop with an AMD Turion CPU. Unless Intel and Nvidia actually team up or Intel buys Nvidia, then there's nothing to really show in the Intel camp. AMD has an advantage here, but only so long as Intel keeps making joke graphics chipsets.

      So, fanboyism? I don't think so. I just don't care to rtfa

      --
      I have nothing to say.
    4. Re:Intel? by PFI_Optix · · Score: 1

      AMD bought ATI. This new chip is almost entirely the work of ATI engineers prior to purchase (there hasn't been enough time for AMD to get credit for it). Intel could fairly easily take on nvidia if they wanted to; they've never expressed interest.

      Intel is likely only in the graphics business so they can offer OEMs a complete package: motherboard with integrated everything, all made by Intel.

      --
      120 characters for a sig? That's bloody useless.
  12. The first rule of teraflop club... by Duncan3 · · Score: 4, Insightful

    Don't mention the wattage...

    And the second rule of teraflop club...

    Don't mention the wattage...

    Back here in the real world where we PAY FOR ELECTRICITY, we're waiting for some nice FLOPS/Watt, keep trying guys.

    And they announced this some time ago didn't they?

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:The first rule of teraflop club... by dlapine · · Score: 5, Informative
      LOL- you're complaining about wattage for 1 TF when they did it on a pair of friggin' video cards?? That's gotta be what, 500 watts total for the whole PC?


      We've run several PC clusters and IBM mainframes that didn't have 1 TF of capacity. You don't want to know how much power went into them. Yes, our modern blade-based clusters are more condensed, but they're still power hogs for dual and quad core systems.

      Blue Gene is considered to be a power efficient cluster and the fastest, but it still draws 7 kW per rack of 1024 CPUs. At 4.71 TF per rack, even Blue Gene pulls 1.5 kW per teraflop.
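      For anyone who wants to check the arithmetic, here is a small sketch using only the figures in this comment: 7 kW and 4.71 TF per Blue Gene rack, and a guessed 500 watts for the 1 TF demo PC. The 500 W number is a guess from this post, not a measurement, and the function name is made up.

```python
def kw_per_teraflop(power_kw, teraflops):
    """Power cost of compute, in kilowatts per teraflop."""
    return power_kw / teraflops


blue_gene_rack = kw_per_teraflop(power_kw=7.0, teraflops=4.71)
demo_pc_guess = kw_per_teraflop(power_kw=0.5, teraflops=1.0)  # guessed wattage

print(f"Blue Gene rack: {blue_gene_rack:.2f} kW per TF")
print(f"Demo PC (guess): {demo_pc_guess:.2f} kW per TF")
print(f"ratio: {blue_gene_rack / demo_pc_guess:.1f}x")
```

      The comparison is between peak figures on very different workloads, so it says more about order of magnitude than about real efficiency.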

      Yes, it's a pair of video cards, and not a general purpose CPU, but your average user doesn't have the ability to program and use a Blue Gene style solution either. They just might get some real use out of this with a game Physics Engine that taps into this computing power.

      This is cool.

      --
      The Internet has no garbage collection
    2. Re:The first rule of teraflop club... by Duncan3 · · Score: 2, Informative

      Count real, usable FLOPS. GPUs don't win.

      But for ~$500, it's what's going to be used.

      --
      - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    3. Re:The first rule of teraflop club... by julesh · · Score: 1

      About 230W per card.

  13. It isn't that they are hard to use for more... by Assmasher · · Score: 3, Informative

    ...generic purposes, it is that they're (GPUs) suited better for certain types of operations. Image processing, as an example, is very well suited to working on a GPU because the GPU excels at addressing and operating on elements of arrays (textures basically.) I've used it as a proof of concept at work for processing large numbers of video feeds simultaneously for things like photometric normalization, image stabilization, et cetera, and the things are awesome. They work well in this scenario because the problem I'm trying to solve fits the caveats of using the GPU well. Slow upload of data, miraculously fast action upon that data, slow download of the data. Now, slow is relative and getting more and more relative as new chipsets are released.

    The actual framework for doing this is relatively simple, although it certainly did help that I've a background in OpenGL and DirectXGraphics (so I've done shader work before); however, again, progress is removing those caveats as well. Generic GPU programming toolsets are imminent, the only problem being ATI has no interest in their toolsets working with nVidia and nVidia has even less interest in their toolset(s) running on ATI hardware. Something we'll just have to learn to deal with.

    BTW, DirectX10 will make this a little easier as well with changes to how you have to pipeline data in order to operate on it in a particular fashion.

    --
    Loading...
  14. Notpick by 91degrees · · Score: 4, Informative

    That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.

    1. Re:Notpick by Anonymous Coward · · Score: 0

      er, nitpick?

    2. Re:Notpick by 91degrees · · Score: 4, Funny

      Yup. It's the law. Any post pointing out an error must have at least one error itself.

    3. Re:Notpick by Anonymous Coward · · Score: 0

      Just like kilobyte, megabyte, terabyte is always plural, right? Or like kilogram, for that matter, never comes without an s? Because FLOP itself is singular, of course.

      I think you're misguided. 1 teraflop is singular.

    4. Re:Notpick by Anonymous Coward · · Score: 0

      I've worked in Supercomputing for almost 20 years and have yet to hear or see someone put an "s" at the end of anything when 1 is the unit. It's always 1 Gigaflop, 1 Petabyte, 1 Megabit, etc, etc.

      Look at http://www.top500.org/ and you'll see they don't put an "s" when referring to a single Teraflop.

    5. Re:Notpick by Anonymous Coward · · Score: 0

      Jesus, you're such a notpick!

    6. Re:Notpick by bohemian72 · · Score: 1
      Floating-point operations per . . .

      Yeah I can see why that 's' isn't needed now.

      --
      The greatest thing you'll ever learn is just to love and be loved in return.
    7. Re:Notpick by autophile · · Score: 1

      That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.

      I don't think so. You can either use 1 teraFLOPS, 2 teraFLOPS, 3 teraFLOPS (in the same way you say 1 MHz, 2 MHz, 3 MHz), where I am not using capitals for emphasis but as the way the letters should be written, or you can use 1 teraflop, 2 teraflops, 3 teraflops (in the same way you say 1 snafu, 2 snafus, 3 snafus). The thing is that "FLOPS" is an acronym (i.e. an abbreviation formed from initials pronounced as a word), while "flop" is a word (a neologism, really) whose definition is an acronym that happens to use the same letters.

      --Rob

      --
      Towards the Singularity.
  15. Not sonar? by dunc78 · · Score: 1

    So how is an image being formed under water using sound without using sonar? Also, I bet we could do the same thing above ground and maybe above the water we could try to image using radio waves. Since it is using radio waves, let's call it a radar.

    1. Re:Not sonar? by fyngyrz · · Score: 3, Insightful

      You use ambient sound instead of radiating a signal yourself, and you try to resolve the entire environment, rather than just the sound emitting elements in the environment. This makes you a lot harder to detect; it also makes resolving what is going on a lot more difficult. Hence the need for lots of CPU power. In the water or in the air. Passive sonar - at least typically - is intended to resolve (for instance) a ship or a weapon that is emitting noise. But the sea is emitting noise all the time - waves, fish burping, whale calls, shrimp clicking - all kinds of noise, really. Using that noise as the detecting signal is the trick, and it isn't very similar to normal sonar, in terms of what kind of computations or results are required. Classic sonar gives you a range and bearing; this kind of thing is aimed at giving you an actual picture of the environment. It's a lot harder to do, but man, is it cool.

      --
      I've fallen off your lawn, and I can't get up.
    2. Re:Not sonar? by rthille · · Score: 1

      All the noises of the sea is one of the reasons I prefer freediving to scuba.

      Lots more peaceful without the noise of sucking on a tank.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    3. Re:Not sonar? by dunc78 · · Score: 1

      My main point was that what you are referring to would still be considered sonar. It is a system that uses echoes bouncing off of objects to detect them. The rest of what you discuss just goes to the resolution of the system. Imaging is nothing more than detecting at a fine resolution.

    4. Re:Not sonar? by ceoyoyo · · Score: 1

      Why not just cease inhaling while scuba diving if you want to hear?

    5. Re:Not sonar? by rthille · · Score: 1

      you can do that, but in general, you spend a lot of time breathing, vs not breathing even if you're holding your breath occasionally. It's just a lot less peaceful...

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    6. Re:Not sonar? by ceoyoyo · · Score: 1

      You spend a lot less time popping up to the surface though.

    7. Re:Not sonar? by 4D6963 · · Score: 1

      Interesting, but how do you theoretically do all that? Using hydro/microphone arrays? And what kind of processing does it involve? Cross-correlations? I'd be interested in technical details (I've been programming DSP programs for a couple of years now)

      --
      You just got troll'd!
    8. Re:Not sonar? by rthille · · Score: 1


      Sure, and you can go a lot deeper too...but I still find the lack of gear and quiet of freediving much more relaxing. Not that scuba sometimes isn't fun.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    9. Re:Not sonar? by fyngyrz · · Score: 1

      First you model the transmissivity of the medium. As best you can; the better you do it, the better things work. You locate the original sound emission, again as best you can, and the better you do it, the better things work. Then you look at the delta-times and delta-vectors of the echoes and place surfaces where those imply them. You do this with every sound you can, and the more echoes coincide, the harder, or at least, more reflective, a surface you've found. So layers of significantly different temperature water will be detectable (and also filterable, because geometrically speaking, they're pretty regular.) Ships, the surface, the bottom (and the bottom texture - hard or soft), fish, algae rafts, subs, mines... everything pretty much. Plus, the more stuff you find, the better your model of reflections gets, so it's a cumulative 3D picture generator. You need tons of horsepower at several stages; acquisition, modeling, database, object classification and tagging, graphics output.
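      As a toy illustration of the "delta-times of the echoes" step, here is one common way to estimate the delay between a direct sound and its echo: brute-force cross-correlation. This is an assumption about how such a step could be done, not a description of any real system mentioned in this thread. The pulse shape, sample rate, and the 1500 m/s speed of sound are placeholder values.

```python
def estimate_delay(reference, received, max_lag):
    """Return the lag, in samples, at which `received` best matches
    `reference`, by trying every lag from 0 to max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlation score: sum of products over the overlapping samples.
        score = sum(
            ref * received[i + lag]
            for i, ref in enumerate(reference)
            if i + lag < len(received)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


SAMPLE_RATE_HZ = 48000.0      # placeholder sample rate
SPEED_OF_SOUND_M_S = 1500.0   # rough value for seawater, an assumption

# A short made-up pulse, and a recording holding a weaker copy 7 samples later.
pulse = [0.0, 1.0, -1.0, 0.5, 0.0]
recording = [0.0] * 7 + [0.6 * s for s in pulse] + [0.0] * 4

lag = estimate_delay(pulse, recording, max_lag=10)
delay_s = lag / SAMPLE_RATE_HZ
print(f"echo arrives {lag} samples late ({delay_s * 1e6:.0f} microseconds)")
print(f"extra path length: {delay_s * SPEED_OF_SOUND_M_S:.3f} m")
```

      Doing this for every sound, every sensor pair, and every candidate surface is where the "tons of horsepower" goes; each lag is an independent multiply-accumulate loop, which is the kind of work that parallel hardware handles well.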

      --
      I've fallen off your lawn, and I can't get up.
  16. Worthless Preview by jandrese · · Score: 2, Insightful

    So the preview could be boiled down to: Card still in development will be faster than cards currently available for sale.

    It also included some pictures of the cooling solution that will completely dominate the card. Not that a picture of a microchip with "R600" written on it would be a lot better I guess. Although the pictures are fuzzy and hard to see, it looks like it might require two separate molex connections just like the 8800s.

    --

    I read the internet for the articles.
    1. Re:Worthless Preview by LehiNephi · · Score: 1

      Keep in mind that it is still a prototype, and from what I've heard, the cooling apparatus in the pictures is for OEMs like HP, Gateway, etc. Once it's released to retail, the fan will move on-board, and the card as a whole won't be all that remarkably large.

      --
      Help find a cure for cancer. Join the [H]orde
    2. Re:Worthless Preview by BlackSnake112 · · Score: 1

      if one has a big case (front to back) the card pictured is not an issue. One machine I have now has a 7800GTX with the extender on it so it can slide into an extra bracket. It helps keep the card aligned. There are cases out there that can handle these long cards. But changing a case can be a pain.

      If you open up a link in the article there is a picture that shows a power connector. The link is here:
      http://content.zdnet.com/2346-10741_22-57089-2.html
      I know, zdnet; there goes any karma I had....

      OK it shows two connectors. One six way plug and one (it looks like anyway) eight way plug. I haven't seen any power connector that has an 8-way plug yet. Granted I am no expert, but the 1000 watt power supplies I have seen do not have this connector yet.

  17. Aren't G5 PowerPC Macs rated at 1 TF already? by david.emery · · Score: 1

    I thought the dual CPU G5 machines were rated at 1 teraflop. Certainly PowerPC AltiVec processors are super floating-point engines (but I don't know exactly how they rank at flops/mhz....)

    But then maybe the issue depends on the notion of what is "ubiquitous" and Macs don't qualify. I dunno, but I'm sure someone on /. will correct me :-)

            dave

    1. Re:Aren't G5 PowerPC Macs rated at 1 TF already? by bnenning · · Score: 1

      I thought the dual CPU G5 machines were rated at 1 teraflop.

      IIRC the best case for Altivec is 8 flops/cycle (fused multiply/add of 4 32-bit floats), so a quad G5 at 2.5GHz would have a maximum of 80 GFlops. With perfectly scheduled code you could get some additional ops out of the integer and FP units, but not close to a teraflop.
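      The estimate above is just a product of three numbers, sketched here for reference. The figures (8 flops per cycle, four cores, 2.5GHz) are the ones given in the post; the function name is made up.

```python
def peak_gflops(flops_per_cycle, cores, clock_ghz):
    """Theoretical peak in GFLOPS: flops per cycle * cores * clock in GHz."""
    return flops_per_cycle * cores * clock_ghz


# A fused multiply-add on 4 floats counts as 8 flops per cycle per core.
quad_g5 = peak_gflops(flops_per_cycle=8, cores=4, clock_ghz=2.5)

print(f"quad G5 peak: {quad_g5:.0f} GFLOPS")
print(f"fraction of a teraflop: {quad_g5 / 1000:.0%}")
```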

      --
      How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
  18. HTX by Joe+The+Dragon · · Score: 1

    How long before they put it on the HT bus using an HTX slot?

    1. Re:HTX by *weasel · · Score: 1

      Or more appropriately:
      How long until AMD starts releasing multi-core chips with multiple/mixed CPU/GPU cores, joined by a virtual inter-core HT bus, and all wired into main memory? (and optionally a bank of GDDR)

      --
      // "Can't clowns and pirates just -try- to get along?"
  19. It's My Flop In a Box! by Anonymous Coward · · Score: 0

    Make her open the box...

  20. I could use it to program my automatic toaster by BrentRJones · · Score: 2, Funny

    which is fully connected to the Internet so that I can put my toast down or pop it up remotely.

    Wait...from some of the other comments about electricity usage, I might be able to do away with the heating coils and use the circuits themselves to toast. That would really be an environmental plus. Wonder how it would affect the taste of the bread?

    --
    Help end the use of Sigs. Tomorrow
    1. Re:I could use it to program my automatic toaster by david.emery · · Score: 1

      So would the heat sinks leave 'scorch marks'? Would this lead to a redesign of heatsinks to provide branding/corporate logos on toast?

      It might be kinda cool to get "Intel Inside" burnt onto a panini sandwich... :-)

              dave

    2. Re:I could use it to program my automatic toaster by carlmenezes · · Score: 1

      Well, given your bread will be so "close to the metal", I'm guessing, not good ;)

      --
      Find a job you like and you will never work a day in your life.
  21. General Purpose Programmers by Doc+Ruby · · Score: 3, Informative

    it's hard to program such GPUs for anything other than graphics applications.


    "Anything other" is "general purpose", which they cover at GPGPU.org. But the general community of global developers hasn't gotten hooked on the cheap performance yet. Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.
    --

    --
    make install -not war

    1. Re:General Purpose Programmers by Anonymous Coward · · Score: 1, Interesting

      Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.

      I'm still waiting for realtime raytracing GPUs.

    2. Re:General Purpose Programmers by RegularFry · · Score: 1

      How about real-time radiosity?

      http://www.geomerics.com/

      --
      Reality is the ultimate Rorschach.
    3. Re:General Purpose Programmers by NeMon'ess · · Score: 1

      MP3 is trivial. No more than 5 or 10 minutes to do an entire album. Or maybe 3 minutes. Video is where it's at. Turning home movies into h.264 video takes a ton of computing power and time. Get a GPU assisting a CPU encoding an hour of DV into h.264 in only fifteen or thirty minutes and the video scene will be all over it.

    4. Re:General Purpose Programmers by Doc+Ruby · · Score: 1

      MP3 encoding at a server isn't trivial load for a thousand simul streams on a P4. Your 3 minutes per 45min album is only 15x for about $1000, while a 1TFLOPS GPU card might encode 16 thousand times for $300.

      There are many more people coding multistream MP3 servers, but still no port to GPGPU.

      Video servers follow the same logic. But video decoders at the client will get better economics from many thousands/millions of ASICs in the mass market, rather than the few thousand servers a year that the market will carry. Unless we're talking about GPUs in 2008-9 which decode HD to "entertainment PCs".

      --

      --
      make install -not war

    5. Re:General Purpose Programmers by NeMon'ess · · Score: 1

      Could you clarify what situation needs streaming mp3 recompressed for multiple bitrates on the fly? Wouldn't it make more sense to do that ahead of time? Or are you talking about the music channels over digital cable and satellite? Do those channels get compressed on the fly like the rest of the video streams?

    6. Re:General Purpose Programmers by Doc+Ruby · · Score: 1

      Live streams, new streams, and on-demand streams in multiple bitrates/formats (MP3, AAC, etc). Streams of phonecalls (mostly conference calls).

      --

      --
      make install -not war

  22. lol what? by Anonymous Coward · · Score: 0


       

  23. Swiss army spatula by Joebert · · Score: 0

    Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

    In other news, Martha Stuart explains who screwdrivers don't make good hammers.
    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    1. Re:Swiss army spatula by Joebert · · Score: 1

      explains who screwdrivers

      I'm soo ready for the weekend I'm starting at the end of words, then jumping back to the beginning to type the rest of them.
      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
  24. This Just In by Waffle+Iron · · Score: 1

    Specialized hardware units rack up impressive benchmark numbers on specific tasks relative to general-purpose CPUs. News at 11.

  25. Also by Sycraft-fu · · Score: 2, Interesting

    There's a real difference between getting something to happen on a quasi-DSP like a GPU and on a real, general purpose processor like a CPU. If GPUs were full out CPU replacements, well then we wouldn't have CPUs any more, would we? The problem is that they are very very fast, but only at some things. Now that's fine, because that's what they were designed for. They are made to push pixels really fast and if they can do anything else, well bonus. However it does mean that they aren't a general purpose computing replacement.

    Also, the more specialized you get your DSP, the easier it is to get speed out of it. I'm sure it wouldn't be hard to design (I'm sure they already exist) a very narrow purpose DSP that does over 1 trillion floating point ops per second. However that's real different than having a CPU that will do the same, and do it across many kinds of ops.

    So as nifty as shit like this might be, it is real disingenuous to pretend that they've "beat" Intel. Intel isn't talking about a graphics card, they are talking about their CPUs. By the numbers my GPU has always been faster than my CPU, as well it should. There'd be no point in paying for specialized hardware if I had general purpose hardware that was faster.

    1. Re:Also by Anonymous Coward · · Score: 0

      "I'm sure it wouldn't be hard to design (I'm sure they already exist) a very narrow purpose DSP that does over 1 trillion floating point ops per second."

      A trillion? That's nothing! My super-specialised processor can do infinity calculations. Per femtosecond. They all have to be 0 + 0 though.

  26. No, Ars didn't say why. Here's why. by Animats · · Score: 4, Informative

    Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

    No, Ars has an article blithering that it's hard to program such GPUs for anything other than graphics applications. It doesn't say anything constructive about why.

    Here's a reasonably readable tutorial on doing number-crunching in a GPU. The basic concepts are that "Arrays = textures", "Kernels = shaders", and "Computing = drawing". Yes, you do number-crunching by building "textures" and running shaders on them. If your problem can be expressed as parallel multiply-accumulate operations, which covers much classic supercomputer work, there's a good chance it can be done fast on a GPU. There's a broad class of problems that work well on a GPU, but they're generally limited to problems where the outputs from a step have little or no dependency on each other, allowing full parallelism of the computations of a single step. If your problem doesn't map well to that model, don't expect much.
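    To make the "kernel over an array" model concrete, here is a plain Python sketch. It does not run on a GPU and uses no graphics API; it only mimics the shape of the computation. All names are made up, and the running sum is shown only in its naive serial form as a contrast.

```python
def saxpy_kernel(a, x_i, y_i):
    """The "shader": one multiply-accumulate on one element.
    It sees only its own inputs, never a neighbor's output."""
    return a * x_i + y_i


def run_kernel(kernel, a, xs, ys):
    """The "draw call": apply the kernel to every element of the "textures".
    Each output is independent, so all of them could be computed at once."""
    return [kernel(a, x_i, y_i) for x_i, y_i in zip(xs, ys)]


def running_sum(xs):
    """Contrast case: written this way, every output depends on the one
    before it, so the steps cannot simply run side by side."""
    total, out = 0, []
    for x in xs:
        total += x
        out.append(total)
    return out


xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]
print(run_kernel(saxpy_kernel, 2.0, xs, ys))
print(running_sum([1, 2, 3]))
```

    The first function maps directly onto the model described above; the second needs restructuring before it fits.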

    1. Re:No, Ars didn't say why. Here's why. by Hannibal_Ars · · Score: 1

      Maybe you should do more than just skim the article and post an ill-informed flame. In the article, I blame the problems specifically on the complexity of dealing with programmer-managed memory hierarchy, and I give some of my reasoning.

      As for your specific comments about the classes of problems that do or don't map well onto a GPU, I've covered those issues in previous posts on the topic. The post you're trying to criticize wasn't about the kinds of problems that you can and can't solve efficiently with GPUs--it was about the vendor-supplied tools for programming them.

      --
      Senior CPU Editor | Ars Technica | http://arstechnica.com/
    2. Re:No, Ars didn't say why. Here's why. by Chris+Ashton+84 · · Score: 3, Informative

      Yes, you used to have to do everything in a graphical environment, but not any more. With nVidia's CUDA you program in C/C++, have a general memory model (you can access texture memory if it's efficient for what you need, but you also have general device memory and several other types of memory to choose from) and run on fully capable stream processors. As far as the programmer is concerned, the gpu is just a stream processor add-in card. You do have to manually transfer to and from device memory, but once you have your data on the gpu you're free to access it however you want (arrays, textures, linear memory, whatever). It's not a difficult system to understand, though tuning your program for performance will be challenging. Check out http://developer.nvidia.com/object/cuda.html for more info.

    3. Re:No, Ars didn't say why. Here's why. by Jherek+Carnelian · · Score: 1

      Mod this guy up - CUDA and CTM are Nvidia's and ATI's APIs for direct access to the GPU, completely bypassing all the DirectX/OpenGL layers that were previously the only way to shoe-horn in computational workloads. The GP is obsolete and does not deserve a 5 at all.

  27. Step 2 by Saikik · · Score: 5, Funny

    Step 2: Don't leave your box in Boston.

  28. Texas Hold'em Mega-Mega Tourney by Anonymous Coward · · Score: 0

    Imagine turning the flop on a million million hands of Texas Hold'em.

  29. Chip in a Box by natrius · · Score: 5, Funny

    To all the fellas out there with geek friends to impress
    It's easy to do, just follow these steps:
    One: Cut a hole in a box
    Two: Stick your chip in that box
    Three: Make her open the box
    And that's the way you do it
    It's my chip in a box

  30. What about using it for Graphics? by LWATCDR · · Score: 1

    Could this be the start of some really good opensource drivers for ATI cards?
    Just how much of X and OpenGL could they offload on this card?
    What about Theora, Ogg, Speex, or Divx encoding and decoding?
    I know it is a radical idea but since they are optimized for graphics and graphics like operations why not use them for that?

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    1. Re:What about using it for Graphics? by Shadow99_1 · · Score: 1

      ATI has been offloading codec work (for at least certain codecs) to the graphics card since the 9XXX series. H.264, for instance, is offered through a codec interface that is 'accelerated' by the card on X19XX series cards... I'd assume it's not done with dedicated hardware, but by offloading the work of processing to the card's GPU.

      --
      we are all invisible unless we choose otherwise
    2. Re:What about using it for Graphics? by LWATCDR · · Score: 1

      But that is closed source. What about for FOSS?

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    3. Re:What about using it for Graphics? by Shadow99_1 · · Score: 1

      Neither Nvidia nor ATI has open drivers currently, so I don't see your point... Whose graphics cards are you going to use? Intel's?

      --
      we are all invisible unless we choose otherwise
    4. Re:What about using it for Graphics? by LWATCDR · · Score: 1

      My point is I am hoping that this opening up of the GPU might lead to FOSS drivers for those cards.
      I use the Nvidia cards on my Linux machines

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  31. Three step process... by Sampy · · Score: 1

    1. Cut a hole in a box
    2. Put your chips in that box
    3. Make AMD open the box

    That's the way you do it
    It's a teraflop in a box!

  32. One... by Anonymous Coward · · Score: 0

    You cut a hole in the box.

  33. There is no such thing as a "Teraflop" by Lobais · · Score: 1

    It is 1 Teraflops since it stands for Tera floating point operations per second. The ending "s" is not plural.

  34. SuperCell by Doc+Ruby · · Score: 2, Informative

    The Playstation 3 is reported to harness 2 TFLOPS. But "only" 204GFLOPS run on the Cell CPU, 10%. The other 1.8TFLOPS runs on the nVidia G70 GPU. But the G70 runs shaders, very limited application to anything but actually rendering graphics.

    The Cell itself is notoriously hard to code for. If just some extra effort can target the nVidia, that's TWO TeraFLOPS in a $500 box. A huge leap past both AMD and Intel.

    --

    --
    make install -not war

    1. Re:SuperCell by Halo- · · Score: 1

      disclosure: I do not speak for my employer.

      The problem is that you can't just say, "I can multiply two floating points in time X, and therefore my speed is 1/X." You have to actually get that data to and from some sort of useful location. High performance computing is bounded by memory bandwidth these days, not clock speed. The article summary mentions streaming but I can find no reference to that in the actual article itself.
      Consider digital SLR cameras: a decent dSLR can take a picture in 1/1600 of a second, but it can't take 1600 pictures a second. This is because moving the data is much slower than acquiring the data.
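      A back-of-the-envelope way to see this is to compare the time spent computing with the time spent moving data, and take the larger of the two. The sketch below assumes compute and transfer overlap perfectly. Every number in it is hypothetical (a 1 TFLOPS device with 50 GB/s of memory bandwidth, summing one billion 4-byte floats), as are the function names.

```python
def estimate_runtime_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (runtime, limiting factor), assuming compute and memory
    traffic overlap perfectly so the slower of the two sets the pace."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    bound = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bound


# Hypothetical job: add up one billion 4-byte floats (one flop per element).
runtime, bound = estimate_runtime_s(
    flops=1e9,
    bytes_moved=4e9,
    peak_flops_per_s=1e12,        # hypothetical 1 TFLOPS device
    bandwidth_bytes_per_s=50e9,   # hypothetical 50 GB/s memory
)

print(f"runtime: {runtime * 1e3:.0f} ms, limited by {bound}")
print(f"achieved: {1e9 / runtime / 1e9:.1f} GFLOPS of a 1000 GFLOPS peak")
```

      With these made-up numbers the device spends almost all of its time waiting on memory, so the headline FLOPS figure tells you very little about this job.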

    2. Re:SuperCell by Doc+Ruby · · Score: 2, Informative

      PCI-Express offers 64 "lanes" pumping up to 500MBps each (since January, 250MBps in actual shipping HW). In a switched hub, for 256Gbps total. The Cell's EIB is probably its most interesting feature: 200Gbps token ring that transparently connects offchip. So the new IBM Cells, with 4 cores (Power970 + 8-SPEs each) on one die (or SoC) has 32x 25.6GFLOPS + 4x 970 all moving at 200Gbps. Or just a single Cell at 204GFLOPS feeds 200Gbps to a PCIe stuffed with 20x 10Gig ethernet cards (10 double-10GigE PCIe cards).

      The single Cell therefore offers 32 (bits) * 8 (SPEs) = 256 FLOPs per (5 picosecond) loop on a full pipeline. The four-way offers 1KFLOPs. There are 1024-SPE Cells in the product line, which are 32KFLOPs; a 4-way would offer 128KFLOPs per loop. Even complex video codecs and mixing need at most 100MIPS, which such a beast would run at 5ns, or 200 million times realtime. 200Gbps is 5 thousand simul Blu-Ray video streams, so we're talking about the beast working at 40 thousand times its max video throughput, while a single Cell works at 8 times video throughput. Audio throughput is much lower FLOPS:bit.

      So the Cell data transfer is certainly ample to its high processing speed.

      --
      make install -not war

    3. Re:SuperCell by Slashcrap · · Score: 1

      The Playstation 3 [wikipedia.org] is reported to harness 2 TFLOPS [wikipedia.org].

      That's why the games released for the PS3 are decades ahead of anything else in existence. I've heard of people seeing the PS3 running a game and literally dropping dead on the spot of future shock. It doesn't matter if it's only using 3% of its power, it's still a quantum leap relative to everything else in existence because of those 2 TFLOPS. Sorry, I seem to be talking bollocks.

      The other 1.8TFLOPS runs on the nVidia G70 GPU.

      Right. And two of AMD's unreleased, next generation cards can only manage 1 TFLOP between them. So realistically we have two possibilities:

      a) AMD are massively understating the potential performance of the R600 for marketing reasons.
      b) Sony are massively overstating the potential of the G70 in the PS3 for marketing reasons.

      So children, which do we think is more likely? And please only raise your hand if you're not too busy posting on Slashdot to use any logic or reason in your arguments.

    4. Re:SuperCell by Doc+Ruby · · Score: 1

      Or the nVidia chip really does deliver 80% greater performance (or less, the AMD claim is "more than 1 trillion floating-point calculations per second"). The AMD chip is a general purpose CPU, "using a general "multiply-add" (MADD)", while the nVidia chip is a GPU. GPUs don't compete directly with CPUs, as I said in the post to which you replied. In fact, the AMD 1TFLOPS appears to beat the Cell's 204GFLOPS by a mile. Which is a tremendous beating for a Sony chip that's incompatible with existing SW, either Intel or most PowerPC, and supposedly the quantum leap in PC performance this decade.

      So while you're demanding logic and reason for entry, why don't you examine your own hidden premises and excluded middles? There are 3 kinds of people in the world: those who think there are 2 kinds of people in the world, and those who don't.

      --
      make install -not war

  35. Added caveat: by Ayanami+Rei · · Score: 1

    You don't need greater than 32-bit precision for any of the MAC ops. Usually that kind of limitation can be overcome by rethinking the algorithm, and doing some accumulation or error analysis outside of the GPU.
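
One way to read "doing some accumulation outside of the GPU" is to keep the 32-bit work in short chunks and combine the partial results in 64-bit on the host. Here is a minimal sketch of that idea, assuming 32-bit arithmetic can be imitated by rounding through `struct`; the helper names and the chunk size of 256 are hypothetical, and no real GPU is involved.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate entirely in simulated 32-bit precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

def sum_chunked(values, chunk=256):
    """32-bit partial sums per chunk (the 'GPU' part), combined in 64-bit (the 'host' part)."""
    total = 0.0
    for i in range(0, len(values), chunk):
        total += sum_f32(values[i:i + chunk])
    return total

values = [0.1] * 100_000
reference = to_f32(0.1) * len(values)   # 64-bit sum of the 32-bit inputs

naive_error = abs(sum_f32(values) - reference)
chunked_error = abs(sum_chunked(values) - reference)

print(f"all-32-bit error:       {naive_error:.6f}")
print(f"chunked + 64-bit error: {chunked_error:.6f}")
```

The chunked version keeps each 32-bit running total small, so the rounding error per addition stays small as well. The long 32-bit running sum drifts much further from the reference.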

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  36. Maybe if the docs were released by Anonymous Coward · · Score: 0

    The problem is they have multiple new platforms and they do not release docs for the bare metal. Compare that behaviour to common architectures like x86, 68K, PPC, ARM, MIPS or the big bunch of DSPs and microcontrollers. You can get books or PDFs with all the instructions, memory ranges, timings... you get all you need to really program them, build compilers, or even design new systems around a chip, and by experience you know they do not change such things at will; some systems are decades old. Thus, investing time is better done in systems that have proven stable and are clearly well documented. They have to choose if they want to be a market standard or keep their "precious" IP. Intel must have lost all of it by now, publishing docs like how to use the SSE instructions, yeah.

  37. Thats cheap by majortom1981 · · Score: 1

    That's not right; AMD is cheating. AMD is also using the video cards. AMD did not beat Intel. If they are going to go against Intel with processing stuff they should do it via CPUs only. If Intel did the same thing with 2 nvidia cards in SLI I bet they could get the same results.

    1. Re:Thats cheap by Anonymous Coward · · Score: 0

      Well, as AMD owns ATI, they only used their own products to get that result, whereas Intel would have to use someone else's products.

    2. Re:Thats cheap by Anonymous Coward · · Score: 0

      Fanboy! muhahahah!

  38. This reminded me of something... by Torsoboy · · Score: 0, Redundant

    1. Cut a hole in the box. 2. Put your flop in that box. 3. Make her open the box. And that's the way you do it!

  39. Well...duh by Anonymous Coward · · Score: 5, Insightful

    GPGPU is hard because we're still in the very early days of this particular revolution. As I think about it, and from what we know of AMD's plans in particular, I think this is kind of like the evolution of FPU.

    See, in the early days the FPU was a separate chip (anyone remember buying an 80387 to plug into their mobo?). Writing code to use the FPU was also a complete pain in the ass, because you had to use assembly, with all the memory management and interrupt handling headaches inherent. FPUs from different vendors weren't guaranteed to have completely compatible instruction sets. Because it was such a pain in the ass, only highly special purpose applications made use of FPU code. (And, it's not that computer scientists hadn't thought up appropriate abstractions to make writing floating point easy. Compilers just weren't spitting out FPU code).

    Then, things began to improve. The FPU was brought on die, but as an optional component (think 486SX vs 486DX). Languages evolved to support FPUs, hiding all the difficulty under suitable abstractions so programmers could write code that just worked. More applications began to make use of floating point capabilities, but very few required an FPU to work.

    Finally, the FPU was brought on die as a bog standard part of the CPU. At that point, FPU capabilities could be taken for granted and an explosion of applications requiring an FPU to achieve decent performance ensued (see, for instance, most games). And writing FPU code is now no longer any more difficult than declaring type float. The compiler handles all the tricky parts.

    I think GPGPU will follow a similar trajectory. Right now, we're in phase one. Using a GPU for general purpose computation is such an incredible pain that only the most specialized applications are going to use GPGPU capabilities. High level languages haven't really evolved to take advantage of these capabilities yet. And yes, it's not as though computer scientists don't have appropriate abstractions that would make coding for GPGPU vastly easier. Eventually, GPGPU will become an optional part of the CPU. Eventually high level languages (in addition to the C family, perhaps FORTRAN or Matlab or other languages used in scientific computing) will be extended to use GPGPU capabilities. Standards will emerge, or where hardware manufacturers fail to standardize, high level abstraction will sweep the details under the rug. When this happens, many more applications will begin to take advantage of GPGPU capabilities. Even further down the road, GPGPU capabilities will become bog standard, at which point we will see an explosion of applications that need these capabilities for decent performance.

    Granted, the curve for GPGPU is steeper because this isn't just a matter of different instructions, but a change in memory management as well. But I think this kind of transition can and will eventually happen.

    1. Re:Well...duh by julesh · · Score: 1

      Writing code to use FPU was also a complete pain in the ass, because you had to use assembly, with all the memory management and interrupt handling headaches inherent. FPUs from different vendors weren't guaranteed to have completely compatible instruction sets. Because it was such a pain in the ass, only highly special purpose applications made use of FPU code.
      [...]
      Finally, FPU was brought on die as a bog standard part of the CPU. At that point, FPU capabilities could be taken for granted and an explosion of applications requiring an FPU to achieve decent performance ensued (see, for instance, most games). And writing FPU code is now no longer any more difficult than declaring type float. The compiler handles all the tricky parts.

      What compilers were you using that didn't do this during the 386/486 era? I mean, I was using MS QuickBASIC back in the 286 days, and *that* supported the FPU when it was available. So did TurboPASCAL 6, which I used to use as well.

  40. Re:DVI??? by sumdumass · · Score: 1

    ehh?? I posted this in this article's discussion.

    I have no idea how it ended up here. I didn't have this story open yet when posting this. Ohh well. Shit happens..lol

  41. Future plans by UnknowingFool · · Score: 4, Funny

    Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years."

    So I take it that AMD will be ready for Vista's successor?

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
    1. Re:Future plans by Shadyman · · Score: 1

      No, I'm pretty sure it means AMD will be ready for Vista.

  42. I put my junk in a box by Anonymous Coward · · Score: 0

    Cue the "Dick In A Box" jokes...

    1. Re:I put my junk in a box by JoelMartinez · · Score: 1

      1. Cut a hole in the box 2. Put your t-flop in that box 3. Have her open the box

    2. Re:I put my junk in a box by Anonymous Coward · · Score: 0

      and apparently, AMD has penises that can perform 10^9 operations a second.

  43. ubiquitous by Elwood+P+Dowd · · Score: 1

    I do not think it means what you think it means.

    --

    There are no trails. There are no trees out here.
  44. Saying 1 Gigaflop is like saying 1 Gbp by Anonymous Coward · · Score: 0

    You lose at life.

    Google:
    1-gigaflop (957 results)
    1-gigaflops (11,200 results)

    If you say "1 gigaflop" you're as much a moron as someone saying "1 Gbp" instead of "1 Gbps". And yes, folks who write "lazer" instead of "laser" are morons too because acronyms are not supposed to be written phonetically, they're written based on what their letters stand for!

    1. Re:Saying 1 Gigaflop is like saying 1 Gbp by Xaositecte · · Score: 1

      That's LASER, you insensitive clod.

  45. OOOoooo-Hide n' Seek. by Anonymous Coward · · Score: 0

    "Simple: they aren't available. PC's don't typically come with DSPs. "

    Of course they don't.

    Right? Right? Right?

    1. Re:OOOoooo-Hide n' Seek. by dextromulous · · Score: 1

      Ha, that's a good one... suggesting that a SB audigy can do the kind of DSP the GPP was requiring on a 30MHz signal... that made my day. :-)

      --
      There are two types of people in the world: those who divide people into two types and those who don't.
  46. More, or less price discrimination? by Anonymous Coward · · Score: 0

    There is one aspect of this that I am a bit worried about. In the graphics card market, you have artificial price discrimination based upon application. Namely, if all you want to do is play games you buy a GeForce card, but if you want to do pro 3D work you buy a Quadro. The only difference between the two is that the Quadro has different drivers that disable some features while enabling others. That, and, of course, a *very* steep price difference. If and when GPUs are used for even more applications, might we see even more price discrimination?

    It is argued that the video card companies are well within their rights to sell the same product at different prices because the drivers are different, but imagine if that happened to CPUs. What if the same piece of silicon was sold at different prices depending on whether you were a professional writer or someone who just surfs the web? Imagine if Intel and AMD did not give you direct access to the hardware and instead put an extra layer between the programmer and the chip so they could sell the same chip to different people for different prices. What would something like that do to open-source?

    In this merger of the CPU and GPU either the openness of the CPU will extend to the GPU, or the closed-ness of the GPU will extend to the CPU.

  47. Very hard to program for by Anonymous Coward · · Score: 0

    unless you use a decent compiler that actually works with almost standard C++:
    http://www.rapidmind.net/technology.php

  48. Re: Teraflop in 'a Box by mrbluze · · Score: 1

    And boy, was she disappointed after being told she'd be getting a hard disk!

    --
    Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
  49. 5 years? by Sebastopol · · Score: 0, Redundant

    Maybe Intel could just buy a graphics company that already has the technology and demo something in 4 months. Like AMD did, again (recall that their last two biggest wins came from acquiring NexGen's intellectual property). AMD has a habit of making money by plagiarizing the work of other smaller companies they acquire. Whereas Intel apparently buys smaller companies and loses money :) :) (er, whatever they bought and sold to Marvell for a huge loss). --ducks for cover--

    --
    https://www.accountkiller.com/removal-requested
  50. Apples and oranges by Mathness · · Score: 1

    AMD/ATi have a GPU, Intel will make a CPU.

    --
    Carbon based humanoid in training.
  51. SuperCellulite by Anonymous Coward · · Score: 0

    "PCI-Express offers 64 "lanes" pumping up to 500MBps each (since January, 250MBps in actual shipping HW). In a switched hub, for 256Gbps total. The Cell's EIB is probably its most interesting feature: 200Gbps token ring that transparently connects offchip. So the new IBM Cells, with 4 cores (Power970 + 8-SPEs each) on one die (or SoC) has 32x 25.6GFLOPS + 4x 970 all moving at 200Gbps. Or just a single Cell at 204GFLOPS feeds 200Gbps to a PCIe stuffed with 20x 10Gig ethernet cards (10 double-10GigE PCIe cards)."

    Deja Vu. Welcome to the Transputer/Microway business model.

    1. Re:SuperCellulite by Doc+Ruby · · Score: 1

      It is similar to the Transputer architecture. But which millions of videogame consoles was Sony selling with a Transputer running it? That's the bizmodel. Which makes all the difference, validating the Transputer architecture that is now suitable to the media processing demands of the mass market.

      --
      make install -not war

  52. I get a kick out of the tags people assign ... by pk69 · · Score: 2, Insightful

    I laugh every day at the tags people assign to articles, but today I laughed the hardest with the tag "dickinabox" ...

    --
    http://phlite.net Lay out on the beach in Rocky Point, Mexico : http://www.granizo.com
  53. Wake me up when by Wolfier · · Score: 1

    these teraFLOPS are on 64-bit doubles. Single precision teraFLOPS are close to useless for anything that requires a teraFLOPS.

  54. Questions -- Re:Not sonar? by zooblethorpe · · Score: 1

    This sounds truly fascinating. How far along is anybody in building an actual working system along these lines? Or is this all still at the drawing board phase, awaiting the required horsepower to really take off? Where should I look to find out more (for a complete layman, at that)?

    Cheers,

    --
    "What in the name of Fats Waller is that?"
    "A four-foot prune."