Slashdot Mirror


BrookGPU: General Purpose Programming on GPUs

An anonymous reader writes "BrookGPU is a compiler and runtime system that provides an easy, C-like programming environment (read: no GPU programming experience needed) for today's GPUs. A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4. Combine this with the increased memory bandwidth, 25.3 GB/sec peak compared to the Pentium 4's 5.96 GB/sec peak, and you've got a seriously fast compute engine, but programming GPUs has been a real pain. BrookGPU adds simple data-parallel language additions to C which allow programmers to specify certain parts of their code to run on the GPU. The compiler and runtime take care of the rest. Here is the Project Page and Sourceforge page."

38 of 275 comments

  1. High Performance for General Purpose? by tempfile · · Score: 3, Interesting

    I suspect that this high performance is only attainable for the field the GPU is specialized for, i.e. graphics-related things. Or isn't it?

    1. Re:High Performance for General Purpose? by Anonymous Coward · · Score: 1, Interesting

      Sweet. Morphing windows that don't hog the CPU.

    2. Re:High Performance for General Purpose? by Total_Wimp · · Score: 3, Interesting

      I can't help but notice the similarity between shader operations and how neurons interact. These processors might be a good platform for some AI tasks.

      I especially like the idea that the GPU and CPU can work together on the task. If the GPU were handling neuron tasks and the CPU were handling other necessary tasks, we could get a very big boost to desktop AI.

      TW

    3. Re:High Performance for General Purpose? by BrainInAJar · · Score: 4, Interesting

      Would the precision be enough, though? As far as I know, GPUs do a lot of rounding off.

    4. Re:High Performance for General Purpose? by Directrix1 · · Score: 3, Interesting

      Yes, anything computationally intensive that works over a range of data can usually find a parallel solution, such as image/video manipulation/encoding/decoding, encryption, and cracking (and hopefully this will give us a platform for better software RF). I've always wondered why this stuff didn't just get worked into a coprocessor, because very little new stuff actually happened that was directly related to the video card (as in taking output from the machine and displaying it on a screen). I think the card manufacturers saw this, so they jumped on the 3D acceleration bandwagon, touting it as a new video card feature, when it should've just been in the domain of a new math coprocessor.

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
  2. Cool, but by MooCows · · Score: 3, Interesting

    What kind of instructions does the GPU actually accept?
    I mean, you probably can't run just any kind of algorithm on there, can you?

    --
    The path I walk alone is endlessly long.
    30 minutes by bike, 15 by bus.
  3. Basically like having two processors... by Anonymous Coward · · Score: 4, Interesting

    I wonder how long till we see a (insert worthwhile cause here)-At-Home client that supports this?

    1. Re:Basically like having two processors... by Chordonblue · · Score: 2, Interesting

      Yeah, I remember that! Lucasfilm used it to animate a mothership in 'Rescue on Fractalus' (itself a marvel of tech for the Atari) while the game loaded. There were cool (de)compression routines that harnessed this as well.

      I also seem to recall certain music pieces that could play extra parts by blanking the screen. There was also a really cool 9 second sample of 'You really got me' - the Van Halen version - and it blanked the screen to play it.

      Wow! Them were the salad days!

      --
      "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
    2. Re:Basically like having two processors... by cybergibbons · · Score: 3, Interesting

      Ha! The C64 disk drive had its own processor which you could use to run programs as long as you could deal with the painfully slow serial link. Beat that.

  4. Cool ... by torpor · · Score: 5, Interesting

    ... can you say 'software synthesists' wet dream?

    Oh, suddenly, that 'game investment' also gives you a few 100 extra voices of polyphony?

    Sweet ... $5 to the first person to use Brook to make a synthesizer. :)

    --
    ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
    1. Re:Cool ... by torpor · · Score: 2, Interesting

      What does 'synth' mean to you?

      To me it doesn't just mean Virtual Analog, or subtractive... it can be anything that makes noise ... so yeah, filters, yeah, effects, yeah, a single monster filter...

      It's all good. Let's see what the GPUs can do ...

      --
      ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
  5. wait a minute by Janek+Kozicki · · Score: 5, Interesting

    A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4.

    Wait, if there is a technology that allows construction of a GPU that is 3 times faster than the fastest CPUs, why don't Intel and AMD use this technology to build CPUs that are 3 times faster?

    Are you sure that you can compare the speed of GPU and CPU?

    --
    #
    #\ @ ? Colonize Mars
    #
    1. Re:wait a minute by Jah-Wren+Ryel · · Score: 2, Interesting

      All the world is not a FLOP. GPU = Graphics Processing Unit, not General Purpose Unit.

      --
      When information is power, privacy is freedom.
    2. Re:wait a minute by mdpye · · Score: 4, Interesting
      And on top of all of that I can buy 3 2.4Ghz P4s for the price of a Geforce FX5950

      But you forget the 256MB (at least) RAM on a steaming fast interface that you get with the GeForce... It makes the P4s' cache look pretty paltry in size by comparison.

      MP
    3. Re:wait a minute by barik · · Score: 5, Interesting

      Are you sure that you can compare the speed of GPU and CPU?

      Professor Pat Hanrahan, of Stanford University, made a stab at answering this question in his presentation 'Why is Graphics Hardware so Fast?'. The first half of the presentation focuses on this question, while the second half covers programming languages that utilize this hardware. Specifically, the Stanford Real-Time Shading Language (RTSL) and Brook are discussed. Overall, it's a good presentation that should get you up to speed with the basics of what's happening in this area of research.

  6. How does this look? by adrianbaugh · · Score: 5, Interesting

    I'm completely new to meddling with graphics cards, so apologies if this is a silly question: when programs utilising the GPU for arbitrary calculations are running, does the screen go weird, or is there a way of stopping the output being displayed? A screenful of junk might not matter to a scientist leaving their computer to crunch numbers for a few months, but it wouldn't be good for a general-purpose program.

    --
    "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
    - JRR Tolkien.
  7. I am not an EE, but... by unfortunateson · · Score: 5, Interesting

    It would seem to me that the GPU is not going to be as general-purpose as the CPU, but could still attain the high mathematical throughput with vector-oriented processing.

    Doing string searches, complex logic analyses, etc. would probably suck, but big data manipulations, such as SETI-style wave transformations, molecular analysis, etc., might be able to take advantage of them.
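
The split described here, between element-wise number crunching and work where each step depends on the last, can be sketched in plain Python. This is not Brook syntax: `saxpy_kernel`, `run_kernel` and `running_sum` are hypothetical names, and the loop only models what a GPU would apply to every element at once.

```python
def saxpy_kernel(a, x, y):
    """Per-element kernel: the result depends only on its own inputs."""
    return a * x + y


def run_kernel(kernel, a, xs, ys):
    """Apply the kernel to every element pair.

    Each output is independent of the others, so a GPU could compute them
    all in parallel; this loop runs them one at a time for illustration.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    return [kernel(a, x, y) for x, y in zip(xs, ys)]


def running_sum(xs):
    """Counter-example: each step needs the previous result, so this
    naive form does not split into independent per-element work."""
    total, out = 0.0, []
    for x in xs:
        total += x
        out.append(total)
    return out


result = run_kernel(saxpy_kernel, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

The first function is the kind of "big data manipulation" that fits the model; the second shows why serially dependent logic is a poorer match unless it is restructured.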

    --
    Design for Use, not Construction!
  8. Fast Fourier Transform by HalfFlat · · Score: 3, Interesting

    I'd love to see an FFT implementation (maybe it's not so hard ... will have to download and play with it.)

    A lot of scientific code is constrained by how fast you can do an FFT, perhaps of arbitrary size. And a fast graphics card is a lot cheaper than a high-end processor.

    For embarrassingly parallel vector problems, this is just the sort of thing for cheap, powerful clusters based around a cheap PC and a fast GPU.

    1. Re:Fast Fourier Transform by Kazymyr · · Score: 4, Interesting

      Not to mention that you can put several PCI video cards in the same cheap PC. Multiply power by N.

      --
      I hadn't known there were so many idiots in the world until I started using the Internet -Stanislaw Lem
  9. Drawing text with GPU shader units? by jonsmirl · · Score: 4, Interesting
    Has anyone tried drawing text with GPU shader units? It would work something like this:

    1) Each character would have its own shader program.
    2) You would set the shader program, draw a rectangle, and the character would appear.
    3) The shader programs would be automatically generated by processing TrueType files.

    To implement:
    1) Break the TrueType outline up into a number of convex curve segments.
    2) Each of these curve segments would be represented as a set of constants in the shader program
    3) For each pixel, test a line from pixel to an edge.
    4) If the number of segments crossed is odd the pixel is black else white.
    The algorithm can be refined to add antialiasing and hinting.
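
The per-pixel test in steps 3 and 4 is the even-odd rule. A minimal sketch, assuming straight line segments instead of the curve segments proposed above (a simplification, not the poster's design) and a ray cast toward +x; the names and the sample glyph are hypothetical:

```python
def crossings(px, py, segments):
    """Count how many outline segments a ray from (px, py) toward +x crosses."""
    count = 0
    for (x1, y1), (x2, y2) in segments:
        # The half-open comparison avoids double-counting a vertex shared by
        # two segments; horizontal segments never satisfy it.
        if (y1 > py) != (y2 > py):
            # x coordinate where the segment meets the horizontal ray
            x_hit = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > px:
                count += 1
    return count


def pixel_is_black(px, py, segments):
    """Even-odd rule: an odd number of crossings means inside the glyph."""
    return crossings(px, py, segments) % 2 == 1


def closed_outline(points):
    """Turn a list of vertices into the segments of a closed contour."""
    return list(zip(points, points[1:] + points[:1]))


# Hypothetical glyph: a square ring, like a blocky letter "O".
outer = closed_outline([(0, 0), (10, 0), (10, 10), (0, 10)])
inner = closed_outline([(3, 3), (7, 3), (7, 7), (3, 7)])
glyph = outer + inner
```

Because every pixel is tested independently against the same constant segment list, the loop body is the part that would live in the shader program.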

    What you end up with is text that is clear at any resolution. The size of the text is controlled by the rectangle you draw it in. The text can also be clearly rotated and sheared.

    An obvious optimization is to get the GPU vendors to add a shader instruction that calculates which side of a Bézier curve segment the current point lies on.

    While not important for games, drawing text is critical for desktops. And we all know about the current trend of drawing desktops with 3D hardware.

    1. Re:Drawing text with GPU shader units? by jonsmirl · · Score: 2, Interesting
      Think about a compositing system where the window the app is being drawn into has been transformed into a non-rectangular shape by the compositing engine.

      The app thinks it is drawing into a flat rectangle. But the compositing engine distorts the font bitmap with its transform. With the shader approach the distortion doesn't happen. The same problem happens when the compositing engine does scaling.

      You only need one shader program per glyph no matter what point size you want to draw. There is a lot of overhead in managing the bitmaps for all of the different point sizes. These bitmaps can get quite big on a 4K by 3K resolution screen.

  10. Excellent! by macemoneta · · Score: 2, Interesting

    I had submitted an AskSlashdot on this subject:

    2003-04-20 01:51:36 Using video processing as "attached processor" (askslashdot,hardware) (rejected)

    But as you can see it was rejected. I was particularly interested in the use of the GPU for cryptographic functions (e.g., with a loopback encrypted filesystem), to offload the processing from the main CPU. Is anyone aware of any work in this area?

    Is this even a viable implementation, or would the overhead of continually dispatching work to the GPU exceed the benefit derived?

    --

    Can You Say Linux? I Knew That You Could.

  11. HP for GP?-AGP Bottleneck. by Anonymous Coward · · Score: 2, Interesting

    Wasn't there a Slashdot story about the slowness of reading back across the AGP bus? How will that affect the usefulness of GPUs?

  12. I've always wondered when this would happen... by malakai · · Score: 2, Interesting

    But what I'm really looking forward to is a physics-specific processor that sits alongside the graphics processor and is responsible for collision detection.

    The last few SIGGRAPHs had numerous approaches to detecting collisions between complex volumes in real time using only the GPU. With some minor tweaking, graphics manufacturers can make this 100x more efficient and easier to implement.

    With the 'shader' languages now able to create and modify meshes procedurally, this is the best place to detect collisions (beaming the mesh data back to your motherboard so that your local CPU can figure out what collided is not efficient).

    1. Re:I've always wondered when this would happen... by Animats · · Score: 3, Interesting
      But what I'm really looking forward to is a physics-specific processor that sits alongside the graphics processor and is responsible for collision detection.

      It's been done. The Havok game physics system is available for the Playstation 2, and the physics is running in the vector processors, where most of the PS2's compute power resides.

      Collision detection isn't that CPU-intensive. (This may surprise people not familiar with the field. But it's true. If collision detection is using substantial CPU time, you're doing it wrong.) Correct collision resolution is where the time goes.

      Physics code works better with double-precision FPUs. You need both dynamic range and long mantissas to do it well. Some of the game consoles, and most of the GPUs, only have single-precision FPUs. It's possible to make physics code work in single precision, but fast-moving objects that cover considerable distance may have problems.
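
The problem with fast-moving objects far from the origin can be reproduced without a GPU. The sketch below rounds to single precision through the standard `struct` module; the scenario (an object 100 km out, advancing 1 mm per step) and the function names are hypothetical. Near 100000 the gap between adjacent single-precision values is 2^-7, about 0.0078, so a 0.001 step is rounded away entirely.

```python
import struct


def to_f32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def integrate(start, step, n_steps, single_precision):
    """Advance a position by a fixed step, optionally rounding to float32
    after every addition to mimic single-precision arithmetic."""
    if single_precision:
        position, step = to_f32(start), to_f32(step)
    else:
        position = start
    for _ in range(n_steps):
        position += step
        if single_precision:
            position = to_f32(position)
    return position


# Hypothetical scenario: 1000 steps of 1 mm, starting 100 km from the origin.
pos_double = integrate(100_000.0, 0.001, 1000, single_precision=False)
pos_single = integrate(100_000.0, 0.001, 1000, single_precision=True)
```

In double precision the object ends up about one metre further along; in single precision it never moves.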

  13. DSPs = linear equation processors by Doc+Ruby · · Score: 2, Interesting

    We used the AT&T DSP32, a 12.5MFLOPS DSP, 15 years ago at Array Technologies. Programmable in native C source code, with multiply-accumulate (MAC) instructions optimized in microcode, the DSP32 was lightning fast at y = mx + b equations in its arithmetic logic unit (ALU), and its control logic unit (CLU) was also very fast at branching, including no-overhead looping. Linux runs on one of its many fascinating descendants, the Xilinx Virtex-2 Pro.

    --
    make install -not war

  14. Re:The future is the past by Total_Wimp · · Score: 4, Interesting

    PCI-X can fix this data bus in other ways as well. Motherboards come with one AGP slot, but PCI-X can and will provide many expansion slots.

    Picture five high end GPUs on the motherboard eclipsing the single high-end cpu for a fraction of the price. Intel and AMD would be forced to cut the asking price of their products to compete. We could finally see some real four-way competition for "processors".

    TW

  15. Re:Research by BiggerIsBetter · · Score: 4, Interesting

    I (and presumably others) have asked some project leaders about this, but it seems to come down to testing and support of various cards. Also, remember that this is relatively unknown technology - Amiga blitting aside ;-) - you have to be pretty sure it's going to give accurate and consistent results before using it seriously. Find-A-Drug was my project of interest, and they have a Linux version too.

    --
    Forget thrust, drag, lift and weight. Airplanes fly because of money.
  16. GPU use for scientific programming. by kiniry · · Score: 4, Interesting

    Researchers at Caltech and other institutions have been looking at this for about three years. See "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid" by Bolz, Farmer, Grinspun and Schroder (SIGGRAPH 2003), for example. The paper, illustrations, and movies are available from Dr. Grinspun's homepage. The primary problem with the approach at the time this work was done was the limited bandwidth of texture-related operations in OpenGL, caused by improper assumptions in pipeline optimization.

    --
    Joseph R. Kiniry
    http://kind.ucd.ie/~kiniry/
    Lecturer
    UCD School of Computer Science and Informatics
    1. Re:GPU use for scientific programming. by echion · · Score: 2, Interesting

      The bandwidth limitations you highlight and the others mentioned in other papers by Grinspun are probably similar to quantum-computing limitations: e.g., in GPUs you can read some read-only registers, multiply/add them (in parallel) tons of times, and then write to some other write-only registers in the GPU; in quantum computing you can take some atoms whose state you knew, apply tons of (parallel) quantum operations, and then observe the results (so they're useless for more quantum computations).

  17. More speed for the Terascale cluster? by Anonymous Coward · · Score: 2, Interesting

    Weren't Virginia Tech's G5 supercomputer nodes all equipped with standard ATI cards? If used right, there could be 1100 more processors to use...

  18. distributed.net by terminal.dk · · Score: 2, Interesting

    When will the new client be out for this platform ?

    I know my PC eats 20 Watts more of power when in 3D mode, but still, I want the faster agent :=)

  19. Crypto by Effugas · · Score: 2, Interesting

    We've talked a decent amount about doing crypto on GPUs. The fundamental issue is that such processors are massively optimized for operating on floating point numbers, and almost all crypto is integer based -- lots of bitshifts, MODs, and XORs, only the last of which this gear handles correctly. Even if the problem with getting data back off the card were solved, the card itself couldn't do the job.
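
One concrete way the float-versus-integer mismatch bites: a single-precision float carries a 24-bit significand, so a full 32-bit word does not survive being stored in one. A minimal sketch, assuming the hardware offers only single-precision floats as storage; the word, the mask and the name `through_f32` are arbitrary choices for illustration:

```python
import struct


def through_f32(n):
    """Store an integer in a single-precision float and read it back."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0])


word = 0xDEADBEEF                 # arbitrary 32-bit value
key = 0x0000FFFF                  # arbitrary mask
exact = word ^ key                # what an integer unit computes
lossy = through_f32(word) ^ key   # XOR after the word has passed through a float

# Integers up to 2**24 are stored exactly; beyond that, low bits are rounded off.
small_ok = through_f32(2**24) == 2**24
large_ok = through_f32(2**24 + 1) == 2**24 + 1
```

Here `exact` and `lossy` differ because the low bits of `word` are rounded away before the XOR, which is fatal for any cipher that depends on every bit.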

    Indeed, I only know of one crypto hack that uses floats -- being from DJB, it's predictably brilliant. Basically, it's easy to compute the floating point error from a given operation, but computationally hard to find an operation that yields a given error. So you can effectively sign (or at least MAC) arbitrary content. Nice!

    --Dan

  20. Imagine a Beowulf Cluster... no, seriously by billstewart · · Score: 4, Interesting
    There's a cluster of Sony Playstations at UIUC (BBC) that's using the Emotion Engine to do number crunching and running Linux on the main processors to do communications and I/O. It's probably not strictly Beowulf, because it's using the Playstation version of Linux.

    This cluster has 70 Playstations (one article said that they'd ordered 100, but only 70 are in the cluster... Obviously the others are being used for "research".)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  21. AT&T DSP32 Cluster Supercomputer in late 80s by billstewart · · Score: 2, Interesting
    The AT&T DSP32 definitely rocked. In addition to doing 32-bit floating point multiply and accumulate, it could simultaneously do 24-bit integer calculations. The supercomputer cluster was up to 128 of them (I forget if they were 8 or 16 per board), with communications structured as a tree, which could give you 1 GFLOPS sustained and up to 2 GFLOPS if you could keep them busy doing multiply-and-accumulate. Not bad for a desktop in the late 80s, though of course you can get that for $49 today:-)

    A typical application was to use a couple of the processors to do geometry while the rest crunched shading, or alternatively to do lots of FFTs for signal processing - the box was mainly designed for the Navy, and 32-bit floating point was more than enough precision given the A/D converters on sonar input.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  22. Ray tracing with a GPU? by Angst+Badger · · Score: 2, Interesting

    So I have to wonder how much POVray could be sped up -- if any -- by modifying it so that suitable calculations were run on the GPU, in parallel, while the CPU took care of the rest.

    --
    Proud member of the Weirdo-American community.
  23. Re:AT&T DSP32 Cluster Supercomputer in late 80 by billstewart · · Score: 3, Interesting
    Yes, they were 25 MFLOPS. The chip had a 12.5 MHz cycle rate (I think that was also the clock speed), and each cycle could do a 32-bit multiply, a 32-bit add, and a 24-bit simple integer operation (some integer ops took multiple clocks, I think?)

    Your music application sounds like fun. I didn't know anybody was still doing anything quite like that by 1990 - there was a whole range of people around John Cage's time who did lots of prepared piano stuff.


    Some of the people who were trying to sell our multi-processor supercomputer flavor came up with a music studio application, doing lots of audio processing and mixing, sort of like your device turned inside out. Don't know if they sold more than one of them before the Lucent spinoff took them away.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  24. Re:Like the good old days by Anonymous Coward · · Score: 1, Interesting

    Actually, I made an image codec on the Amiga that programmed the blitter (Amiga graphics co-processor) to do delta decoding, using 3 bitplanes to describe a -4 to +3 delta, and summing with the previous pixel on a 5-bitplane image. Decoding was done in parallel with the main processor decoding a run-length + Huffman stage for the next frame. I think that was the first codec I ever made, and certainly the one I had the most fun making. Ah, those were the days...
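
The arithmetic of that delta stage can be sketched serially in Python. This models only the numbers (signed 3-bit deltas added to the previous 5-bit pixel), not how the blitter did it across bitplanes; wrap-around on overflow, the function names and the sample row are assumptions of the sketch.

```python
PIXEL_BITS = 5                        # 5 bitplanes -> pixel values 0..31
PIXEL_MASK = (1 << PIXEL_BITS) - 1


def decode_row(first_pixel, deltas):
    """Rebuild a row of pixels from a start value and signed 3-bit deltas."""
    pixels = [first_pixel & PIXEL_MASK]
    for d in deltas:
        if not -4 <= d <= 3:
            raise ValueError("delta does not fit in 3 bits")
        # Wrap-around on overflow is an assumption of this sketch.
        pixels.append((pixels[-1] + d) & PIXEL_MASK)
    return pixels


def encode_row(pixels):
    """Inverse of decode_row; rejects rows whose neighbouring pixels differ
    by more than a 3-bit delta can express."""
    deltas = []
    for prev, cur in zip(pixels, pixels[1:]):
        d = cur - prev
        if not -4 <= d <= 3:
            raise ValueError("step too large for a 3-bit delta")
        deltas.append(d)
    return pixels[0], deltas


row = [10, 12, 15, 14, 10, 11]
first, deltas = encode_row(row)
decoded = decode_row(first, deltas)
```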

    For the interested, it was used on PMC's Alpha & Omega released on The Gathering 1991.