BrookGPU: General Purpose Programming on GPUs
An anonymous reader writes "
BrookGPU is a compiler and runtime system that provides an easy, C-like programming environment (read: No GPU programming experience needed) for today's GPUs. A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4. Combine this with the increased memory bandwidth, 25.3 GB/sec peak compared to the Pentium 4's 5.96 GB/sec peak, and you've got a seriously fast compute engine but programming them has been a real pain. BrookGPU adds simple data parallel language additions to C which allow programmers to specify certain parts of their code to run on the GPU. The compiler and runtime takes care of the rest. Here is the Project Page and Sourceforge page."
I suspect that this high performance is only attainable for the field the GPU is specialized for, i.e. graphics-related things. Or isn't it?
What kind of instructions does the GPU actually accept?
I mean, you probably just can't run any kind of algorithm on there can you?
The path I walk alone is endlessly long.
30 minutes by bike, 15 by bus.
I wonder how long till we see a (insert worthwhile cause here)-At-Home client that supports this?
... can you say 'software synthesists' wet dream?
... $5 to the first person to use Brooke to make a synthesizer. :)
Oh, suddenly, that 'game investment' also gives you a few 100 extra voices of polyphony?
Sweet
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4.
wait, if there is a technology that allows construction of GPU that is 3 times faster than the fastest CPUs, why Intel and AMD do not use this technology to build those 3times faster CPUs?
are you sure that you can compare the speed of GPU and CPU?
#
#\ @ ? Colonize Mars
#
I'm completely new to meddling with graphics card, so apologies if this is a silly question: when programs utilising the GPU for arbitrary calculations are running does the screen go weird, or is there a way of stopping the output being displayed? A screenfull of junk might not matter to a scientist leaving their computer to crunch numbers for a few months but it wouldn't be good for a general-purpose program.
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
It would seem to me that the GPU is not going to be as general-purpose as the CPU, but could still attain the high mathematical throughput with vector-oriented processing.
Doing string searches, complex logic analyses, etc. would probably suck, but big data manipulations, such as SETI-style wave transformations, molecular analysis, etc., might be able to take advantage of them.
Design for Use, not Construction!
I'd love to see an FFT implementation (maybe it's not so hard ... will have to download and play with it.)
A lot of scientific code is constrained by how fast you can do an FFT, perhaps of arbitrary size. And a fast graphics card is a lot cheaper than a high-end processor.
For embarassingly parallel vector problems, this is just the sort of thing for cheap, powerful clusters based around a cheap PC and a fast GPU.
1) Each character would have it's own shader program.
2) You would set the shader program, draw a rectange, and the character would appear.
3) The shader programs would be automatically generated by processing TrueType files.
To implement:
1) Break Truetype outline up into a number of convex curve segments.
2) Each of these curve segments would be represented as a set of constants in the shader program
3) For each pixel, test a line from pixel to an edge.
4) If the number of segments crossed is odd the pixel is black else white.
The algorithm can be refined to add antialiasing and hinting.
What you end up with is text that is clear at any resolution. The size of the text is controlled by the rectangle you draw it in. The text can also be clearly rotated and sheared.
An obvious optimization is to get the GPU vendors to add a shader instruction to do the calculation for which side of the bezier curve segment the current point lies.
While not important for games drawing text is critical for desktops. And we all know about the current trends to draw desktops with 3D hardware.
PCI-X can fix this data bus in other ways as well. Motherboards come with one AGP slot, but PCI-X can and will provide many expansion slots.
Picture five high end GPUs on the motherboard eclipsing the single high-end cpu for a fraction of the price. Intel and AMD would be forced to cut the asking price of their products to compete. We could finally see some real four-way competition for "processors".
TW
It's been done. The Havok game physics system is available for the Playstation 2, and the physics is running in the vector processors, where most of the PS2's compute power resides.
Collision detection isn't that CPU-intensive. (This may surprise people not familiar with the field. But it's true. If collision detection is using substantial CPU time, you're doing it wrong.) Correct collision resolution is where the time goes.
Physics code works better with double-precision FPUs. You need both dynamic range and long mantissas to do it well. Some of the game consoles, and most of the GPUs, only have single-precision FPUs. It's possible to make physics code work in single precision, but fast-moving objects that cover considerable distance may have problems.
I (and presumably others) have asked some project leaders about this, but it seems to come down to testing and support of various cards. Also, remember that this is relatively unknown technology - Amiga blitting aside ;-) - you have to be pretty sure it's going to give accurate and consistent results before using it seriously. Find-A-Drug was my project of interest, and they have a Linux version too.
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Researchers at Caltech and other institutions have been looking at this for about three years. See "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid" by Bolz, Farmer, Grinspun and Schroder (SIGGRAPH 2003), for example. The paper, illustrations, and movies are available from Dr. Grinspun's homepage. The primary problems with the approach at the time this work was done was the limited bandwidth of texture-related operations in OpenGL based upon improper assumptions in pipeline optimization.
Joseph R. Kiniry
http://kind.ucd.ie/~kiniry/
Lecturer
UCD School of Computer Science and Informatics
This cluster has 70 Playstations (one article said that they'd ordered 100, but only 70 are in the cluster... Obviously the others are being used for "research".)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Your music application sounds like fun. I didn't know anybody was still doing anything quite like that by 1990 - there was a whole range of people around John Cage's time who did lots of prepared piano stuff.
Some of the people who were trying to sell our multi-processor supercomputer flavor came up with a music studio application, doing lots of audio processing and mixing, sort of like your device turned inside out. Don't know if they sold more than one of them before the Lucent spinoff took them away.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks