BrookGPU: General Purpose Programming on GPUs
An anonymous reader writes "
BrookGPU is a compiler and runtime system that provides an easy, C-like programming environment (read: No GPU programming experience needed) for today's GPUs. A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4. Combine this with the increased memory bandwidth, 25.3 GB/sec peak compared to the Pentium 4's 5.96 GB/sec peak, and you've got a seriously fast compute engine, but programming GPUs has been a real pain. BrookGPU adds simple data parallel language additions to C which allow programmers to specify certain parts of their code to run on the GPU. The compiler and runtime take care of the rest. Here is the Project Page and Sourceforge page."
but the link to the project page is correct.
Actually, since "graphics-related things" are all matrix operations, this would turn the GPU into a high-end vector (matrix) engine.
The dogcow says "Moof!"
No. GPUs are basically matrix crunchers for vector calculations. You can make them do other stuff, probably, but it'd be about as efficient as emulating an Amiga on an x86.
The keywords are:
A shader program
The GPU is designed for CG, not for 'general purpose computing'.
I guess the instruction set is pretty limited too.
The path I walk alone is endlessly long.
30 minutes by bike, 15 by bus.
You can compare their ability to run shader programs (see the example given).
It does not mean you can use the GPU as a general purpose processor effectively, or that it is even Turing complete.
All it means is that certain types of programs could possibly run 3 times faster if ported to this system.
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
The CPU is a general purpose computing device. The GPU is a special purpose computing platform.
Think about the PlayStation 2, where the 300 MHz CPU outperforms a 700 MHz or faster Pentium CPU in graphics performance, but would run a word processor at a speed that would be slow compared to a 486.
I bet that if you ran a web browser on the same GPU it would run just a bit faster than on a 286 computer.
I guess you mean in the C1541 floppy drive.
English is not my first language, so cut me some slack -: Om du kan lasa det har sa kan du Svenska
It's probably because the Pentium 4 needs to be more generic. It needs to support a far greater number of instructions.
A GPU on the other hand can do only so much. But its strength lies in areas where the CPU lags. Fast memory interfacing, extreme parallelization etc.
Now there exist computing problems that can be solved very efficiently on the GPU, even with its limited instruction set. This is what this project is all about - to provide a generic programming language that compiles to a vertex/pixel shader that runs on the GPU, but does non-graphics tasks. Awesome!
Don't Panic
Reminds me of the good old days when you used the processors in the C64 tapedrive to compute stuff. Wouldn't want to waste those precious cycles.
Actually it was the old 1540/1541 and later 1571/1581 disk drives. The tape drive did not have a processor in it.
Any algorithm will run (memory allowing), but not any instruction. The card runs on some flavor of assembly, as any microprocessor does, and with this tool you can compile code for the GPU from C, and it gets loaded into the GPU when your main program runs on the CPU.
You are assuming that the technologies used in a GPU can also be used in a CPU. Because something is applicable in one instance doesn't mean it is in all instances. Making some things efficient may take away from the efficiency of others, but in the case of such a specialized chip, it may not matter.
:)
It may be ok to compare the speed of a GPU and a CPU if they are in fact different. If a GPU was just a CPU made with cheaper material, yeah, it would be unfair. But as life goes, they both have their merits.. so why not? A GPU is prolly best at some matrix math transforms.. or not.
--
"I'm not bright. Big words confuse me. But Wanda loves me and that should be enough for you." - Cosmo
Nope. Nothing appears on your screen until the contents of the area of memory known as the "frame buffer" are rewritten by a program (on either the GPU or CPU). The GPU can execute math code all day and you won't see the results unless it deliberately modifies the frame buffer.
Brook is an extension of standard ANSI C and is designed to incorporate the ideas of data parallel computing and arithmetic intensity into a familiar and efficient language. The general computational model, referred to as streaming, provides two main benefits over conventional languages:
- Data Parallelism: Allows the programmer to specify how to perform the same operations in parallel on different data.
- Arithmetic Intensity: Encourages programmers to specify operations on data which minimize global communication and maximize localized computation.
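A rough way to picture the two points above, as a minimal sketch in plain Python rather than Brook syntax (the function names here are made up for illustration and are not part of Brook): a kernel is a pure function applied to each stream element on its own, and all of its work stays local to that element.

```python
# Hypothetical sketch of the streaming model in plain Python (NOT Brook syntax).
# A "kernel" is a pure function applied independently to each stream element.

def saxpy_kernel(a, x, y):
    """Per-element work: reads only its own inputs and touches no global state."""
    return a * x + y

def run_kernel(kernel, a, xs, ys):
    """Apply the kernel to every element pair. No call depends on another,
    so a GPU backend would be free to evaluate them in parallel."""
    return [kernel(a, x, y) for x, y in zip(xs, ys)]

xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]
result = run_kernel(saxpy_kernel, 2.0, xs, ys)
print(result)
```

The list comprehension runs serially here; the point is only that nothing in the kernel forces that order.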
More about Brook can be found at the Merrimac web site, which contains a complete specification for the language. The BrookGPU compilation and runtime architecture consists of two components. BRCC, the BrookGPU compiler, is a source-to-source metacompiler which translates Brook source files (.br) into
The BRT is an architecture independent software layer which implements the backend support of the Brook primitives for particular hardware. The BRT is a class library which presents a generic interface for the compiler to use. The implementation of the class methods is customized for each piece of hardware supported by the system. The backend implementation is chosen at runtime based on the hardware available on the system or at the request of the user. The backends include: DirectX9, OpenGL ARB, NVIDIA NV3x, and C++ reference.
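The "generic interface, backend chosen at runtime" idea described above might look roughly like the following. This is only a sketch under assumptions: the class and function names are invented for illustration and are not the actual BRT API, and the "GPU" backend is a stand-in that runs on the CPU.

```python
# Hypothetical sketch of runtime backend selection. These names are
# illustrative only and are NOT the real BRT class library.

class Backend:
    name = "abstract"

    def available(self):
        return False

    def run(self, kernel, stream):
        raise NotImplementedError

class CpuReferenceBackend(Backend):
    name = "cpu"

    def available(self):
        return True  # the reference fallback is assumed to always work

    def run(self, kernel, stream):
        return [kernel(v) for v in stream]

class FakeGpuBackend(Backend):
    name = "gpu"

    def __init__(self, present):
        self.present = present  # pretend hardware probe result

    def available(self):
        return self.present

    def run(self, kernel, stream):
        return [kernel(v) for v in stream]  # stand-in for real GPU dispatch

def choose_backend(backends, requested=None):
    """Honor the user's request if that backend is usable,
    otherwise fall back to the first available backend in the list."""
    if requested is not None:
        for b in backends:
            if b.name == requested and b.available():
                return b
    for b in backends:
        if b.available():
            return b
    raise RuntimeError("no backend available")

backend = choose_backend([FakeGpuBackend(False), CpuReferenceBackend()])
print(backend.name, backend.run(lambda v: v * 2, [1, 2, 3]))
```

The caller only ever sees the common `run` method, which is the part that lets compiled code stay the same across hardware.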
...but I assume that in any advanced texturing/shading/bump mapping/other GFX function rendering, you apply all the different effects, and when you're done, specifically call that the frame is to be displayed on screen. (E.g. why your FPS != your monitor refresh rate)
I would assume that this program simply never calls the drawing function, but instead gets the results back from the GPU. The normal screen should be able to run in the meanwhile (I assume you can e.g. build a 3D environment while showing a 2D cutscene), so I would think you can have a plain GUI, as long as it doesn't need to use anything advanced.
Kjella
Live today, because you never know what tomorrow brings
www.gpgpu.org
Very cool. Vector/Graphics processors could one day overtake General processors. They are way more energy efficient too.
Because CPUs are limited to running instructions (for the most part) in serial. GPUs get to run a large number of instructions in parallel. As some above posts mentioned, a lot of the stuff the GPU can do is vector and matrix multiplication, therefore the GPU is really good at multiplying a lot of numbers times a lot of numbers at once. But in everyday life you aren't multiplying a bunch of numbers times a bunch of numbers at once, you are multiplying one number times another, then multiplying the result times a number, and so on. GPUs are built for a specific task, and at that task they are very fast, but outside that task they won't be able to compete with a real CPU. And on top of all of that I can buy three 2.4 GHz P4s for the price of a GeForce FX 5950.
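To make the distinction above concrete, here is a small illustrative sketch (my own example, not from the post). The first function does independent work per element; the second is a recurrence where each step needs the previous result. A nonlinear recurrence is used for the second case because a plain chain of multiplications could in principle be regrouped and parallelized.

```python
# Illustrative only: independent per-element work versus a dependent chain.

def elementwise_products(xs, ys):
    """Each product depends only on its own pair of inputs, so all of
    them could be computed at the same time on parallel hardware."""
    return [x * y for x, y in zip(xs, ys)]

def iterate_logistic(x0, r, steps):
    """Each step uses the result of the previous step (x -> r*x*(1-x)),
    so the steps have to be evaluated one after another."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

print(elementwise_products([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(iterate_logistic(0.5, 2.0, 3))
```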
> I mean, you probably just can't run any kind of algorithm on there can you?
Probably. I should imagine it has local storage with the corresponding fetch and store instructions, basic math, and the ability to jump to arbitrary points in the shader program, which makes it very much Turing complete. Everything else is a matter of a compiler backend. Bus latency would be an issue, so it'd be painful for programs that need a lot of I/O, but that's not an issue for a lot of programs.
I've finally had it: until slashdot gets article moderation, I am not coming back.
http://www.cs.unm.edu/~kmorel/documents/fftgpu/
The FFT on a GPU
This page contains supplemental material for the following paper.
Moreland, K and Angel, E. "The FFT on a GPU." In SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003 Proceedings, pp. 112-119, July 2003.
Shades of programming the Amiga Blitter. I think Dave Haynie had Life running at 60FPS in about 1986-87 on a 68000.
think fx not synth... just use it as a bad-ass real time convolver, and _then_ get wet.
isn't it much more interesting to do things that were not possible before, than to just do the same thing, but in increased quantity? Also convolution is the single most universal operation in audio dsp (fir filters, reverb), one well-built plugin would suffice for everything. synth development creativity would certainly suffer from the increased development costs.
[i have an opinion and i am not afraid to use it]
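For anyone wondering what the convolution mentioned above actually computes, here is a minimal direct-form sketch in plain Python (my own illustration, nothing GPU-specific; a real convolver for long reverb tails would more likely use FFT-based methods). An FIR filter is just this operation with the filter taps as the kernel.

```python
def convolve(signal, kernel):
    """Direct-form convolution: out[n] = sum over k of kernel[k] * signal[n - k].
    Implemented by scattering each input sample across the kernel taps."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += s * h
    return out

# A two-tap moving average acts as a very simple low-pass FIR filter.
smoothed = convolve([1.0, 3.0, 5.0], [0.5, 0.5])
print(smoothed)
```

Every output sample is an independent sum of products, which is why the operation maps well onto parallel hardware.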
Wait, if there is a technology that allows construction of a GPU that is 3 times faster than the fastest CPUs, why do Intel and AMD not use this technology to build CPUs that are 3 times faster?
Are you sure that you can compare the speed of a GPU and a CPU?
Well, yes and no. In the same way you can take a render farm and say that "this provides the equivalent of a 100GHz Pentium". Which might be true, for that specific task. You see it already between CPUs: compare Pentium, Xeon, Athlon XP and Athlon 64. Do you get one benchmark "X is 3% faster than Y"? No. Faster at some, slower at others. For a specific benchmark, the difference can be pretty big already among "general" processors.
A specialized processor like a GPU will show much greater variation. It might really shine on some, really suck on others. Which is why it's no good using a GPU as a CPU. Those numbers tell you that it can be much faster than the fastest CPU around. Or better yet, if you can make it run in parallel to the normal CPU, give you a total performance which may theoretically be about 13GHz (10 + 3), where 3 of those can be general-purpose operations. Or it may be a task the GPU runs like a dog, and isn't even worth the overhead.
Kjella
Live today, because you never know what tomorrow brings
Here is a Beyond3d link that has some opcode info. Look around their site for a NV30 vs R300 architecture document that has lots of great stuff. If you are looking for the best s/n ratio, Beyond3d is one of the best. All meat, little fanboyism.
Nvidia has this already!
"About Cg The Cg Language Specification is a high-level C-like graphics programming language that was developed by NVIDIA in close collaboration with Microsoft Corporation. The Cg environment consists of two components: the Cg Toolkit including the NVIDIA Cg Compiler Beta 1.0 optimized for DirectX(R) and OpenGL(R); and the NVIDIA Cg Browser, a prototyping/visualization environment with a large library of Cg shaders. Developers also have access to user documentation and a range of training classes and online materials being developed for the Cg language."
http://www.nvidia.com/object/IO_20020612_7133.htm
Same AC as parent - happy to have discovered this
"Fixed In This Release (12/19/03) * nv30gl backend compiles and runs on Linux. Requires Linux cgc compiler from NVIDIA and the latest drivers.
NVIDIA's parts are OK, precision wise. You get IEEE floats, more or less. ATI's parts don't quite get you there at the moment, but their next series is planned to.
2.1 GB/s is very nice, but it only refers to transfers in one direction: to the card. There is a (much) smaller bandwidth back to the motherboard. This is because for their designed purpose, graphics cards do not need to talk back to the system much, they just crunch the numbers and spit out the results to a monitor.
With encryption you are usually looking at processing streams of data. If your encryption method involves a lot of floating point math (almost never) on every bit of information, then it would be nice. But encryption is almost always integer based (GPUs don't shine in integer like they do in floating point), and involves just as much data going in as coming back.
If you are looking for a great (co) processor for integers, look at the Altivec section of the G4 (and the similar one in the G5.. I forget the IBM name).
Yes, the AGP -> main memory transfer rate of most video cards is abysmally slow, because it's not something that's needed for gaming. Maybe newer cards have changed, but I don't see why they would. background article
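A back-of-the-envelope sketch of why the readback direction matters for something like encryption, where as much data comes back as goes in. The 2.1 GB/s upload figure is the one quoted above; the readback figure is purely an assumption picked for illustration, not a measured number for any card.

```python
def round_trip_seconds(size_gb, upload_gb_per_s, readback_gb_per_s):
    """Time to send size_gb to the card and read the same amount back,
    ignoring compute time and latency."""
    return size_gb / upload_gb_per_s + size_gb / readback_gb_per_s

UPLOAD = 2.1    # GB/s, figure quoted in the thread
READBACK = 0.2  # GB/s, ASSUMED for illustration only, not a measurement

total = round_trip_seconds(1.0, UPLOAD, READBACK)
readback_share = (1.0 / READBACK) / total
print(f"total: {total:.2f} s, share spent on readback: {readback_share:.0%}")
```

With numbers like these, the slow direction dominates the total no matter how fast the GPU crunches the data in between.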
I thought the real reason to get a *professional level* card is to get a guarantee of reliability
Well, ISV certification - a CAD vendor will assert "with this card, our software produces no rendering artifacts".
http://portal.acm.org/citation.cfm?doid=566654.566 640l n line.siggraph.org/2002/Papers/13_GraphicsHardware/ purcell.ppt
http://www.theregister.co.uk/content/54/25312.htm
http://online.cs.nps.navy.mil/DistanceEducation/o
Interesting, at least as a GPU is really a sort of DSP (Digital Signal Processor). And as I am deeply into both audio and broadband signal processing hardware systems development, I find using those chips on the high performance video cards to be extremely useful in processing waveforms using the base of all of it - matrix calculations. It allows FFT, iFFT (and of course DFTs, DCT and so on), as well as QMF and PQF filtering and synthesis.
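As a small illustration of the "it is all matrix calculations" point, here is the plain DFT written as a matrix-vector product in Python. This is a sketch of the textbook O(n^2) definition, not an FFT and not anything GPU-specific; the helper names are my own.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with entries exp(-2*pi*i*row*col/n)."""
    return [[cmath.exp(-2j * cmath.pi * row * col / n) for col in range(n)]
            for row in range(n)]

def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def dft(signal):
    """Discrete Fourier transform computed as one matrix-vector product."""
    return mat_vec(dft_matrix(len(signal)), signal)

# A unit impulse has a flat spectrum: every bin comes out as (about) 1.
spectrum = dft([1.0, 0.0, 0.0, 0.0])
print(spectrum)
```

An FFT produces the same result with far fewer multiplications by exploiting the structure of this matrix.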
I could dream of making a hi-fi vocoder out of a video card - crazy but interesting! =)
On the other hand, I am a little bit sceptical about all this, as it will not reach even half or a third of its GFLOPS performance when not used for its "native application". This means it could, after some time and hellish efforts, show that the "PAIN vs BENEFIT" ratio falls more and more to the pain side.
I remember the times I tried to use the 6510 CPU + 8 KB (don't remember exactly) of RAM inside the C64 disk drive to process graphics in parallel with the main processor. Efforts fell
And from the third point of view -- see how Intel processors suck (~flamebait;) - "price>performance" like always. Any small embedded chip outrivals it.
p.s.
Still hold on for the coming of the FPGA
Nope.
Those are 4-component (RGBA) types, with 32, 16, and 24 bits per component, respectively.
None of them are enough for double floats, and none of them are good enough for 80-bit reals that x87 uses.
Education is the silver bullet.
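A quick way to see what narrower components cost you, sketched with Python's standard `struct` module. Assumptions: the 32-bit and 16-bit GPU formats are approximated here by the IEEE single and half formats that `struct` supports ("f" and "e"), which may not match the hardware exactly, and there is no 24-bit format in `struct`, so that case is not shown.

```python
import struct

def round_trip(value, fmt):
    """Store a Python float (a 64-bit double) in a narrower format
    and read it back, exposing the precision that was lost."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 0.1
as_32 = round_trip(x, "<f")  # IEEE single precision
as_16 = round_trip(x, "<e")  # IEEE half precision (assumed stand-in for the 16-bit type)

print("double:", repr(x))
print("32-bit:", repr(as_32), "error", abs(as_32 - x))
print("16-bit:", repr(as_16), "error", abs(as_16 - x))
```

Neither narrower format gives back the original double, which is the point made above about double floats not fitting.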