Domain: gpgpu.org
Stories and comments across the archive that link to gpgpu.org.
Comments · 114
-
Large datasets are mostly IO limited
While cool and all, 125 million tweets with geotagging is at most 125,000,000 × 142 bytes ≈ 17.75 GB. That is not what "big data" considers a large dataset. Indeed, most "big data" queries are IO limited. For around 16k USD you can fit that entire working set in memory. You are not really in the "big data" realm until you have datasets in the tens of TBs compressed (hundreds of TBs uncompressed).
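Here is a quick check of the sizing arithmetic. It uses the comment's own upper bound of 142 bytes per geotagged tweet and prints both decimal gigabytes and binary gibibytes. For comparison it also sizes 1.25 billion records, which comes to roughly 165 GiB. The helper name is illustrative only.

```python
# Upper-bound size of a record-oriented dataset, in decimal and binary units.
GB = 10**9    # gigabyte
GIB = 2**30   # gibibyte

def dataset_size_bytes(num_records: int, max_bytes_per_record: int) -> int:
    """Upper bound: every record stored at its maximum size."""
    return num_records * max_bytes_per_record

for records in (125_000_000, 1_250_000_000):
    size = dataset_size_bytes(records, 142)
    print(f"{records:,} records: {size / GB:.2f} GB ({size / GIB:.1f} GiB)")
```

Either figure fits in the main memory of one large server, which is the comment's point.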
For these kinds of datasets, and where more compute is necessary, there is Mars (a MapReduce framework for GPUs). -
Re:Running Linux in the CUDA Cores
This is not how GPU cores work. There are some efforts to offload kernel tasks that are suitable for the GPU, such as block encryption (in general, anything that is parallel enough and can be streamed). For instance, there's AES acceleration: http://gpgpu.org/2011/05/04/kgpu-gpu-computing-in-linux-kernel
-
Re:Unification under DirectX
CUDA was announced November 2006, so Microsoft wasn't that far ahead. But mass-market GPGPU really got started around 2000, culminating in the Brook project in 2004. Microsoft didn't start this trend, though they did jump on it quickly.
-
Re:GPU accuracy
Question: since you seem pretty knowledgeable on the subject, have you or any of your colleagues tried the AMD Stream SDK? Those ATI 5870s look pretty scary as far as raw power goes, the AMD SDK supports OpenCL on both the CPU and GPU, and AMD has opened up their code and supports both Windows and Linux, 32- and 64-bit. I was just curious whether you or anyone else here has tried it.
-
Re:More than just graphics
Go and have a look at GPGPU. There's tons of material on there about techniques, some tutorials and a busy forum.
-
Re:Kondratief cycles
I would say microcomputers have largely gone through their cycle.
You are very funny, dude.
When you look at this, you probably see an effing ugly gaming laptop. I see a massive supercomputer that you can throw in a bag, something capable of outshining anything CRAY had 10 years ago for millions of greenbacks.
The only thing is that there are no killer apps YET for a beast like this; when a killer app for something like this comes along, we are in for a thrilling ride.
-
Multicore = failure.
I championed it here, but there is no software that utilizes it, and programming for it is difficult, as mentioned in many articles here.
New languages aren't being used to help with multicore or with parallel processing on graphics chips.
A graphics-chip computer built for gaming and general use would be amazing. It would cost as much as an entry-level PC built on a general-purpose chip, but could do 3D GAMES!
But it would need a parallel processing language.
-
Re:What a waste of resources
I think you're missing the purpose of what a graphics processing unit is for.
No, he's going beyond the purpose of what a graphics processing unit was originally for, and looking ahead to what General Purpose GPU computing is going to be for. There's nothing in the GPU that requires it to operate on polygons; the silicon is there to do parallel stream processing, and streams of geometry/texture/lighting/etc data are just one of the ways to use that.
-
Re:Anyone with more knowledge explain this to me
there is no code today that will use it explicitly, the whole paradigm of a GPU is that you do not read data back to the CPU.
Perhaps you should look into GPGPU and CUDA. Most of what most people do with computers involves one-way traffic to the GPU, but a small and sometimes well-funded subset of us have bigger plans than video games for the massive parallelization the GPU provides.
It will be interesting to see whether the Nvidia/Intel and AMD/ATI alliances kill progress in this direction and make us all wait for Intel and AMD to figure out how to market 256 threads of execution to consumers who won't ever need them, or whether they instead bring about innovations that remove today's bottlenecks, such as host/device bandwidth.
-
Fascinating
I think this part of the computing timeline is going to be one that is well remembered. I know I find it fascinating.
This is a classic moment when tech takes the branch that was unexpected. GPGPU computing will soon reach ubiquity, but for right now it's the fledgling being grown in the wild.
Of course I'm not earmarking this one particular project as the starting point, but this year has had 'GPU this' and 'GPGPU that' startup events all over it. Some even said in 2007 that it would be a buzzword in '08.
And of course there's nothing like new tech to bring out a naysayer.
Folding@home released their second-generation GPU client in April '08, while retiring the GPU1 core in June of this year.
I know I enjoy throwing spare GPU cycles to a distributed cause, and whenever I catch sight of the icon for the GPU client it brings back the nostalgia of distributed clients of the past. I think I was with United Devices the longest. And the Grid.
Now we are getting a chance to see GPU supercomputing installations from IBM and this one from MIT. Soon those will be littering the Top 500 list.
I also look forward most to the peaceful endeavors the new processing power will be used for: weather analysis, drug creation, and disease studies.
Oh yes, I realize places like the infamous Sandia will be using the GPU to rev up atom splitting. But maybe if they keep their bombs IN the GPU it'll lessen the chances of seeing rampant proliferation again.
Ok, well, enough of my musings over a GPU.
-AI
-
Re:Why haven't they started releasing GPU CPUs yet
Check out the GPGPU (General Purpose GPU) project:
http://www.gpgpu.org/ -
Re:So, what's actually accelerated here?
The GeForce 8 series is perfectly capable of being used to accelerate physics calculations and pass the results back to the CPU. The PCI-Express bus is bidirectional, and the scheme for getting results back from calculations done on the GPU is essentially this: encode your inputs as Texture1, set the render target of your program to Texture2, use a shader to calculate the results and "render" them (drawing them to Texture2), and pull Texture2 back to system memory or reuse it for further calculations. I suggest you look into nVidia's CUDA and, for more general information, check out GPGPU.org.
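The render-to-texture scheme above can be sketched on the CPU to show its data flow. In this sketch a texture is a 2D list, and the "shader" is an ordinary Python function run once per output texel. Each pass reads the previous pass's output, which is the "ping-ponging" used when results are reused for further calculations. No graphics API is involved, and names such as `render_pass`, `blur` and `texture1` are hypothetical.

```python
from typing import Callable, List

Texture = List[List[float]]
# A "fragment shader": reads anywhere in the input, returns one output texel.
Shader = Callable[[Texture, int, int], float]

def render_pass(src: Texture, shader: Shader) -> Texture:
    """'Draw' into a fresh render target by running the shader once per texel.

    Each output texel depends only on the read-only input texture, which is
    what lets a GPU shade them all in parallel.
    """
    height, width = len(src), len(src[0])
    return [[shader(src, x, y) for x in range(width)] for y in range(height)]

def blur(src: Texture, x: int, y: int) -> float:
    """Example shader: mean of a texel and its 4 neighbours, clamping at edges."""
    height, width = len(src), len(src[0])
    total = 0.0
    for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        total += src[sy][sx]
    return total / 5.0

def run_passes(texture: Texture, shader: Shader, passes: int) -> Texture:
    """Ping-pong: each pass reads the previous pass's output.

    A GPU version would allocate two textures and swap which one is bound
    as input and which as render target; rebinding a name plays that role here.
    """
    current = texture
    for _ in range(passes):
        current = render_pass(current, shader)
    return current  # the final "readback" to system memory

texture1 = [[0.0] * 5 for _ in range(5)]
texture1[2][2] = 25.0  # one bright texel
result = run_passes(texture1, blur, passes=1)
```

The input texture is never written during a pass. That is the same restriction a fragment shader has, and it is why two buffers are needed for iterative work.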
Note that with CUDA you can avoid many of the headaches induced by "normal" GPGPU programming (i.e., the OpenGL/DirectX coding involved in the scheme described above), but you limit yourself solely to the GeForce 8 series, whereas GPGPU programming using GL/DX offers access to a wider range of hardware. -
Re:Open Video Drivers
I was wrong about nVidia: the VIA video chips are S3. But still, the GPU can be more powerful than the CPU. Though programming a GPU for all the tasks of a CPU is hard, it can be done, which is why there's interest in General-Purpose computing on Graphics Processing Units (GPGPU).
-
Re:I remember a time...
Why do you need an advanced GPU on your server?
So you can run GPGPU.
-
Re:Four graphics cards!
Has gcc been ported to a GPU yet? Can you compile kernels (or Gentoo) on your video card?
It looks like there might be some work ongoing in this area, yes.
-
Not so new but still neat.
This project has been around for a long time: http://www.gpgpu.org/ Though I agree modern GPUs are even more useful for general-purpose computing.
-
Re:Hurrah
Games, schmames. If we have complete specs of the hardware, there are plenty of things besides graphics we can do with it.
-
Re:Linux gaming arena?
And then there's the whole GPGPU segment, which is gaining ground in the scientific computing arena:
http://en.wikipedia.org/wiki/GPGPU
http://www.gpgpu.org/
The absence of good drivers for ATI hardware meant that on GNU/Linux nvidia was the only available choice. Better ATI drivers could open this up. -
Re:The Cowardly Lion says..........
The cleaning analogy is perfectly apt!
If 100 people cleaned your house, they "wouldn't get shit done".
If 100 people cleaned Prof. Vishkin's house, they would be finished in about 3 minutes.
How this is better than Intel's 80-core processor remains to be seen. This "technology" looks like it's an overhyped version of GPGPU or PhysX. -
Re:Of course no RSX...
Modern graphics hardware is actually quite programmable. Although this RSX is based off of the G70, not the G80 (which is essentially a specialized stream processor), you can probably do some interesting General-Purpose GPU things on it.
-
Why on earth would you want to?
Just buy a PS3 instead. Sony are happy to heavily subsidise the hardware for you, and won't even complain if you don't buy any games or movies for it.
Or, better still, port your signal processing code to a GPU instead. They're much cheaper and far more powerful than a Cell, and with far more local memory bandwidth too. GPUs aren't ideal for every algorithm, but they do work well for many forms of signal processing.
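Signal processing often suits GPUs because many filters compute each output sample independently. As a hedged illustration, here is a direct-form FIR filter written so that each output sample is its own independent computation. That is the property a GPU would exploit by assigning one thread per sample. It is plain Python, and the function names and the 3-tap moving-average example are assumptions for illustration.

```python
from typing import List, Sequence

def fir_output_sample(x: Sequence[float], h: Sequence[float], n: int) -> float:
    """y[n] = sum over k of h[k] * x[n - k], with samples before x[0] taken as zero."""
    acc = 0.0
    for k, tap in enumerate(h):
        if 0 <= n - k < len(x):
            acc += tap * x[n - k]
    return acc

def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Apply the filter; each output sample is computed independently of the others."""
    # On a GPU this loop would become one thread (or fragment) per output sample.
    return [fir_output_sample(x, h, n) for n in range(len(x))]

# Example: a 3-tap moving average applied to a step.
y = fir_filter([0.0, 0.0, 3.0, 3.0, 3.0, 3.0], [1 / 3, 1 / 3, 1 / 3])
```

Recursive (IIR) filters do not have this property, because each output feeds back into the next, so they need more restructuring before they parallelise.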
-
General Purpose Programmers
it's hard to program such GPUs for anything other than graphics applications.
"Anything other" is "general purpose", which they cover at GPGPU.org. But the general community of global developers hasn't gotten hooked on the cheap performance yet. Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips. -
Re:Never thought of that
Check out this web site: http://www.gpgpu.org/
It is up to date and contains a lot of related information.
WP -
Re:My money is on NVidia
I suspect the real problem is because high end cards are starting to push Shader unification.
From a chipset standpoint, Intel actually makes decent (not spectacular, but better than many) graphics hardware already, they just don't have hardware transformation and lighting (T&L), which gets offloaded to the CPU. That means you can't be throttling your CPU(s)/cores and need a decent pipe between the hardware and memory. Intel said a couple of years back that it's a myth that the bottleneck is usually in T&L and the problem is actually pixel throughput.
As far as I can tell, that means
a) the bottleneck is between geometry (T&L) and shading (pixel hardware), meaning it's because of the software driver.
or
b) the bottleneck is between shading and the display, meaning Intel's hardware is too crappy to push that many pixels.
The first is a meh (no surprise - it's caused by having geometry in software) the second would be a hardware issue Intel needs to resolve to work with larger displays.
Now back to Shader Unification: basically, if companies like nVidia and ATI move to unified shaders, they can assign the types they need as needed and not leave many of them idle. Both of those companies have experience in unified shader architectures already (e.g., ATI's Xenos GPU in the Xbox 360, and NVIDIA's GeForce 8 series), so it wouldn't surprise me if this were the trend of the future. Intel needs to move their software T&L into hardware to create a unified architecture, assuming that is the way of the future.
Another issue is that unified architectures are basically high speed generalized floating point units - these have practical uses in other areas besides graphics (physics, supercomputing, even databases - there are even web pages like this one dedicated to it). Intel has to see this as a threat and know that they need a response should their main competitor, AMD (ATI), go in that direction. -
Re:Yes
This is a false argument. Even if you find two things which are not dependent on each other, it does not follow that there are no dependencies in the entire game. It's pretty damn obvious that the video and sound output DO have a sequential dependency on the controller input.
No, the argument is correct. Although there are lots of sequential dependencies in the game, there is also lots of code that can be parallelised. The parallelisation is limited by the length of the longest sequential chain; this is another phrasing of Amdahl's law, which the OP quoted. Everything within a timeslice (say, a frame) will depend on the IO, and the final output of the frame and sound will depend on everything in the timeslice, but this is not a long chain. One frame is typically output while the next is being computed (pipelining), so the two tasks can be performed in parallel if you accept one frame of latency. This is what most games do when they double-buffer the graphics. So then you are only concerned with the dependencies *within* a timeslice. Is the physics code, the graphics code, the AI code or the sound code dependent on each other within a timeslice? No. Read the article on Steam / Half-Life 2: Episode Two sometime; it explains this coarse-grain parallelism well.
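Amdahl's law is usually written as S(N) = 1 / ((1 - p) + p/N), where p is the fraction of the work that parallelises and N is the number of processors. Here is a minimal sketch of that formula. The 90% parallel fraction in the example is an arbitrary assumption, not a measurement from any game.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Assumed workload: 90% of each frame's work parallelises.
for n in (1, 4, 16, 1024):
    print(f"{n:5d} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

The serial fraction sets the ceiling. With p = 0.9, 16 cores give 6.4x and no number of cores can exceed 10x, which is why shortening the sequential chain matters more than adding processors.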
Well, actually they do, or at least you have to assume they do until you've checked otherwise. Or are the bullets passing through the enemies with no effect? Sounds like a pretty silly game to me.
Now you've slipped back to fine-grain parallelism. Here's a clue: do I need to check the bullet against *every* object in the game? Wouldn't that be a bit silly? How about nearby ones only? Oooh, this would need some kind of spatial data structure like an octree. Then let's split the timestep into two phases: in the first phase we'll compute all interactions between objects that are well within their spatial subdivision. We'll then broadcast these changes, and then process the areas around intersections. Gosh, a sequential dependency chain of two steps; that's a constant, by the way. And potential parallelism for as many processors as we have subdivisions (or buckets, if you will) in the spatial structure. This is why physics code can be parallelised well.
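The "nearby ones only" idea can be sketched with a uniform grid. This is a simpler spatial subdivision swapped in for the octree the commenter names, not the same technique. Objects are bucketed by cell, and a bullet is tested only against objects in its own cell and the 26 neighbouring cells. All names and coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]

def cell_of(p: Point, cell_size: float) -> Cell:
    """Integer grid cell containing point p."""
    return (int(p[0] // cell_size), int(p[1] // cell_size), int(p[2] // cell_size))

def build_grid(objects: Dict[str, Point], cell_size: float) -> Dict[Cell, List[str]]:
    """Bucket every object by the cell its position falls in."""
    grid: Dict[Cell, List[str]] = defaultdict(list)
    for name, pos in objects.items():
        grid[cell_of(pos, cell_size)].append(name)
    return grid

def nearby(grid: Dict[Cell, List[str]], p: Point, cell_size: float) -> List[str]:
    """Broad phase: objects in p's cell and its 26 neighbours."""
    cx, cy, cz = cell_of(p, cell_size)
    found: List[str] = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        found.extend(grid.get((cx + dx, cy + dy, cz + dz), []))
    return found

def bullet_hits(objects: Dict[str, Point], grid: Dict[Cell, List[str]],
                bullet: Point, radius: float, cell_size: float) -> List[str]:
    """Narrow phase on candidates only. Correct as long as radius <= cell_size."""
    hits = []
    for name in nearby(grid, bullet, cell_size):
        ox, oy, oz = objects[name]
        d2 = (ox - bullet[0]) ** 2 + (oy - bullet[1]) ** 2 + (oz - bullet[2]) ** 2
        if d2 <= radius * radius:
            hits.append(name)
    return sorted(hits)

objects = {"enemy_a": (1.0, 1.0, 1.0), "enemy_b": (1.5, 1.2, 0.9), "crate": (50.0, 50.0, 50.0)}
grid = build_grid(objects, cell_size=2.0)
hit_list = bullet_hits(objects, grid, (1.2, 1.1, 1.0), radius=0.5, cell_size=2.0)
```

The far-away crate is never tested. The work for each cell is independent of every other cell, which is the per-bucket parallelism described above. An octree adapts better than a fixed grid when objects are unevenly distributed.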
The comment about electrostatics applies because if you are considering non-local fields of interactions then this approach is borked and you need to consider n-squared interactions. But even then you can simulate in discrete timesteps and trade latency for potential parallelism.
In summary - there are no inter-module sequential dependencies because of double buffering (this is called ping-ponging in the GPGPU community). The intra-module sequential dependencies can be minimised by grouping spatial regions into a single calculation, although as Valve have pointed out this is difficult. And in fact there is 30 years of research into how to solve this problem. For a working example have a look at GPGPU where there is a paper showing a million particle simulation working in parallel on a GPU. -
Workshop and Tutorials at SC'06
While it's probably too late to sign up for the general-purpose GPU tutorial at Supercomputing '06, there may still be time to get to the "General-Purpose GPU Computing: Practice and Experience" workshop (assuming you're going to Supercomputing to begin with.) Workshop's web page is http://www.gpgpu.org/sc2006/workshop/
The workshop itself has turned into a kind of "GPU and multi-core" forum, with lots of great speakers. NVIDIA's Ian Buck and ATI's Mark Segal will both be speaking to the Wired article's material. And IBM and Los Alamos will be talking about Cell and Roadrunner, among other things.
</shameless plug>
So, I wonder what Dinesh Manocha will be talking about at the workshop... Hmmm.... -
GPGPUs
To the guy who utilized pixel shaders as threads for his 'thesis': when you do research, one of the most important steps is background research (related work). For example, anything on http://www.gpgpu.org/ should have provided a cue for original work. Von Neumann machines are out; they clearly don't scale. Scheme taught us that local mutation may be OK, but global not so much. GPU coding exemplifies this assumption and, more interestingly, makes HPC apps accessible
:) I hope to work on this stuff next year, assuming funding :) -
Re:Two words: closed architecture
That's not necessarily true. It is a relatively new field of computer science, and thus there's not all that much info out there yet. But once you understand the basic concepts of general purpose GPU programming anyone can do it.
What's most likely is that the guys at Stanford started pushing the hardware to the limit, and in ways the driver developers might not have anticipated. Probably what they ran up against was bugs in the driver, and the help came from ATI in terms of ways to work around the bugs. Evidence backs this up from Folding@Home's GPU FAQ:
[You must use] Catalyst driver version 6.5 or version 6.10, but not any other versions: 6.6 and 6.7 will work, but at a major performance hit; 6.8 and 6.9 will not work at all.
Your next question might be, if that's true then why use ATI (who are known for poor driver quality)... it might simply be a matter of that's the hardware they had to test with, so that's what they needed to use.
At any rate, it's definitely possible to get started doing GPU programming without vendor support.
There are even some APIs out there to help... The Brook C API (for doing multiprocessor programming) has a GPU version out called BrookGPU: http://graphics.stanford.edu/projects/brookgpu/index.html
There's even a fairly large community of people using Nvidia's own Cg library for doing general purpose stuff.
There's also GPUSort (source code available to look at), which is a high performance sorting example that uses the GPU to do the sorting, and it trounces the fastest CPUs: http://gamma.cs.unc.edu/GPUSORT/results.html
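GPU sorting implementations of this era commonly used sorting networks. In a sorting network the sequence of compare-exchange steps is fixed in advance and each step touches disjoint pairs, so every step maps onto one parallel pass over the data. Below is a plain-Python sketch of bitonic sort, one such network. It is not GPUSort's code, and it assumes a power-of-two input length.

```python
from typing import List

def bitonic_sort(values: List[float]) -> List[float]:
    """Sort ascending with a bitonic sorting network (length must be a power of two)."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # compare distance within this merge step
            # Every i below is independent: this inner loop is one parallel pass.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The network does O(n log² n) comparisons, more than a good CPU sort. On a GPU, the fixed, data-independent access pattern is what makes that trade worthwhile.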
And last but not least, there's the GPGPU site, which is a great resource for all sorts of general-purpose computing on GPUs: http://www.gpgpu.org/ -
Re:GPU code samples?
-
GPGPU primer
(Full disclosure: I work for a major manufacturer of 3-D accelerators.)
There are lots of good sites that talk about GPGPU. Wikipedia has an okay article on the subject as well, and NVIDIA has a primer (PDF) on the subject. But the summary of this article is a bit overly broad.
GPGPU isn't about moving arbitrary processing to the GPU, rather it's about moving specific, computationally expensive computing to the massively parallel GPU.
Effectively, the core idea of GPGPU solutions is that you compute a 256x256 grid (or another granularity) of solutions entirely in one pass.
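To make the "grid of solutions in one pass" idea concrete, here is a plain-Python sketch in which every cell of an output grid is an independent escape-time Mandelbrot computation. This is the same shape of work as the Mandelbrot sample the comment mentions. On a GPU each cell would be one fragment or thread. The grid size, bounds and iteration cap here are arbitrary assumptions.

```python
from typing import List

def escape_time(c: complex, max_iter: int) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter

def mandelbrot_grid(width: int, height: int, max_iter: int = 50) -> List[List[int]]:
    """Evaluate every cell of a width x height grid over [-2, 1] x [-1.5, 1.5].

    No cell reads another cell's result, so all of them could be computed in a
    single parallel pass (one fragment or thread per cell). Needs width, height >= 2.
    """
    return [
        [escape_time(complex(-2.0 + 3.0 * x / (width - 1),
                             -1.5 + 3.0 * y / (height - 1)), max_iter)
         for x in range(width)]
        for y in range(height)
    ]

grid = mandelbrot_grid(8, 4)
```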
NVIDIA has several examples on their website, specifically the GPGPU Disease and GPGPU Fluid samples. The Mandelbrot computation they have there could also be considered an example. (More samples here).
GPGPU has already been utilized to perform very fast (comparable to the CPU) FFTs. In an article in GPU Gems 2 (a very good book if you're interested in doing GPGPU work), they indicate that a 1.8x speedup can be had over performing FFTs on the CPU. I've heard that there are now significantly faster implementations as well. -
Video card related question
With the advent of video cards that are Turing complete in recent years and sites such as this, how feasible is it to run an actual operating system on the video card itself? It seems like it would be possible to write a kernel as a shader program, upload it, and just have it run.
-
GPUs already are "computers on a chip"
GPU shader processors certainly are Turing complete and there are plenty of people (ab-)using them for general purpose calculations. See for example http://www.gpgpu.org/. For some types of calculations, GPUs are much faster than CPUs due to their massively parallel processing. In fact, I have written my thesis on that very topic, comparing CPU and GPU based implementations of some algorithms.
-
Video/FX/image processing: nope
HD video (1920x1080, RGB, 30 fps) is about 186 megabytes per second. Processing that with an interpreted or even JIT-compiled language is going to be many orders of magnitude slower than hand-coded C, or GPGPU, or whatever the flavor of the month is. Yes, Python and Ruby are super productive. But my time writing the code is only spent once; the poor schmucks out there using it will be using it every day for years.
People doing HD postproduction really care about every cycle per pixel because the factors are so large: if you can turn an overnight render into a lunch-hour one, or a lunch hour into a coffee break, or a coffee break into near real time, that matters a lot. That time is to the video guy what the edit-compile-debug cycle time is to the programmer.
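The numbers behind "every cycle per pixel" can be worked out directly. Frame size, 8-bit RGB and 30 fps come from the comment. The single 3 GHz core is an assumed clock, used for illustration only.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3   # 8 bits each for R, G and B
FPS = 30
CLOCK_HZ = 3.0e9      # assumption: one 3 GHz core

pixels_per_second = WIDTH * HEIGHT * FPS
bytes_per_second = pixels_per_second * BYTES_PER_PIXEL
cycles_per_pixel = CLOCK_HZ / pixels_per_second

print(f"data rate: {bytes_per_second / 1e6:.1f} MB/s")
print(f"real-time budget: {cycles_per_pixel:.1f} cycles per pixel")
```

With one such core you get only about 48 cycles per pixel to stay in real time, before accounting for memory traffic. Any per-pixel overhead from an interpreter eats directly into that budget.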
--ST -
Re:"the right daughterboards"
The parent poster writes: Be careful of that seemingly innocuous qualification: "with the right software and daughterboards"... both imply serious limitations to the technology.... Even with a reasonably fast processor (say 3 GHz) today, you are typically only able to process, at most, a few million samples per second, especially if you are performing complicated modulation/demodulation, coding/decoding, filtering and protocol processing. Each sample may require substantial computation, and that limits the number of samples you can process per second. That, in turn, affects the bandwidth that a processor can address (i.e. how wide a part of the radio spectrum you can "see" at any one time).
I'll bet it's not long before the USRP/GnuRadio people hook up with the graphics card as a compute engine folks. Graphics cards are well suited for high-speed signal processing, and would give you the ability to process high-bandwidth signals in realtime even on an ordinary PC.
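The FFT is the workhorse behind much of that signal processing. One reason it maps well onto graphics hardware is that each of its log2(n) stages consists of n/2 independent butterflies. Below is a plain-Python sketch of an iterative radix-2 Cooley-Tukey FFT, with a naive DFT included for checking. It assumes a power-of-two length and is not taken from any of the libraries cited.

```python
import cmath
from typing import List, Sequence

def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft(x: Sequence[complex]) -> List[complex]:
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:  # one stage per iteration: log2(n) stages in total
        half = size // 2
        # The n/2 butterflies in this stage are independent of one another,
        # so a GPU could run the whole stage as one parallel pass.
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a

def naive_dft(x: Sequence[complex]) -> List[complex]:
    """O(n^2) reference, used only to check the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```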
GPGPU: General-Purpose computation on GPUs
The FFT on a GPU
GPU-FFTlib - Graphics Card based Implementation of the Fast Fourier Transform
--Pat
-
Re:main memories read speed is 25GB/s
But with GPGPU techniques, read-back from the screen buffer is becoming an increasingly popular way of doing fast computation using the massive parallelism that GPUs offer.
-
Hurry Up and Wait
With so much of the highest-level CPU design going into GPUs, and so many of the most wily consumers of the fastest GPUs going to any lengths possible to trick them out, I'm surprised there's not a lot more development of GPGPU, harnessing these processors for general purpose computing.
Given the qualifications and interests of that joint community, I'd expect to see a "PCI network" that parallelizes MP3 encoding on GPU hardware with much cheaper MFLOPS by now.
Maybe actually playing the games is eating up too much time. -
Re:A CPU for GPGPU?
You are 100% on the money with your point. Intel is currently scared to death of paradigm-changing technology like Cell, and with all 3 major consoles switching to a POWER CPU derivative, we are seeing that Intel's only reason for dominance is its hold on the x86 market. Even though their flash, chipset, mobile, and complementary platform strategy is very comprehensive and on sound footing, the last thing they want is for the GPU to become the primary driver of performance improvements. At present the best GPUs are CPU limited, and with 12- to 18-month product lifecycles, progress is too slow for certain industries. With the advent of GPGPU and libraries like GPUFFTW, people are realizing performance boosts that beat Moore's law. Merom is going to be very competitive with AMD's roadmap for the next few years, but if AMD wants to come out way ahead of Intel, integrating certain GPU functions into the CPU will give AMD a huge edge (understatement). Additionally, this type of combination can compound in terms of performance if we factor in dual-core and multi-core CPUs in the next few years. The whole reason Cell was invented was so that the PS3 could overcome Moore's law; the performance just wasn't there in the timeframe needed. Perhaps AMD-ATI could be an alternative, since we all know that Cell is not nearly mature enough for general-purpose computing due to the lack of efficient software development tools.
I can't really say why ATI instead of Nvidia though, my guess would be Nvidia is too big and overlapping with its acquisition of ULi. Another possibility is that ATI's solutions are more low-power in design. If you've noticed, Nvidia solutions have been power hungry since they started using Voodoo technology.
http://hardware.slashdot.org/article.pl?sid=06/05/29/1424213
http://www.gpgpu.org/ -
Re:The Windowing Problem
Here are some abstracts about other GPGPU techniques that could be relevant:
GPGPU Image And Volume Processing...Fourier volume rendering directly on the GPU. The paper presents a novel implementation of the Fast Fourier Transform: This Split-Stream-FFT maps the recursive structure of the FFT to the GPU in an efficient way. Additionally, high-quality resampling within the frequency domain is discussed.
*** ...an energy functional is successively minimized in a variational setting. The gradient flow formulation makes use of a robust multi-scale regularization, an efficient multi-grid solver and an adaptive time-step control.
*** ...fast GPU algorithm to perform the discrete wavelet transform featuring flexible boundary extension schemes, flexible wavelet kernels, Cg shader implementation, and high precision....The beauty of the method is that both forward and inverse wavelet transforms are unified using position-dependent filtering and convolution and an indirect addressing technique.
*** ...details of implementing wavelet decomposition and reconstruction using graphics hardware, and develop a scaled version of wavelet analysis that constrains data to the [0,1] range of fixed-point frame buffers.
***
Accelerating 3D Convolution using Graphics Hardware
GPGPU scientific computing...details and microbenchmarks the use of pairs of native precision values to obtain higher accuracy results using DSP, SWAR, and GPU hardware. It also discusses a way to speculatively use lower precision, recomputing with higher precision only when accuracy constraints are not met.
*** ...describes a preliminary algorithm to achieve double precision results by adding a CPU-based defect correction to iterative linear system solvers on the GPU. We demonstrate that identical accuracy as compared to a full CPU double precision solver is possible while still gaining a factor of 2 in speedup compared to a highly tuned cache-aware CPU reference implementation in double precision.
*** ...developed a library generator for graphics hardware, that can automatically generate high performance matrix multiplication with comparable performance to expert manually tuned version on various graphics hardware platforms.
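The defect-correction idea in the abstracts above (solve cheaply in low precision, correct the residual in high precision) can be sketched as mixed-precision iterative refinement. This is only a sketch of the general technique, not the algorithm from those papers. Single precision is emulated on the CPU by rounding every operation to binary32 with `struct`, and a Jacobi iteration serves as a hypothetical stand-in for the GPU solver. The matrix and right-hand side are arbitrary examples.

```python
import struct
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def jacobi_f32(A: Matrix, b: Vector, iters: int = 40) -> Vector:
    """Stand-in 'GPU' solver: Jacobi iteration with every operation rounded to
    single precision. Assumes A is diagonally dominant so Jacobi converges."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            acc = f32(b[i])
            for j in range(n):
                if j != i:
                    acc = f32(acc - f32(f32(A[i][j]) * x[j]))
            new.append(f32(acc / f32(A[i][i])))
        x = new
    return x

def residual(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Defect r = b - A x, computed in double precision (the 'CPU' side)."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(b))]

def refine(A: Matrix, b: Vector, outer: int = 5) -> Vector:
    """Defect correction: cheap single-precision solves, double-precision updates."""
    x = [0.0] * len(b)
    for _ in range(outer):
        r = residual(A, x, b)                   # defect, in double
        d = jacobi_f32(A, r)                    # correction, in single
        x = [xi + di for xi, di in zip(x, d)]   # update, in double
    return x

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x_single = jacobi_f32(A, b)
x_refined = refine(A, b)
print("single only:", max(abs(r) for r in residual(A, x_single, b)))
print("refined:    ", max(abs(r) for r in residual(A, x_refined, b)))
```

The single-precision answer alone is accurate only to roughly float32 precision. A few outer corrections bring the residual down to double-precision round-off, while all of the expensive solving stays in single precision.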
much more on numerical methods is also here. -
Re:Any 64 bit GPU's?
No, there's no 64-bit GPU, and as games don't really need one, I doubt there will be any for a long time.
But in some cases, depending on your application, the GPU could still be useful; see http://www.gpgpu.org/cgi-bin/blosxom.cgi/2005/08/22. Unfortunately the linked article is no longer available, but it used an iterative algorithm to gain a 2x speedup with a GPU at double precision. -
Re:Here's the problem with this
yes, and with newer GPUs, the program size has increased dramatically, which makes them much more versatile. 3 years ago I was cramming a vertex program into 256 lines - now I've got 65535. Fragment programs increased similarly (though I just finally got a card that supports them in the last 3 months to play Oblivion, so I'm still learning the ropes).
Take a look at the GPU based samples (unfortunately, most require Windows) - many are incorporating physics (water, cloth, etc). Another good source is http://www.gpgpu.org/
Unfortunately, I don't know of any open source GPU based physics engines, which sucks, and IIRC, a bunch of patents have been filed on some of the software based solutions. -
Re:Is that what I think it is.
That makes me wonder: is the chess algorithm suitable for running on a GPU, or even possibly this physics chip (i.e., this kind of thing)?
-
Re:Come on
Something like General-Purpose computation on GPU's?
-
Re:Before people get too excited...
Uh...that's what one would think. But in reality, the readback performance is only between 450MB/s (OGL) and 900MB/s (DX), nowhere near the limit of the PCIE bus (you can check the GPGPU forums for these numbers). This is actually only about 2X faster than in the AGP 8X days.
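To put those readback rates in perspective, here is a sketch that times the transfer of one result texture. The 450 MB/s and 900 MB/s figures are the ones quoted in the comment. The 1024x1024 RGBA float32 texture is an assumed example size.

```python
def transfer_ms(num_bytes: int, bytes_per_second: float) -> float:
    """Time in milliseconds to move num_bytes at a sustained rate."""
    return 1000.0 * num_bytes / bytes_per_second

# Assumed example: a 1024x1024 RGBA texture with 32-bit float channels.
texture_bytes = 1024 * 1024 * 4 * 4
frame_budget_ms = 1000.0 / 60  # one frame at 60 fps

for label, rate in (("OpenGL readback", 450e6), ("DirectX readback", 900e6)):
    ms = transfer_ms(texture_bytes, rate)
    print(f"{label}: {ms:.1f} ms (frame budget {frame_budget_ms:.1f} ms)")
```

At both rates, reading back one such texture takes longer than an entire 60 fps frame (roughly 37 ms and 19 ms against about 16.7 ms). This is why GPGPU code tries to keep intermediate results on the card.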
IIRC, as it stands, uploading to the graphics card is about 4X as fast as downloading from it. So yes, GPU->CPU is still a performance killer, contrary to what you think or believe. (For reference, here's a quick link to one of the posts, which some of the site admins agreed with: http://www.gpgpu.org/forums/viewtopic.php?t=2092&highlight=read+bandwidth) -
Forget 'physics' - give me a good math API
The guys over at http://www.gpgpu.org/ have been doing various math calculations, including 'physics' on GPUs for a while now. One big problem is that the only real API is OpenGL. So not only do you have to be a smart math programmer (which is pretty rare to begin with) but you also have to understand graphics programming too and then figure out how to map traditional math operations onto the graphics operations that OpenGL makes available. It isn't that hard to do simple things like matrix math, but trying to really optimize it for really good performance requires almost wizard-level understanding of OpenGL and the underlying hardware implementation.
The cards' math capabilities would be so much more accessible (and thus used by so many more programmers) if Nvidia (and ATI) would come out with standard math-library interfaces to their cards. Give us something that looks like FFTW and has been tweaked by the card engineers for maximum performance, and then we will see everybody and his brother using these video cards for math co-processing. -
Re:Math coprocessor?
Yes, that's what GPGPU programming is all about (General Purpose GPU programming). See here for lots of info.
-- ST -
GPGPU
Remember that those "graphics cards" are high-performance processors that can perform more "general purpose" tasks: GPGPU. I'd love to see a Linux kernel that is basically just a task scheduler for distributing computing among a network of GPGPU cards on these multiple PCI buses. Scalable desktop supercomputers running Linux apps.
-
Re:AGP is a port, not a bus.
Better hurry and tell these guys: http://www.gpgpu.org/
-
Re:There was this project ...
GPGPU is what you're looking for.
-
Re:More cores are cool but are not the solution
GPUs can be used for all sorts of scientific processing.
Here Mike Houston talks about using a GPU for scientific calculations.
http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf
also see
http://www.gpgpu.org/