Nvidia Physics Engine Almost Complete
Nvidia has stated that their translation of Ageia's physics engine to CUDA is almost complete. To showcase the capabilities of the new tech Nvidia ran a particle demonstration similar to Intel's Nehalem demo, at ten times the speed. "While Intel's Nehalem demo had 50,000-60,000 particles and ran at 15-20 fps (without a GPU), the particle demo on a GeForce 9800 card resulted in 300 fps. In the very likely event that Nvidia's next-gen parts (G100: GT100/200) will double their shader units, this number could top 600 fps, meaning that Nehalem at 2.53 GHz is lagging 20-40x behind 2006/2007/2008 high-end GPU hardware. However, you can't ignore the fact that Nehalem in fact can run physics."
...provide Linux drivers, or will the F/OSS community have to reverse-engineer this one?
Anyone care to explain why there's such a big difference between a GPU and a CPU? I keep hearing how GPU's are this and that much faster than a CPU at calculations like graphics, physics and such, so naturally I assume there's a big difference that makes us still chose the x86 and x64 CPUs as the main processors of a PC. What are the limitations; why can't just the libraries be ported for GPUs instead of CPUs and why don't we then just run all calculations on a GPU, if they are anything from 2 to 50 times faster?
:)
It just seems to me that if a graphics card can calculate physics then it would also be able to do pretty much all the same types of calculations that a regular CPU can do, but I am obviously missing a big part of it.
Experts, continue!
They should test physics systems with spheres on irregular ground, with uneven μ (coefficient of kinetic friction), and changing wind.
Those are the kind of problems that force programmers to use approximations when using a physics engine.
The next step is really abstracting the physics from the development, not having pretty water.
That's exactly why I'm skeptical of CUDA in general. Rendering graphics is one of those "embarrassingly parallel" operations, where for every frame rendered, each triangle, vertex, and pixel is independent of the others. I don't know much about physics processing, but I'm guessing it's the same. Simply throwing say, 4x more hardware at the problem gives you pretty much the ideal 4x speedup. Once you introduce random data dependencies into the problem, though, parallelization is much harder and you don't get the same speedups.
There are a lot of easily-parallelizable problems out there, which is great...but then again, there are a lot of not-so-easily-parallelizable problems out there too. I don't think CUDA would do any better than a general-purpose CPU on the latter class of problems.