Five Nvidia CUDA-Enabled Apps Tested
crazipper writes "Much fuss has been made about Nvidia's CUDA technology and its general-purpose computing potential. Now, in 2009, a steady stream of launches from third-party software developers sees CUDA gaining traction in the mainstream. Tom's Hardware takes five of the most interesting desktop apps with CUDA support and compares the speed-up yielded by a pair of mainstream GPUs versus a CPU-only setup. Not surprisingly, depending on the workload you throw at your GPU, you'll see results ranging from average to downright impressive."
With Nvidia slowly pushing its way into the CPU market (CUDA is the first step; in a few years I wouldn't be surprised if Nvidia started developing processors) and Intel trying to cut into Nvidia's GPU market share with Larrabee http://en.wikipedia.org/wiki/Larrabee_(GPU), we'll see who can develop outside of their box faster. This is good news for AMD, since Intel will be more focused on Nvidia instead of being neck and neck with AMD in the processor market. Hey, maybe AMD will regain its power in the server and netbook realms.
There's also going to be a battle of patents pretty soon. Wish I was a tech lawyer.
"The difference between genius and stupidity is that genius has its limits" - Albert Einstein
I'd welcome the opportunity to prove otherwise. I've been managing editor for the last year, and much has changed. Best, Chris
The Tesla C1060 is a video card with no video output (strictly for processing) that has something like 240 processor cores and 4 GB of GDDR3 RAM. Just doing math on large arrays (1k x 1k), I get a performance boost of about a factor of forty over a dual-core 3.0 GHz Xeon.
The CUDA toolkit has FFT functionality built in as well (the CUFFT library), so it's excellent for signal processing. The SDK and programming paradigm are super easy to learn. I only know C (and not C++) and I can't even make a proper GUI, but I can make my array functions run massively in parallel.
The trick is to minimize moving memory between the CPU and the GPU, because that kills performance. Only the very newest cards support "simultaneous copy and execute", where one thread can be copying new data to the card, another can be processing, and a third can be moving the results off the card.
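To put rough numbers on why overlapping copy and execute matters, here is a minimal timing sketch in Python. It is not CUDA code and not a measurement. The function names and the per-chunk stage times are made up for illustration, and it assumes every chunk costs the same and that upload, compute, and download can each run on their own engine without contention.

```python
def serial_time(n_chunks, h2d, kernel, d2h):
    """Total time when each chunk is uploaded, processed, and downloaded
    strictly one after another (no overlap)."""
    return n_chunks * (h2d + kernel + d2h)


def pipelined_time(n_chunks, h2d, kernel, d2h):
    """Idealized total time when the three stages of different chunks overlap.

    The first chunk still pays for all three stages. After that, one more
    chunk completes every max(stage) time units, because the slowest stage
    sets the pace of the whole pipeline.
    """
    if n_chunks == 0:
        return 0
    return (h2d + kernel + d2h) + (n_chunks - 1) * max(h2d, kernel, d2h)


if __name__ == "__main__":
    # Hypothetical per-chunk costs in milliseconds: upload, compute, download.
    n, up, run, down = 10, 2, 3, 2
    print("serial:   ", serial_time(n, up, run, down), "ms")
    print("pipelined:", pipelined_time(n, up, run, down), "ms")
```

With these invented numbers the overlapped version takes about half the time, and the gain is capped by the slowest of the three stages.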
One way that the video people can maybe speed up their processing (disclaimer: I don't know anything about this) is to do a quick sweep for keyframes, and then send the video streams between keyframes to individual processor cores. So instead of each core getting a piece of the frame, maybe each core gets a piece of the movie.
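The keyframe idea above could be sketched like this in Python. It is only an illustration of the splitting step, not how any real encoder works. The frame-type labels ("I" for keyframe, anything else for a dependent frame) and both function names are assumptions made up for the example.

```python
def split_at_keyframes(frame_types):
    """Return (start, end) index ranges, end exclusive, one per segment.

    Each segment starts at a keyframe ("I") and runs up to the next one.
    Frames before the first keyframe are skipped, since they cannot be
    decoded on their own.
    """
    segments = []
    start = None
    for i, ftype in enumerate(frame_types):
        if ftype == "I":
            if start is not None:
                segments.append((start, i))
            start = i
    if start is not None:
        segments.append((start, len(frame_types)))
    return segments


def assign_round_robin(segments, n_workers):
    """Hand segments to workers in turn, so each worker gets whole segments."""
    work = [[] for _ in range(n_workers)]
    for k, segment in enumerate(segments):
        work[k % n_workers].append(segment)
    return work


if __name__ == "__main__":
    frames = list("IPPBIPPIPB")  # hypothetical frame types for a tiny clip
    segments = split_at_keyframes(frames)
    print(segments)
    print(assign_round_robin(segments, 2))
```

Because every segment begins with a keyframe, each worker can process its pieces without needing frames that belong to someone else.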
The days of the math coprocessor card have returned!
We've run some signal processing on a Tesla card, and got a roughly 500x improvement over (somewhat poorly written) code for a Core 2 Duo.
~8 hr on a Core 2 Duo
~1.5 hr on Core i7
seconds on Tesla
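For a sense of scale, here is the arithmetic behind those three figures as a small Python sketch. The only inputs are the numbers quoted above (about 8 hr, about 1.5 hr, roughly 500x). The Tesla time it prints is derived from the 500x figure, not a separate measurement, and the variable names are made up for the example.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


HOUR = 3600  # seconds

core2duo_seconds = 8 * HOUR    # "~8 hr on a Core 2 Duo"
core_i7_seconds = 1.5 * HOUR   # "~1.5 hr on Core i7"

# Core i7 versus Core 2 Duo.
i7_vs_core2duo = speedup(core2duo_seconds, core_i7_seconds)

# Run time implied by a 500x improvement over the Core 2 Duo figure.
tesla_seconds_at_500x = core2duo_seconds / 500

if __name__ == "__main__":
    print(f"Core i7 vs Core 2 Duo: {i7_vs_core2duo:.1f}x")
    print(f"Tesla at 500x: about {tesla_seconds_at_500x:.0f} s")
```

So a 500x improvement on an 8 hour job lands just under a minute, which fits "seconds on Tesla".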
In general, it's not tied to a card. CUDA itself might be NVIDIA-dependent, but general-purpose GPU programming is not, and other manufacturers will have similar interfaces to GP-GPU programming, eventually.
As for my own experience with it... everyone at work is going crazy over them. One of our major simulations implements a high-fidelity IR scene modeler. It used to take 2 seconds per frame on CPU only. They rewrote it to use the GPU and got it down to 12 ms.
Anything that is highly parallelizable with low memory transfer requirements will get a pretty impressive speedup. My co-worker who has been doing this for a year now was explaining that computation is essentially free; it's the memory operations that are the bottleneck.
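One rough way to see "computation is free, memory is the bottleneck" is a back-of-the-envelope model where a kernel takes as long as the slower of its compute and its memory traffic. The Python sketch below assumes exactly that. The peak rate and bandwidth in the example are hypothetical round numbers, not specs of any real card, and the function names are made up.

```python
def kernel_time(flops, bytes_moved, peak_flops, bandwidth):
    """Estimated run time in seconds, assuming compute and memory traffic
    overlap perfectly so the slower of the two dominates."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / bandwidth
    return max(compute_time, memory_time)


def is_memory_bound(flops, bytes_moved, peak_flops, bandwidth):
    """True when moving the data takes longer than doing the arithmetic."""
    return bytes_moved / bandwidth > flops / peak_flops


if __name__ == "__main__":
    # Adding two 1k x 1k arrays of 4-byte floats: one add per element,
    # two reads and one write per element.
    n = 1024 * 1024
    flops = n
    bytes_moved = 3 * 4 * n
    # Hypothetical device: 1e12 flop/s peak, 1e11 bytes/s memory bandwidth.
    print(is_memory_bound(flops, bytes_moved, 1e12, 1e11))
    print(kernel_time(flops, bytes_moved, 1e12, 1e11))
```

Under these assumptions the element-wise add is dominated by memory traffic by about two orders of magnitude, so extra arithmetic per element costs almost nothing until the compute time catches up.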
In fact, these GPUs are yet another example of how there is nothing new under the sun. A GPU is very much like the vector processor of Cray-style supercomputing (when Cray was still alive that is) aka SIMD (single instruction, multiple data).
Actually, not quite. The execution architecture in Nvidia's G80-series GPUs and onwards is SIMT: single instruction, multiple threads. The not-so-subtle difference is that in a SIMD vector architecture the application explicitly manages instruction-level divergence, which will generally narrow the SIMD width of divergent paths to only one path, whereas in a SIMT architecture, when threads diverge within a warp, all threads executing the same branch within that warp can be issued an instruction simultaneously, with the threads in that warp that are not on that branch inactive for that cycle. This is transparent to the application. In Nvidia's latest architecture the warp size is still statically set at 32 threads, so when threads within a warp diverge you'll see a performance penalty proportional to the number of unique paths taken. Interestingly, the next iteration of the hardware is rumored to feature a thread scheduler capable of variable warp sizes, probably still with some lower bound. That would bring the GPU much closer to the ideal "array of independently executing processing cores" that we have in modern CPUs, but with obviously far more cores.
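The "penalty proportional to the number of unique paths" can be illustrated with a toy model in Python. This only counts distinct branch labels per group of 32 threads. It is a simplification for illustration, not a description of how the hardware scheduler is implemented, and the function name and path labels are made up.

```python
WARP_SIZE = 32  # warp size quoted above


def paths_per_warp(thread_paths, warp_size=WARP_SIZE):
    """For each warp, count how many distinct branch paths its threads take.

    thread_paths[i] is any hashable label for the path thread i follows.
    In this toy model a warp with k distinct paths needs k passes over the
    divergent code, with the threads on other paths idle during each pass.
    """
    return [
        len(set(thread_paths[i:i + warp_size]))
        for i in range(0, len(thread_paths), warp_size)
    ]


if __name__ == "__main__":
    uniform = ["a"] * 64
    alternating = ["a" if t % 2 == 0 else "b" for t in range(64)]
    split_by_warp = ["a"] * 32 + ["b"] * 32
    print(paths_per_warp(uniform))
    print(paths_per_warp(alternating))
    print(paths_per_warp(split_by_warp))
```

The third case is the interesting one: the two halves take different branches, but since each warp is internally uniform, neither pays a divergence penalty in this model.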
Yeah, make it like Wikipedia articles. They are long but easily navigable.
Wealth is the gift that keeps on giving.
I assume that's what the parent meant.
As an addendum, the newest CUDA 2.2 (with a chip of the newest generation, i.e. GT200) actually has support for reading directly from (page-locked) host memory inside of GPU kernels... something I believe ATI cards have allowed for a while.
Those benchmarks show that even older ($120-140) Nvidia GPU cards can really speed up some processing tasks, especially transcoding video. But what I think is even more exciting than just the acceleration from offloading CPU to GPU is using multiple GPU cards in a single host PC. Stuff a $1000 PC with $1120 in GPUs (say, eight $140 Nvidia cards), and that's 1024 parallel cores, anywhere from 16x to 56x the performance at only just over double the price.
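The price side of that claim is easy to check with a few lines of Python. The dollar figures and the card count come from the comment above. The 128 cores per card is an assumption, derived only from 1024 cores divided by eight cards, and the function name is made up.

```python
def multi_gpu_build(base_pc_cost, card_cost, n_cards, cores_per_card):
    """Total cost, total GPU cores, and cost relative to the bare PC."""
    total_cost = base_pc_cost + card_cost * n_cards
    return {
        "total_cost": total_cost,
        "total_cores": cores_per_card * n_cards,
        "cost_ratio": total_cost / base_pc_cost,
    }


if __name__ == "__main__":
    # cores_per_card=128 is assumed (1024 cores / 8 cards).
    build = multi_gpu_build(base_pc_cost=1000, card_cost=140,
                            n_cards=8, cores_per_card=128)
    print(build)
```

That gives $2120 in total, or 2.12 times the price of the bare PC. The 16x to 56x performance range is the comment's own reading of the benchmarks and is not something this arithmetic can confirm.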
Your passwords are no longer safe.
It used to require days for a cluster of PCs to brute force an 8+ character password.
Now with a big enough PSU, you can stuff a tower with graphics cards to get it done in hours.
About the only common hash I can't find a CUDA-enabled brute forcer for is NTLM2.
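The days-to-hours claim comes down to keyspace divided by guess rate. Here is a minimal Python sketch of that arithmetic. The alphabet size and guess rates in the example are hypothetical placeholders, not benchmarks of any real hardware or hash, and the function names are made up.

```python
def keyspace(alphabet_size, length):
    """Number of candidate passwords of exactly this length."""
    return alphabet_size ** length


def worst_case_hours(alphabet_size, length, guesses_per_second):
    """Hours needed to try every candidate at a constant guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second / 3600


if __name__ == "__main__":
    # Hypothetical: 8 characters drawn from 95 printable ASCII characters.
    print(keyspace(95, 8))
    # Hypothetical rates: one device versus eight devices at the same speed.
    one_device = 1e9
    print(worst_case_hours(95, 8, one_device))
    print(worst_case_hours(95, 8, 8 * one_device))
```

The time falls in direct proportion to the total guess rate, which is why adding cards helps. Each extra character multiplies the keyspace by the alphabet size, which is why longer passwords still help more.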
How is this different than AMD-V, which Intel licenses for their virtualization (or maybe I'm confusing it with AMD64, which Intel licenses)?
Either way, if AMD "died tomorrow", the same thing would happen as would happen if Nvidia did: some other company, likely a previous competitor, would buy up the technology, and things would continue with barely a hiccup.
A product or technology does not need to be open source or 'standards based' to gain wild adoption. Sometimes, a technology speaks for itself. After all, ARM CPUs are literally everywhere, as are many other things which are quite closed (as I'm sure you're aware). There will be someone else waiting in the wings to pick up the chalice, should it be dropped, with all worthwhile technology.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers