NVIDIA Shaking Up the Parallel Programming World

Posted by ScuttleMonkey on Friday May 2, 2008 @09:37PM from the best-discoveries-made-by-accident dept.

An anonymous reader writes "NVIDIA's CUDA system, originally developed for their graphics cores, is finding migratory uses into other massively parallel computing applications. As a result, it might not be a CPU designer that ultimately winds up solving the massively parallel programming challenges, but rather a video card vendor. From the article: 'The concept of writing individual programs which run on multiple cores is called multi-threading. That basically means that more than one part of the program is running at the same time, but on different cores. While this might seem like a trivial thing, there are all kinds of issues which arise. Suppose you are writing a gaming engine and there must be coordination between the location of the characters in the 3D world, coupled to their movements, coupled to the audio. All of that has to be synchronized. What if the developer gives the character movement tasks its own thread, but it can only be rendered at 400 fps. And the developer gives the 3D world drawer its own thread, but it can only be rendered at 60 fps. There's a lot of waiting by the audio and character threads until everything catches up. That's called synchronization.'"

Where's the story? by pmontra · 2008-05-02 21:51 · Score: 4, Informative

The article sums up the hurdles of parallel programming and says that NVIDIA's CUDA is doing something to solve them, but it doesn't say what. Even the short Wikipedia entry at http://en.wikipedia.org/wiki/CUDA tells more about it.
1. Re:Where's the story? by Yokaze · 2008-05-03 02:46 · Score: 2, Informative
  
  -Why would character movement need to run at a certain rate? It sounds like the thread should spend most of its time blocked waiting for user input.
  
  You usually have a game-physics engine running, which in practice integrates the movements of the characters (character movement) or, more generally, updates the world model (position and state of all objects). Even without input, the world moves on. A fixed rate is usually chosen because it is simpler than a variable time step.
  
  -What's so special about the audio thread? Shouldn't it just handle events from other threads without communicating back?
  
  Audio is the most sensitive to timing issues: unlike video (or simulation), you cannot drop arbitrary pieces of sound without the user immediately noticing.
  
  -How do semaphores affect SMP cache efficiency? Is the CPU notified to keep the data in shared cache?
  
  Not specially; they are simply a special case of the general problem of how to access data.
  Several threads may compete for the same data, and if they are accessing data in the same cache line, it will lead to lots of communication between caches (thrashing the cache).
  In CUDA, a thread manager is aware of the memory layout and decides which parts of memory will be processed by which shaders/ALUs/CPUs. That also makes it possible to use the caches more efficiently.
  
  -What is a "3D world drawer"? Is it where god keeps us in his living room?
  
  Drawer as in "someone who draws", or 3D world painter. It draws/paints the state of the world as updated by the simulation thread.
  This can happen asynchronously, as you will not notice if a frame is dropped occasionally.
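
A minimal sketch of the fixed-rate idea described above, using the article's 400 and 60 figures as the simulation and drawing rates. The function name `run` and the integer accumulator are illustrative assumptions, not taken from any particular engine: the world advances in fixed steps, and each drawn frame simply shows whatever state the simulation has reached.

```python
SIM_HZ = 400     # simulation rate (figure taken from the article's example)
RENDER_HZ = 60   # drawing rate (figure taken from the article's example)


def run(frames, sim_hz=SIM_HZ, render_hz=RENDER_HZ):
    """Return how many fixed simulation steps run before each drawn frame.

    Time is counted in integer units of 1 / (sim_hz * render_hz) seconds,
    so the accumulator never suffers floating-point drift.
    """
    steps_per_frame = []
    accumulator = 0
    for _ in range(frames):
        # One frame lasts 1/render_hz seconds, which is sim_hz units.
        accumulator += sim_hz
        steps = 0
        # One simulation step lasts 1/sim_hz seconds, which is render_hz units.
        while accumulator >= render_hz:
            accumulator -= render_hz
            steps += 1  # a real engine would update the world model here
        steps_per_frame.append(steps)  # a real engine would draw the frame here
    return steps_per_frame
```

With the default rates, `run(3)` returns `[6, 7, 7]`: the simulation always advances by the same time step, and the drawer just sees 6 or 7 updates between frames.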
  
  --
  "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Re:Dumbing down by Lobais · 2008-05-02 22:17 · Score: 2, Informative

Oh, and CUDA btw. http://en.wikipedia.org/wiki/CUDA

CUDA ("Compute Unified Device Architecture"), is a GPGPU technology that allows a programmer to use the C programming language to code algorithms for execution on the graphics processing unit (GPU).
Uh, what a crap by udippel · 2008-05-02 22:36 · Score: 4, Informative

"News for Nerds, Stuff that matters".
But not if posted by The Ignorant.

What if the developer gives the character movement tasks its own thread, but it can only be rendered at 400 fps. And the developer gives the 3D world drawer its own thread, but it can only be rendered at 60 fps. There's a lot of waiting by the audio and character threads until everything catches up. That's called synchronization.

If a student of mine wrote this, a Fail would be the immediate consequence. How can 400 fps be 'only'? And why is threading bad, if the character movement is ready after 1/400 second? There is not 'a lot of waiting'; instead, there are a lot of cycles to calculate something else. And 'waiting' is not 'synchronisation'.
[The audio-rate of 7000 fps gave the author away; and I stopped reading. Audio does not come in fps.]

While we all agree on the problem of synchronisation in parallel programming, and maybe especially in the gaming world, we should not allow uninformed blurb on Slashdot.
CUDA is limiting, not liberating by njord · 2008-05-03 00:53 · Score: 4, Informative

From my experience, CUDA was much harder to take advantage of than multi-core programming. CUDA requires you to use a specific model of programming that can make it difficult to take advantage of the full hardware. The restricted caching scheme makes memory management a pain, and the global synchronization mechanism is very crude: there's a barrier after each kernel execution, and that's it. It took me a week to port ('parallelize') some simple code I had written to CUDA, whereas it took me an hour or so to add the OpenMP statements to my 'reference' CPU code. Sorry Nvidia, there is no silver bullet. By making some parts of parallel programming easy, you make others hard or impossible.
Re:I don't understand the point of this article. by TheRaven64 · 2008-05-03 01:32 · Score: 2, Informative

But you can't have a 12GHz, at that speed light goes about ONE INCH per clock cycle in a vacuum, anything else is slower, signals in silicon are a lot slower.
An inch is a long way on a CPU. A Core 2 die is around 11mm along the edge, so at 12GHz a signal could go all of the way from one edge to the other and back. It uses a 14-stage pipeline, so every clock cycle a signal needs to travel around 1/14th of the way across the die, giving around 1mm. If every signal needs to move 1mm per cycle and travels at the speed of light, then your maximum clock speed is 300GHz.
Of course, as you say, electric signals travel a fair bit slower in silicon than photons do in a vacuum, and you often have to take a quite indirect route because wires can't cross on a CPU, so the practical speed might be somewhat lower.
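
The arithmetic above can be checked with a short sketch. The function names are illustrative assumptions, and the calculation uses the vacuum speed of light, so it gives an upper bound rather than a realistic figure for signals in silicon.

```python
C_M_PER_S = 299_792_458  # speed of light in a vacuum, metres per second


def distance_per_cycle_mm(clock_hz):
    """Distance light covers in one clock cycle, in millimetres."""
    return C_M_PER_S / clock_hz * 1000


def max_clock_ghz(distance_mm):
    """Highest clock (GHz) at which light still covers distance_mm per cycle."""
    return C_M_PER_S / (distance_mm / 1000) / 1e9


# At 12 GHz light covers roughly 25 mm per cycle, about one inch.
print(round(distance_per_cycle_mm(12e9), 1))

# If each signal only has to cover about 1 mm per cycle, the bound is about 300 GHz.
print(round(max_clock_ghz(1.0)))
```

Using the unrounded 11 mm / 14 stages (about 0.79 mm) instead of 1 mm raises the bound to roughly 380 GHz, which does not change the conclusion.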
Intel discovered that nearly half their execution units were waiting most of the time so they invented HyperThreading.
Minor nitpick, but actually IBM were the first to market with SMT, and they took it from a university research project. Intel didn't discover anything other than that their competitors were getting more instructions per transistor than them.

--
I am TheRaven on Soylent News
The EETimes article is much better by Jeremy+Erwin · 2008-05-03 03:36 · Score: 3, Informative

Nvidia unleashes Cuda attack on parallel-compute challenge