NVIDIA Shaking Up the Parallel Programming World
An anonymous reader writes "NVIDIA's CUDA system, originally developed for their graphics cores, is finding migratory uses into other massively parallel computing applications. As a result, it might not be a CPU designer that ultimately winds up solving the massively parallel programming challenges, but rather a video card vendor. From the article: 'The concept of writing individual programs which run on multiple cores is called multi-threading. That basically means that more than one part of the program is running at the same time, but on different cores. While this might seem like a trivial thing, there are all kinds of issues which arise. Suppose you are writing a gaming engine and there must be coordination between the location of the characters in the 3D world, coupled to their movements, coupled to the audio. All of that has to be synchronized. What if the developer gives the character movement tasks its own thread, but it can only be rendered at 400 fps. And the developer gives the 3D world drawer its own thread, but it can only be rendered at 60 fps. There's a lot of waiting by the audio and character threads until everything catches up. That's called synchronization.'"
where's the MIT CS guys when you need them?
Wow, I bet nobody on slashdot knew that!
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
The articles sums up the hurdles of parallel programming and says that NVIDIA's CUDA is doing something to solve them but it doesn't say what. Even the short Wikipedia entry at http://en.wikipedia.org/wiki/CUDA tells more about it.
Libertarian Leaning Political Discussion Forum.
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Many moons ago, when most slashdotters were nippers, a British company named INMOS provided an extensible hardware and software platform that solved the problem of parallelism, in many ways similar to CUDA.
Ironically, some of the first demos I saw using transputers was raytracing demos.
The problem of parallelism and the solutions available are quite old (more than 20 years), but it's only now that limits are reached that we see the true need for it. But the true pioneers is not NVIDIA, because there were others long before them.
When I came up through my CS degree, object-oriented programming was new. Programming was largely a series of sequentially ordered instructions. I haven't programmed in many years now, but if I wanted to write a parallel program I would not have a clue.
But why should I?
What is needed are new, high-level programming languages that figure out how to take a set of instructions and best interface with the available processing hardware on their own. This is where the computer smarts need to be focused today, IMO.
All computer programming languages, and even just plain applications, are abstractions from the computer hardware. What is needed are more robust abstractions to make programming for multiple processors (or cores) easier and more intuitive.
A work that expires before its copyright never enters the public domain and thus enjoys eternal copyright protection.
"News for Nerds, Stuff that matters".
But not if posted by The Ignorant.
What if the developer gives the character movement tasks its own thread, but it can only be rendered at 400 fps. And the developer gives the 3D world drawer its own thread, but it can only be rendered at 60 fps. There's a lot of waiting by the audio and character threads until everything catches up. That's called synchronization.
If a student of mine wrote this, a Fail will be the immediate consequence. How can 400 fps be 'only'? And why is threading bad, if the character movement is ready after 1/400 second? There is not 'a lot of waiting'; instead, there are a lot of cycles to calculate something else. and 'waiting' is not 'synchronisation'.
[The audio-rate of 7000 fps gave the author away; and I stopped reading. Audio does not come in fps.]
While we all agree on the problem of synchronisation in parallel programming, and maybe especially in the gaming world, we should not allow uninformed blurb on Slashdot.
Back in the early 80's I was working in Bristol UK for TDI (who were the UCSD p-system licensees) porting it to various machines... Well, we had one customer who wanted a VAX p-system so we trotted off to INMOS's office and sat around in the computer room. (VAX 11/780 I think). At the time they were running Transputer simulations on the machine so the VAX p-system took er... about 30 *minutes* to start. Just for comparison an Apple ][ running IV.x would take less than a minute. Almost an hour to make a tape. (About 15 users running emulation I think). Fond memories of the transputer. Almost bought a kit to play with it... Andy
CUDA is an interesting way to utilize NVIDIA's graphics hardware for tasks it wasn't really designed for, but it's not a solution to parallel computing in and of itself. (more on that momentarily) A few people have gotten their nice high end Quadros to do some pretty cool stuff, but to date it's been limited primarily to relatively minor academic purposes. I don't see CUDA becoming big in gaming circles anytime soon. Let's face it, most gamers buy *one* reasonably good video card and leave it at that. Your video card has better things to do than handle audio or physics when your multi-core CPU is probably being criminally underutilized. Nvidia, of course, wants people to buy wimpy CPU's and then load up on massive SLI rigs and then do all their multi-purpose computation in CUDA. Not gonna happen.
First of all, there are very few general purpose applications that special purpose NVIDIA hardware running CUDA can do significantly better than a real general purpose CPU, and Intel intends to cut even that small gap down within a few product cycles. Second, nobody wants to tie themselves to CUDA when it's built entirely for proprietary hardware. Third, CUDA still has a *lot* of limitations. It's not as easy to develop a physics engine for a GPU using CUDA as it is for a general purpose CPU.
Now, I haven't used CUDA lately, so I could be way off base here. However, multi-threading isn't the real challenge to efficient use of resources in a parallel computing environment. It's designing your algorithms to be able to run in parallel in the first place. Most multi-threaded software out there still has threads that have to run on a single CPU, and the entire package bottlenecks on the single CPU running that thread even if other threads are free to run on other processors. This sort of bottleneck can only be avoided at the algorithm level. This isn't something CUDA is going to fix.
Now, I can certainly see why NVIDIA is playing up CUDA for all they're worth. Video game graphics rendering could be on the cusp of a technological singularity. Namely, ray tracing. Ray tracing is becoming feasible to do in real time. It's a stretch at present, but time will change that. Ray tracing is a significant step forward in terms of visual quality, but it also makes coding a lot of other things relatively easy. Valve's recent "Portal" required some rather convoluted hacks to render the portals with acceptable performance, but in a ray tracing engine those same portals only take a couple lines of code to implement and have no impact on performance. Another advantage of ray tracing is that it's dead simple to parallelize. While current approaches to video game graphics are going to get more and more difficult to work with as parallel processing rises, ray tracing will remain simple.
The real question is whether NVIDIA is poised to do ray-tracing better than Intel in the next few product cycles. Intel is hip to all of the above, and they can smell blood in the water. If they can beef up the floating point performance of their processors then dedicated graphics cards may soon become completely unnecessary. NVIDIA is under the axe and they know it, which might explain all the recent anti-Intel smack-talk. Still, it remains to be seen who can actually walk the walk.
The original Inmos Transputer was designed to solve such problems and relied on fast inter-processor links, and the AMD Hypertransport bus is a modern derivative.
So I disagree with you. The processing hardware is not so much the problem. If GPUs are small, cheap and address lots of memory, so long as they have the necessary instruction sets they will do the job. The issue to focus on is still interprocessor (and hence interprocess) links. This is how hardware affects parallelism.
I have on and off worked with multiprocessor systems since the early 80s, and always it has been fastest and most effective to rely on data channels rather than horrible kludges like shared memory with mutex locks. The code can be made clean and can be tested in a wide range of environments. I am probably too near retirement now to work seriously with Erlang, but it looks like a sound platform.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
From my experience, CUDA was much harder to take advantage of then multi-core programming. CUDA requires you to use a specific model of programming that can make it difficult to take advantage of the full hardware. The restricted caching scheme makes memory management a pain, and the global synchronization mechanism is very crude - there's a barrier after each kernel execution, and that's it. It took me a week to 'parallelize' port some simple code I had written to CUDA, whereas it took my an hour or so to add the OpenMP statements to my 'reference' CPU code. Sorry Nvidia - there is no silver bullet. By making some parts of parallel programming easy, you make others hard or impossible.
NVidia has every reason to want OpenGL to succeed - if it doesn't, Microsoft will rule supreme over the API to NVidia's hardware, and that isn't a healthy situation to be in. As it is, OpenGL gives them some freedom to do their own thing.
However, having mentioned Microsoft... If *someone* does want OpenGL to succeed it is them... If and when OpenGL 3.0 ever appears, I bet there will be some talk of some "unknown party" threatening patent litigation...
Destroying OpenGL is of paramount important to Microsoft, since it will grant them total dominance over 3D graphics. Apple, Linux, Sony (PS3), and other vendors that rely on OpenGL will completely lose their ability to compete.
An inch is a long way on a CPU. A Core 2 die is around 11mm along the edge, so at 12GHz a signal could go all of the way from one edge to the other and back. It uses a 14-stage pipeline, so every clock cycle a signal needs to travel around 1/14th of the way across the die, giving around 1mm. If every signal needs to move 1mm per cycle and travels at the speed of light, then your maximum clock speed is 300GHz.
Of course, as you say, electric signals travel a fair bit slower in silicon than photons do in a vacuum, and you often have to go a quite indirect route due to the fact that wires can't cross on a CPU, so the practical speed might be somewhat lower.
Intel discovered that nearly half their execution units were waiting most of the time so they invented HyperThreading. Minor nitpick, but actually IBM were the first to market with SMT, and they took it from a university research project. Intel didn't discover anything other than that their competitors were getting more instructions per transistor than them.I am TheRaven on Soylent News
That in a nutshell is why I suggested that investment in Erlang would be a good idea. It's better to start with the right approach and optimise it, than go off into computer science blue sky and try to design a perfect language for paralleling GPUs - which practically nobody will ever really use.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
Nvidia unleashes Cuda attack on parallel-compute challenge
Avoid the blog spam. This is the actual article in EE times: Nvidia unleashes Cuda attack on parallel-compute challenge.
Nvidia is showing signs of being poorly managed. CUDA is a registered trademark of another hi-tech company.
The underlying issue is apparently that Nvidia will lose most of its mid-level business when AMD/ATI and Intel/Larrabee being shipping integrated graphics. Until now, Intel integrated graphics has been so limited as to be useless in many mid-level applications. Nvidia hopes to replace some of that loss with sales to people who want to use their GPUs to do parallel processing.
I live in Minnesota, home of the legendary Cray Research. I've met with several old timers that developed the technologies that made the Cray Supercomputer what it was. Hearing about the problems that multi-core developers are facing today reminds me of the stories I heard about how the engineers would have to build massive cable runs from processor board to processor board to memory board just to synchronize the clocks and operations so that when the memory was ready to read or write data, it could tell the processor board... half a room away.
As I recall:
The processor, as it was sending the data to the bus, would have to tell the memory to get ready to read data through these cables. The "cables hack" was necessary because the cable path was shorter than the data bus path, and the memory would get the signal just a few mS before the data arrived at the bus.
These were fun stories to hear but now seeing what development challenges we face in parallel programming multi-core processors gives me a whole new appreciation for those old timers. These are old problems that have been dealt with before, just not on this scale. I guess it is true what they say, history always repeats itself.
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.