Ask Slashdot: What Is the Most Painless Intro To GPU Programming?
dryriver writes "I am an intermediate-level programmer who works mostly in C# NET. I have a couple of image/video processing algorithms that are highly parallelizable — running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x times to perhaps 30x or 40x times speedup, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL — which seems a daunting task to me — or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must a be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"
GPU programming is painful. A painless introduction doesn't capture the flavor of it.
Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of a shiny crap might seem superficially appealing to you.
Learn OpenCL and do the job properly.
When all you have is a hammer, every problem starts to look like a thumb.
So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP or one of the other abstractions but you really need to understand how these massively parallel machines work. It's possible to write some perfectly valid code in any of these environments which will run SLOWER than on the CPU because you didn't understand fundamentally how GPUs excel at processing.
Udacity currently has a fairly decent intro course on GPU programming at: https://www.udacity.com/course/cs344
It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.
I'd suggest running through that and then deciding on what API you want to end up using.
I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.
There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.
I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.
BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.