Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

Posted by Soulskill on Friday July 19, 2013 @08:31AM from the large-reference-books-and-opiates dept.

dryriver writes "I am an intermediate-level programmer who works mostly in C# .NET. I have a couple of image/video processing algorithms that are highly parallelizable; running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x to perhaps 30x or 40x, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL, which seems a daunting task to me, or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"

GPU programming is pain by Anonymous Coward · 2013-07-19 08:35 · Score: 5, Funny

GPU programming is painful. A painless introduction doesn't capture the flavor of it.
1. Re:GPU programming is pain by PolygamousRanchKid+ · 2013-07-19 09:07 · Score: 5, Funny
  
  Yeah, it would be like S&M without the pain . . . cute, but something essential is missing from the experience.
  Heidi Klum has a TV show called "Germany's Next Top Model". She basically gets all "Ilsa, She-Wolf of the SS" on a bunch of neurotic, anorexic, pubescent girls, teaching them how a top model needs to suffer.
  Heidi Klum would make a good GPU programming instructor.
  . . . and even non-geeks would watch the show. A win-win for everyone.
  
  --
  Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
2. Re:GPU programming is pain by Anonymous Coward · 2013-07-19 09:45 · Score: 4, Funny
  
  Yeah, that's what we need! More neurotic, anorexic, pubescent girls who know how to do GPU programming!
3. Re:GPU programming is pain by Darinbob · 2013-07-19 12:15 · Score: 4, Funny
  
  I thought we needed more "Ilsa, She-Wolf" programming instructors.
Learn OpenCL by Tough+Love · 2013-07-19 08:37 · Score: 5, Insightful

Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of shiny crap might seem superficially appealing to you.
Learn OpenCL and do the job properly.

--
When all you have is a hammer, every problem starts to look like a thumb.
1. Re:Learn OpenCL by Tr3vin · 2013-07-19 08:41 · Score: 4, Interesting
  
  Learn OpenCL and do the job properly.
  This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.
2. Re:Learn OpenCL by Required+Snark · 2013-07-19 08:48 · Score: 2
  
  Yep. Some things are intrinsically hard. GPU programming is SIMD programming, so you have to work with data parallelism. It helps a lot if you understand how the hardware works. This is where assembly language experience can be a big plus.
  There's no substitute for detailed knowledge. Outside of instruction level parallelism, there is no "magic bullet" for parallel programming. You have to learn things.
  
  --
  Why is Snark Required?
3. Re:Learn OpenCL by Anonymous Coward · 2013-07-19 09:02 · Score: 2, Informative
  
  If you can get the job done quicker in something along the lines of VB or Python and the speed up compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if it's something you'll seldom be doing then it may be more efficient for you just to take the easy option.
  Ordinarily I'd agree with you (programmer's time is worth more than anyone else's) but that means stopping now not even bothering with the GPU, since he already has code that works on the CPU. He's done. The project is complete. Next work order.
  As soon as we start saying he's not already done, we've violated the principle and should stop trying to use it. His target is clearly end-user-enjoyed performance, and he's willing to put in more programmer time. So it's time to hang up the rapid prototype hat, and seriously get his hands dirty.
4. Re:Learn OpenCL by CadentOrange · 2013-07-19 09:03 · Score: 4, Informative
  
  What's wrong with a higher level language that interfaces with OpenCL? You're still writing OpenCL, you're just using Python for loading/storing datasets and initialisation. If you're starting out, something like PyOpenCL might be better as it'll allow you to focus on writing stuff in OpenCL.
5. Re:Learn OpenCL by HaZardman27 · 2013-07-19 09:13 · Score: 4, Insightful
  
  That's because the closest analogy to a software engineer using a more abstracted language in the hardware world is the packaging of common circuitry. Or when hardware engineers design chips, do they actually model out the components of every single transistor?
  
  --
  Apparently wizard is not a legitimate career path, so I chose programmer instead.
6. Re:Learn OpenCL by Midnight+Thunder · 2013-07-19 09:41 · Score: 2
  
  Learn OpenCL and do the job properly.
  This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.
  Well, the first thing is to understand parallel programming and what sort of things work well in a GPU. With that basic understanding, then OpenCL becomes a tool for doing that work. Starting with an OpenCL based "hello world" type application would then be the next step.
  
  --
  Jumpstart the tartan drive.
7. Re:Learn OpenCL by AdamHaun · 2013-07-19 10:23 · Score: 4, Informative
  
  Or when hardware engineers design chips, do they actually model out the components of every single transistor?
  Chip design is absurdly complicated (even on the digital side), and involves several layers of abstraction. In roughly increasing level of detail:
  * Spec level: high-level behavioral description of the functionality of a digital system, something like "8-bit 115.2kbps UART" or "2MHz PWM with 0-100% duty cycle in 0.1% increments".
  * HDL/RTL level: software-like description of the complete system design. Can range from higher-level (describing behavior) to lower-level (describing specific logic). When people talk about buying, selling, or creating "IP" in the chip design world, they're usually talking about RTL for a single functional unit.
  * Gate level: Logic gates and flip-flops and their connections.
  * Transistor level: The transistors that make up the gates, and their connections.
  * Device level: The behavior of an individual transistor.
  * Physical layout: Just what it sounds like; the actual arrangements of metal and silicon.
  There are some more in between, but that should give you an idea. HDLs are not necessarily low-level. For large designs (like modern SoCs), it takes some *very* expensive and complex software to go deeper into the list, and the process is not entirely automated. So I wouldn't say hardware design can't be high-level. The difference is that in hardware, you always have to care about the lowest level when you're doing your high-level design, while in software you can take more things for granted. So even though a board-level design might just be a bunch of off-the-shelf chips hooked together, it still takes a lot of work to make sure everything comes out right.
  
  --
  Visit the
8. Re:Learn OpenCL by Darinbob · 2013-07-19 12:19 · Score: 2
  
  In software when you take the low level for granted you end up with a typical bloated Windows application. Of course people get away with it because you just mock people who don't have enough RAM or CPU power until they upgrade in shame.
9. Re:Learn OpenCL by Anonymous Coward · 2013-07-19 14:13 · Score: 2, Informative
  
  The hard part of GPU programming isn't getting code that works, it's getting code that is fast. One of the most significant issues is how the data is arranged and accessed on the GPU. A big portion of this is related to how the data is set up, transferred, and accessed over PCIe to and from main memory.
  Basically, you're going to want to access that data in a fairly low-level manner on the CPU side as well. So the advantages of Python etc. are nullified when you have some binary-blob-like format you're trying to access as a big pinned block of memory. This is the kind of programming that C/C++ specialize in and are really good at. Hence, notice how OpenCL and CUDA are both very similar to C.
  I'm not saying Python isn't going to work; what I am saying is that, much like C++ doesn't make a good batch/text manipulation language, Python doesn't make a good bit-banging language.
10. Re:Learn OpenCL by Chaos+Incarnate · 2013-07-19 14:23 · Score: 3, Funny
  
  Just because we C# programmers can't do memory management worth a damn doesn't mean we're no better than VB programmers. We at least know what case sensitivity means. ;)
  
  --
  Benford's Corollary to Clarke's Law: "Any technology distinguishable from magic is insufficiently advanced."
11. Re: Learn OpenCL by guruevi · 2013-07-19 16:00 · Score: 2, Informative
  
  I have written code for computational biology. CUDA is a lot easier to pick up if you're just converting from C. They have great examples and documentation and great plugins, but you're stuck on a single hardware platform. OpenCL, on the other hand, is a lot less 'nice' to begin with (poring over 250-page PDFs with minimal explanation), but it allows you to leverage both CPU and GPU efficiently and is a lot more hardware independent. That said, these days it's just nVidia for serious GPU computing. Maybe Intel is starting to get into the game (don't know, haven't come across their hardware yet). AMD is a joke: not even all their GPUs (or drivers) have support for GPGPU yet, and their drivers just suck.
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
Re:XNA or Unity by Tr3vin · 2013-07-19 08:38 · Score: 2

Those are game engines. They will do nothing to help him use the GPGPU capabilities of his graphics card.
Re:XNA or Unity by stewsters · 2013-07-19 08:40 · Score: 4, Informative

I don't think he is looking at making a game, I think he is looking for some cheap parallel processing. I have done some cuda, it was a pain to set up a few years back. There probably are better tutorials now.
Obsidian by jbolden · 2013-07-19 08:51 · Score: 4, Informative

I get the impression that CUDA/OpenCL is still the best option. This thesis on Obsidian presents a set of Haskell bindings which might be easier, and it also covers the basics quite well. Haskell lends itself really well to this because the language is inherently designed for parallelism, thanks to purity and out-of-order computation. That being said, I think Obsidian is a bit rough around the edges, but if you are looking for a real alternative, this is one.
1. Re:Obsidian by jbolden · 2013-07-19 13:55 · Score: 4, Informative
  
  The big issue is that Haskell is lazy. Which means in particular the programmer by default doesn't determine order of execution. This makes Haskell a better counter example since order of execution is so key to so many languages.
  Erlang's type system is rather typical dynamic while Haskell has a Hindley–Milner type system which again shows off the plusses of functional better.
  Haskell has more of the most sophisticated ideas in computer science than any other language. It has become the standard for computer science research, in particular language and compiler research. So when an idea is "news" there is very likely an implementation of that idea in Haskell. Erlang's community is more practical and less cutting edge.
  Haskell is easier to program in.
GPU programming *is* pain, princess. by Chris+Mattern · 2013-07-19 08:52 · Score: 4, Informative

Anyone who tells you differently is selling you something.
Udacity teaches CUDA by Arakageeta · 2013-07-19 08:53 · Score: 2

Check out the Udacity class on parallel programming. It's mostly CUDA (I believe it's taught by NVIDIA engineers): https://www.udacity.com/course/cs344
CUDA is generally easier to program than OpenCL. Of course, CUDA only runs on NVIDIA GPUs though.
OpenACC by SoftwareArtist · 2013-07-19 08:53 · Score: 4, Interesting

OpenACC is what you're looking for. It uses a directive based programming model similar to OpenMP, so you write ordinary looking code, then annotate it in ways that tell the compiler how to transform it into GPU code.
You won't get as good performance as well written CUDA or OpenCL code, but it's much easier to learn. And once you get comfortable with it, you may find it easier to make the step from there into lower level programming.

--
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
Very Similar Story by Chaseshaw · 2013-07-19 08:54 · Score: 2

VB.NET background. Wanted to get into GPGPU to accelerate some of my more complicated math calculations. Tried CLOO (open source .net GPU wrappers) and couldn't get it to work, tried AMD's OPENCL dev gui, couldn't get it to work. Eventually found the answer in python. GPGPU in pyopencl is well-documented thanks to the bitcoiners, and from .net you can either run the python in a shell, or write a little python kernel to listen for, and process commands. Only catch is the opencl abilities are limited, and you have to start dabbling in c++ to get it to do any real work (and even then it's a dumbed-down c++ and many existing extensions don't install or work quite right). All in all I found the entire thing very rewarding though. :) Best of luck.
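The "little Python kernel that listens for commands" idea can be sketched roughly as below. This is only an illustration and not the poster's actual code: the line-delimited JSON protocol, the `serve` function and the `brighten` handler are all made up here, and the handler is a plain-CPU placeholder standing in for wherever a pyopencl call would go.

```python
import json


def brighten(pixels, amount):
    """Placeholder handler: a real worker would launch a GPU kernel here."""
    return [min(255, p + amount) for p in pixels]


# Hypothetical command table; the names are illustrative only.
HANDLERS = {"brighten": brighten}


def serve(in_stream, out_stream, handlers=HANDLERS):
    """Read one JSON command per line, write one JSON reply per line."""
    for line in in_stream:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            handler = handlers[request["op"]]
            reply = {"ok": True, "result": handler(**request.get("args", {}))}
        except Exception as exc:  # report the error to the caller, keep serving
            reply = {"ok": False, "error": repr(exc)}
        out_stream.write(json.dumps(reply) + "\n")
        out_stream.flush()
```

Calling `serve(sys.stdin, sys.stdout)` at the bottom of the script would turn it into a worker that the .NET side can start as a child process with redirected standard input and output, sending a line such as `{"op": "brighten", "args": {"pixels": [10, 250], "amount": 10}}` and reading one reply line back.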
Coursera by elashish14 · 2013-07-19 09:05 · Score: 2

Coursera has some courses on GPU programming, like this one, and what's nice about them is that they're pretty slow-paced, and I'm assuming that they explain things well. Other online courses probably offer the same, and I think the video lectures would be helpful in understanding the concepts.

--
I have left slashdot and am now on Soylent News. FUCK YOU DICE.
1. Re:Coursera by jasax · 2013-07-19 12:30 · Score: 2
  
  I took that course: https://www.coursera.org/course/hetero
  
  I also took a course from Udacity: https://www.udacity.com/course/cs344 but this one I didn't finish, I've done perhaps 30% of it (I already had finished Coursera's). One of these days I'll go there to close matters :-)
  
  The courses in Udacity are "always online", so anyone can register anytime and finish the course at his or her own pace. Quizzes, exams and grading with certificate included have no fixed limits. On the other hand, the courses from Coursera have deadlines and run more or less in parallel with "snail" university schedules, with start and stop dates, with time limits in quizzes and exams, etc. (You can usually see videos, and do quizzes anytime after they end, but no certificates and grading AFAIK).
  
  Both courses were good -- I recommend both, -- we did homeworks in Amazon's cloud transparently, and certainly both were "sponsored" by Nvidia, coz we learned only CUDA. (Perhaps there was a brief blah blah about competing alternatives.)
  
  But from what I've seen, if someone is afraid of CUDA, then it's better to run away very fast from alternatives (OpenCL) :-)
Nothing easy but Udacity can help by Jthon · 2013-07-19 09:12 · Score: 5, Informative

So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP or one of the other abstractions but you really need to understand how these massively parallel machines work. It's possible to write some perfectly valid code in any of these environments which will run SLOWER than on the CPU because you didn't understand fundamentally how GPUs excel at processing.
Udacity currently has a fairly decent intro course on GPU programming at: https://www.udacity.com/course/cs344
It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.
I'd suggest running through that and then deciding on what API you want to end up using.
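One common way to end up slower than the CPU (an assumption here; the comment does not say which pitfall it has in mind) is that copying data to and from the card costs more than the kernel saves. A back-of-the-envelope model, with made-up numbers and a hypothetical function name:

```python
def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Overall speedup once host<->device copies are counted.

    cpu_seconds      -- time the CPU version takes
    kernel_speedup   -- how much faster the GPU kernel alone runs
    transfer_seconds -- total time spent copying data both ways
    """
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Illustrative numbers only, not measurements.
big_job = effective_speedup(cpu_seconds=2.0, kernel_speedup=40, transfer_seconds=0.05)
small_job = effective_speedup(cpu_seconds=0.01, kernel_speedup=40, transfer_seconds=0.05)
print(f"big job:   {big_job:.1f}x")
print(f"small job: {small_job:.2f}x")
```

With these numbers the large job still comes out around 20x faster, while the small job ends up roughly 5x slower than simply staying on the CPU, even though the kernel itself is 40x faster in both cases.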
Re:CUDA by Anonymous Coward · 2013-07-19 09:21 · Score: 2, Insightful

Never under any circumstances use cuda. We don't need anymore proprietary garbage floating around. Use opencl only.
Understand The Hardware by Anonymous Coward · 2013-07-19 09:28 · Score: 3, Informative

If you are going to program a GPU, and you are looking for performance gains, you MUST understand the hardware. In particular, you must understand the complicated memory architecture, you must understand the mechanisms for moving data from one memory system to another, and you must understand how your application and algorithm can be transformed into that model.
There is no shortcut. There is no magic. There is only hardware.
If you do not believe me, you can hunt up the various Nvidia papers walking you through (in painful detail-- link below) the process of writing a simple matrix transpose operation for CUDA. The difference between a naive and a good implementation, as shown in that paper, is huge.
That said, once you understand the principles, CUDA is relatively easy to learn as an extension of C, and the Nvidia profiler, NVVP, is good at identifying some of the pitfalls for you so that you can fix them.
http://www.cs.colostate.edu/~cs675/MatrixTranspose.pdf
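The core idea behind an optimized transpose, as I understand it, is to move data in small square tiles so that both the reads and the writes stay contiguous. The sketch below emulates that access pattern on the CPU in plain Python; it is not CUDA and not the paper's code. `TILE`, the function names and the flat row-major list layout are choices made here for illustration, and the tile buffer merely stands in for the GPU's shared memory.

```python
TILE = 4  # tile edge, kept tiny for the demo; GPU versions use larger tiles


def transpose_naive(src, rows, cols):
    """Transpose a row-major rows x cols matrix one element at a time."""
    dst = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            dst[c * rows + r] = src[r * cols + c]  # strided writes
    return dst


def transpose_tiled(src, rows, cols, tile=TILE):
    """Same result, but staged through a small tile buffer."""
    dst = [0] * (rows * cols)
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Load: read the tile row by row (contiguous reads).
            buf = [src[r * cols + c0 : r * cols + c1] for r in range(r0, r1)]
            # Store: write the tile back one column at a time, so each
            # destination row segment is also written contiguously.
            for c in range(c0, c1):
                start = c * rows + r0
                dst[start : start + (r1 - r0)] = [row[c - c0] for row in buf]
    return dst
```

Python will not show the speed difference; the point is only the access pattern that the naive and tiled versions impose on memory.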
Re:CUDA by UnknownSoldier · 2013-07-19 09:32 · Score: 4, Informative

Agreed 100% about CUDA and OpenMP! Already invented a new multi-core string searching algorithm and having a load of fun playing around with my GTX Titan combining CUDA + OpenMP. You can even do printf() from the GPU. :-)
The most _painless_ way to learn CUDA is to install CUDA on a Linux (Ubuntu) box or Windows box.
https://developer.nvidia.com/cuda-downloads
On Linux, at the command line fire up 'nsight', open the CUDA SDK samples, and start exploring! And by exploring I mean single-stepping through the code. The NSight IDE is pretty darn good considering it is free.
Another really good doc is the CUDA C Programming Guide.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Oh and don't pay attention to the Intel Propaganda - there are numerous inaccuracies:
Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU
http://pcl.intel-research.net/publications/isca319-lee.pdf
OpenACC or OpenMP 4.0 are exactly what you want by John_The_Geek · 2013-07-19 09:33 · Score: 5, Informative

I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.
There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.
I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.
BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.
Do you need the GPU? by jones_supa · 2013-07-19 09:39 · Score: 2

You would probably see a multi-fold increase in performance by simply converting your project from C# to C++.
Re:XNA or Unity by Anonymous Coward · 2013-07-19 09:57 · Score: 2, Informative

Incorrect. That is certainly a valid approach and the GP should be modded up.
Using textures and shaders you can very easily do massively parallel floating point operations in XNA on the GPU, and it's a language the asker is familiar with.
Think outside the box a little bit.
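The shader approach boils down to writing one small function that computes a single output pixel and letting the GPU run it for every pixel at once. A CPU-side emulation in plain Python (not XNA or HLSL; `grayscale_kernel`, `run_kernel` and the pixel layout are invented for this sketch):

```python
def grayscale_kernel(x, y, texture, width):
    """Compute ONE output pixel; reads only its own input, no shared state."""
    r, g, b = texture[y * width + x]
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights


def run_kernel(kernel, texture, width, height):
    """Stand-in for the GPU: invoke the kernel once per pixel.

    On real hardware these calls run in parallel and in no particular
    order, which is safe only because each call is independent.
    """
    return [kernel(x, y, texture, width)
            for y in range(height) for x in range(width)]


# 2x2 test image of (r, g, b) tuples in the 0..1 range.
image = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
         (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
gray = run_kernel(grayscale_kernel, image, width=2, height=2)
```

Algorithms that can be phrased this way, with each output depending only on a fixed neighbourhood of the input, are the ones that map most directly onto shaders.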
Re:OpenACC by 140Mandak262Jamuna · 2013-07-19 10:01 · Score: 2

It works in theory. In practice, unless you understand your code well, understand how the compiler built the instructions, and understand what these directives do very well, you won't get any speed improvements. There are times when the overheads slow down the code, or the simple-minded implementation has brain-dead locks, and you end up with slower code.
We have come a long way since the days of assembly and of Fortran, which is assembly by another name. But the overheads of the higher-level languages have been masked a lot by ever-increasing speed and memory availability. Whole generations of programmers have come up using higher-level languages with IDEs and CASE tools from day one; they fundamentally don't understand how the code actually works. They are continually stumped by the fact that the code does what they tell it to do, not what they meant it to do.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Re:OpenACC by SoftwareArtist · 2013-07-19 10:18 · Score: 2

True, and this is even more true on GPUs than CPUs. They do a lot less to shield you from the low level details of how your code gets executed, so those details end up having a bigger impact on your performance. And to make it worse, those details change with every new hardware generation!
But for a new user just getting into GPU programming, it's easier to learn those things in the context of a simple programming model like OpenACC than a complicated one like CUDA or OpenCL, which forces them to deal with even more complexity and hardware details right from the very start. OpenACC can produce good results if used well. And once you've learned to do that, you're in a better position to tackle the harder technologies.

--
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
Re:XNA or Unity by Tr3vin · 2013-07-19 13:25 · Score: 2

How many boxes do you want to go through before you get to the solution? Sure, he could write it as a shader, but that hardly requires pulling in something like Unity or XNA to build the project.
Harnessing GPU vs Learning GPU by Anonymous Coward · 2013-07-19 16:26 · Score: 2, Interesting

Writing GPU programs is hard. Not only do you have to learn a new set of APIs, you also have to understand the underlying architecture to extract decent performance. It requires a different approach to problem solving that takes months if not years to develop.
Fortunately you don't need to read the entire CUDA programming guide to program on the GPU. There are several excellent libraries out there which hide the complexities of the GPU architecture. Since you are doing image processing, I would recommend Arrayfire (http://www.accelereyes.com/products/arrayfire). It is a free library which provides several image processing functions which have been optimized for the GPU. You should also look into Thrust and NPP (included with the CUDA toolkit), although these libraries are more verbose and require a greater understanding of C++ and the GPU to program.
OpenCL not obsolete. OpenACC generates CUDA/OpenCL by keneng · 2013-07-19 16:55 · Score: 2

OpenACC may be higher-level (easier to use), but it still generates CUDA/OpenCL code. Your wording sounded like "OpenCL support is gone." I want to correct you on that. OpenCL is the future and wraps CUDA also. If you code for CUDA, you can only target CUDA hardware. If you code for OpenCL, you can target not only AMD, but also CUDA hardware. That was the point of the OpenCL spec in the first place. OpenCL can also transparently take advantage of the local CPU cores.
OpenCL has one drawback: it does not support all types. It is highly constrained to the kinds of types relevant to graphics/3D. There have been some kludge patches to make CUDA/OpenCL work with string types (e.g. parallel grep with CUDA), but these aren't well suited, because the hardware was not intended for that and it requires a lot of moving of memory from main motherboard memory to graphics card memory, which wastes a lot of time. String parallelizing is better done with mechanisms like OpenMP. OpenMP can support any kind of type and crunch with it, and OpenMP is designed to coexist with MPI (RPC-like parallelism across many computers).
Start learning OpenCL, OpenMP, MPI, GNU & boost library parallelism. To make it easier try running golang opencl examples:
apt-get install mercurial meld
hg clone -u release https://code.google.com/p/go
cd go
cd src && ./all.bash
#Put this into your ~/.bashrc:
export GOROOT=/home/youruser/yourgo
export PATH=$PATH:$GOROOT/bin
mkdir -p ~/goopencl
cd ~/goopencl
mkdir -p ~/goopencl/pkg
mkdir -p ~/goopencl/src
export GOPATH=/home/youruser/goopencl
go get github.com/tones111/go-opencl/cl
go get github.com/tones111/raw
cd /home/youruser/goopencl/src/github.com/tones111/go-opencl/cl/demo/rotate/
go run rotate.go -i="i.png" -o="o.png" -a=15