Slashdot Mirror


Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

dryriver writes "I am an intermediate-level programmer who works mostly in C# .NET. I have a couple of image/video processing algorithms that are highly parallelizable; running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x to perhaps 30x or 40x, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL, which seems a daunting task to me, or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"

198 comments

  1. Check out MC# by Anonymous Coward · · Score: 1

    I tried it out once a while ago just to see what it does. It looks 'dead' from a support POV, but it is still out there;

    Release notes for MC# 3.0:
    a) GPU support both for Windows and Linux,
    b) integration with Microsoft Visual Studio 2010,
    c) bunch of sample programs for running on GPU (including multi-GPU versions),
    d) "GPU programming with MC#" tutorial.

  2. GPU programming is pain by Anonymous Coward · · Score: 5, Funny

    GPU programming is painful. A painless introduction doesn't capture the flavor of it.

    1. Re:GPU programming is pain by PolygamousRanchKid+ · · Score: 5, Funny

      Yeah, it would be like S&M without the pain . . . cute, but something essential is missing from the experience.

      Heidi Klum has a TV show called "Germany's Next Top Model". She basically gets all "Ilsa, She-Wolf of the SS" on a bunch of neurotic, anorexic, pubescent girls, teaching them how a top model needs to suffer.

      Heidi Klum would make a good GPU programming instructor.

      . . . and even non-geeks would watch the show. A win-win for everyone.

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
    2. Re:GPU programming is pain by Anonymous Coward · · Score: 1

      Only if the language you're using is pain. In other words: If you're trying to use C/C++/C#/Java/Pascal/… to write highly parallel code... YOU'RE DOING IT WRONG.

      Those languages are not made for that. Don't try to shoehorn parallel programming onto them.

      This is a far more elegant task in functional languages like Haskell, which are designed from the ground up for parallel processing.

      Then again, many programmers still sit in the tiny mental box of C & co, and think it's "the shit".
      Yeah, for low-level code like drivers and memory managers, etc. But stop seeing nails everywhere just because you cling to the hammer as your only tool.

    3. Re:GPU programming is pain by Anonymous Coward · · Score: 4, Funny

      Yeah, that's what we need! More neurotic, anorexic, pubescent girls who know how to do GPU programming!

    4. Re:GPU programming is pain by Ken_g6 · · Score: 1

      Only if the language you're using is pain. In other words: If you're trying to use C/C++/C#/Java/Pascal/… to write highly parallel code... YOU'RE DOING IT WRONG.

      Those languages are not made for that. Don't try to shoehorn parallel programming onto them.

      This is a far more elegant task in functional languages like Haskell, which are designed from the ground up for parallel processing.

      But GPU programming isn't just about parallel programming. It's also about low register availability, high memory latency, complicated memory access patterns, and just-plain-strange inter-process communication. The GPU has many more parts than a CPU, and you need to learn to use most or all of them effectively.

      --
      (T>t && O(n)--) == sqrt(666)
    5. Re:GPU programming is pain by Anonymous Coward · · Score: 0

      CPU programming is about all of those things, too, though. Not least if you want to try to get any use out of an AVX unit. Try doing divergent branches or memory accesses across an AVX unit and see how you get on!

    6. Re:GPU programming is pain by Anonymous Coward · · Score: 0

      Nice! I came here to say the same thing - It's hard to find a painless way to introduce yourself to pain :)

      To be fair, the parallel programming part is useful to learn if you haven't done it much, otherwise there are a lot of details that aren't so interesting.

    7. Re:GPU programming is pain by Darinbob · · Score: 4, Funny

      I thought we needed more "Ilsa, She-Wolf" programming instructors.

    8. Re:GPU programming is pain by Anonymous Coward · · Score: 0

      If there isn't already a library for that (there is), then you write that stuff inside one. It's bad form to pollute your algorithms with bookkeeping boilerplate stuff.

      And don't confuse C-like languages' libraries with Haskell libraries. In Haskell, "reprogramming the semicolon", to redefine the actual flow of the code or separate the bookkeeping aspect from the main code, is standard practice and a typical test to separate the raw recruits from the programmers. And that's only the beginning.

      Also, if you think any form of multi-threaded IPC is "plain strange", you are *definitely* a newbie. Another thing that C-likes make harder than it is.

    9. Re:GPU programming is pain by mc6809e · · Score: 1

      In other words: If you're trying to use C/C++/C#/Java/Pascal/… to write highly parallel code... YOU'RE DOING IT WRONG.

      You don't use those languages to write highly parallel code. You use those languages to sequentially control a GPU to get it to execute programs in parallel.

      Big difference (really).

    10. Re:GPU programming is pain by Anonymous Coward · · Score: 0

      Listen, I love to program in languages like Clojure and Erlang where parallel programming is a breeze, but honestly they are not the best fit for GPU and related programming (i.e. games). The most important thing in these scenarios is control over memory and flow.

      In Clojure for example, I can spawn tons of threads that once they get going, might perform decently well and prevent me from making parallel programming mistakes, but the amount of allocations caused by using immutable types makes it worthless to me in a game. There's a reason these languages usually don't get used in games beyond just "people don't know them or want to know them."

      That said, in GPU programming you can certainly use them in pre or post phases, just not when it comes to shaders. Shaders have their own language and you must prepare the data for them and upload data to the GPU. The GPU has its own way of doing things and I fail to see how I can benefit much from Haskell when its role is to get the GPU to execute a program, not to actually perform the execution.

    11. Re:GPU programming is pain by davester666 · · Score: 1

      It won't feel so bad if you first spend time working on the open-source GPU drivers.

      --
      Sleep your way to a whiter smile...date a dentist!
    12. Re:GPU programming is pain by tigersha · · Score: 1

      No, we need more instructors who look like Heidi Klum!

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
    13. Re:GPU programming is pain by Anonymous Coward · · Score: 0

      I actually had a pretty attractive Japanese CS instructor in college. She actually struck me with a pen for not getting the correct answer.

      I kinda liked it....

  3. Learn OpenCL by Tough+Love · · Score: 5, Insightful

    Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of shiny crap might seem superficially appealing to you.

    Learn OpenCL and do the job properly.

    --
    When all you have is a hammer, every problem starts to look like a thumb.
    1. Re:Learn OpenCL by Tr3vin · · Score: 4, Interesting

      Learn OpenCL and do the job properly.

      This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.

    2. Re:Learn OpenCL by Anonymous Coward · · Score: 0, Troll

      Surely the whole point of computer programming is efficiency - efficiency over doing a task without a computer.

      If you can get the job done quicker in something along the lines of VB or Python and the speed up compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if it's something you'll seldom be doing then it may be more efficient for you just to take the easy option.

    3. Re:Learn OpenCL by Required+Snark · · Score: 2

      Yep. Some things are intrinsically hard. GPU programming is SIMD programming, so you have to work with data parallelism. It helps a lot if you understand how the hardware works. This is where assembly language experience can be a big plus.

      There's no substitute for detailed knowledge. Outside of instruction level parallelism, there is no "magic bullet" for parallel programming. You have to learn things.
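
To make "data parallelism" concrete, here is a minimal sketch in plain Python, not real GPU code. It assumes a made-up brightness filter: `brighten_kernel`, `launch`, and the pixel values are all hypothetical names and data for illustration. The point is only the shape of the work: one small function that handles exactly one element, identified by its index, with no dependence on any other element.

```python
# Sketch of the data-parallel ("one tiny function per element") style that
# GPU kernels use. Pure Python, so it runs anywhere; a real GPU would run
# many kernel instances at the same time instead of looping.

def brighten_kernel(gid, src, dst, gain):
    """Hypothetical kernel: work-item `gid` handles exactly one pixel."""
    dst[gid] = min(255, int(src[gid] * gain))

def launch(kernel, global_size, *args):
    """Stand-in for a GPU launch: call the kernel once per work-item id."""
    for gid in range(global_size):
        kernel(gid, *args)

pixels = [0, 64, 128, 200, 255]      # made-up 8-bit grayscale values
result = [0] * len(pixels)
launch(brighten_kernel, len(pixels), pixels, result, 1.5)
print(result)                        # [0, 96, 192, 255, 255]
```

Because no work-item reads what another one writes, the calls could run in any order or all at once, which is what lets hardware spread them across many lanes.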

      --
      Why is Snark Required?
    4. Re:Learn OpenCL by sl4shd0rk · · Score: 0

      don't even think about VBing it. Or Pythoning it.

      Awwwwww yisssssss... mothoafokin Assembly!

      --
      Join the Slashcott! Feb 10 thru Feb 17!
    5. Re:Learn OpenCL by Anonymous Coward · · Score: 1

      Considering that GPU programming is intrinsically parallel in nature and pretty much none of those "easier" means really have the concept in question in their worldview, I call BULLSHIT on your line of reasoning.

    6. Re:Learn OpenCL by Anonymous Coward · · Score: 2, Informative

      If you can get the job done quicker in something along the lines of VB or Python and the speed up compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if it's something you'll seldom be doing then it may be more efficient for you just to take the easy option.

      Ordinarily I'd agree with you (programmer's time is worth more than anyone else's) but that means stopping now not even bothering with the GPU, since he already has code that works on the CPU. He's done. The project is complete. Next work order.

      As soon as we start saying he's not already done, we've violated the principle and should stop trying to use it. His target is clearly end-user-enjoyed performance, and he's willing to put in more programmer time. So it's time to hang up the rapid prototype hat, and seriously get his hands dirty.

    7. Re:Learn OpenCL by CadentOrange · · Score: 4, Informative

      What's wrong with a higher level language that interfaces with OpenCL? You're still writing OpenCL, you're just using Python for loading/storing datasets and initialisation. If you're starting out, something like PyOpenCL might be better as it'll allow you to focus on writing stuff in OpenCL.

    8. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      Rewriting a bunch of boilerplate and high level code to gain maybe 0.01% more speed? We don't know what specific algorithm he needs to use. If it just depends on a bit of simple arithmetic in a tight loop, then that is the only part that needs to be done in something specific to the GPU. And if the algorithm is simple enough, even a crappy translator or compiler will get it to run quickly on a GPU. Writing the rest of the program in a language you have no interest in or taste for will cost you more time than you recover for a lot of things these days. And for everyone complaining that it is difficult to learn parallel computation and that it is going to be painful no matter what, that is BS for many algorithms. Some are inherently, dead-simply parallel, and if that is all you are going to do, there won't be any great difficulty getting a massive speed up from GPGPU. At worst you might have some cache issues that cost you a little, but still end up faster than doing it on the CPU.

      It is not that OpenCL or learning the architecture and basics of parallel programming are bad advice. But the idea that those are the only paths that will go anywhere for all projects is bull.

    9. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      One of the major differences between hardware engineers and software "engineers" is that hardware people don't mess around with languages/frameworks etc. The only common languages are VHDL/Verilog and maybe SystemC.

      The same language is good enough to testbench or synthesize their logic, be it a small CPLD/FPGA or a large GPU/CPU/ASIC. They don't waste time playing with things to add more abstractions and slow things down. That's one of the many reasons why hardware has always been ahead of software development.

    10. Re:Learn OpenCL by HaZardman27 · · Score: 4, Insightful

      That's because the closest analogy to a software engineer using a more abstracted language in the hardware world is the packaging of common circuitry. Or when hardware engineers design chips, do they actually model out the components of every single transistor?

      --
      Apparently wizard is not a legitimate career path, so I chose programmer instead.
    11. Re: Learn OpenCL by Anonymous Coward · · Score: 0

      Assembly is for pussies. REAL programmers code in straight binary.

      01100101 thatshit

    12. Re: Learn OpenCL by Anonymous Coward · · Score: 1
    13. Re:Learn OpenCL by Midnight+Thunder · · Score: 2

      Learn OpenCL and do the job properly.

      This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.

      Well, the first thing is to understand parallel programming and what sort of things work well in a GPU. With that basic understanding, then OpenCL becomes a tool for doing that work. Starting with an OpenCL based "hello world" type application would then be the next step.

      --
      Jumpstart the tartan drive.
    14. Re: Learn OpenCL by mwvdlee · · Score: 1

      Patch cables? Are those the playfully colored, safety-blanket covered, plug-and-play things you kids use these days? Real programmers use a soldering iron and bare metal only.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    15. Re:Learn OpenCL by SplashMyBandit · · Score: 1

      The real trick to efficient GPU programming is trying to keep as much in video memory as you can - by optimizing the textures you use (I'm a GLSL game developer, so this is *the* critical performance issue). I would also recommend OpenCL over CUDA. OpenGL has shown a longevity that made working with it worthwhile, and with billions of mobile devices using it alongside PCs (Win/Linux) and Macs it seems that OpenCL could very well have the same longevity too. Since your time is a very precious thing it is worth investing that time in something that will be around for a long time and is cross-platform (mobiles and tablets are the current fad, the browser with WebGL creating amazing apps could well be the next one).

      As for libraries, I use the JoGL bindings for Java. That allows my application (a jet combat flight simulator in development) to work cross-platform with almost no porting effort. Using Java makes using lots of CPU cores easy, but the performance constraint is never the CPU, it is the GPU. So I use Java to save development time on routine stuff (heap-based resource management under multi-threading) and spend some of the saved time on optimizing the GPU code (which is the performance-critical stuff).

    16. Re:Learn OpenCL by AdamHaun · · Score: 4, Informative

      Or when hardware engineers design chips, do they actually model out the components of every single transistor?

      Chip design is absurdly complicated (even on the digital side), and involves several layers of abstraction. In roughly increasing level of detail:

      * Spec level: high-level behavioral description of the functionality of a digital system, something like "8-bit 115.2kbps UART" or "2MHz PWM with 0-100% duty cycle in 0.1% increments".
      * HDL/RTL level: software-like description of the complete system design. Can range from higher-level (describing behavior) to lower-level (describing specific logic). When people talk about buying, selling, or creating "IP" in the chip design world, they're usually talking about RTL for a single functional unit.
      * Gate level: Logic gates and flip-flops and their connections.
      * Transistor level: The transistors that make up the gates, and their connections.
      * Device level: The behavior of an individual transistor.
      * Physical layout: Just what it sounds like; the actual arrangements of metal and silicon.

      There are some more in between, but that should give you an idea. HDLs are not necessarily low-level. For large designs (like modern SoCs), it takes some *very* expensive and complex software to go deeper into the list, and the process is not entirely automated. So I wouldn't say hardware design can't be high-level. The difference is that in hardware, you always have to care about the lowest level when you're doing your high-level design, while in software you can take more things for granted. So even though a board-level design might just be a bunch of off-the-shelf chips hooked together, it still takes a lot of work to make sure everything comes out right.

      --
      Visit the
    17. Re: Learn OpenCL by Anonymous Coward · · Score: 1

      http://xkcd.com/378/

      Enough said.

    18. Re: Learn OpenCL by Anonymous Coward · · Score: 0
    19. Re:Learn OpenCL by Darinbob · · Score: 2

      In software when you take the low level for granted you end up with a typical bloated Windows application. Of course people get away with it because you just mock people who don't have enough RAM or CPU power until they upgrade in shame.

    20. Re: Learn OpenCL by dbIII · · Score: 1

      There was still some stuff you could program with patch cables in use in 1990, back before "neural networks" replaced a lot of the functions of analog computers.

    21. Re:Learn OpenCL by gbjbaanb · · Score: 0

      OpenCL is C based so it shouldn't be that hard to pick up

      the guy's a C# developer. I think you underestimate how VB-ish that language is, despite the superficial veneer of curly brackets.

    22. Re:Learn OpenCL by viperidaenz · · Score: 1

      If you had something higher level than VHDL/verilog, then instead of your compile for your ASIC taking 6 hours, it'll take 6 days

    23. Re:Learn OpenCL by Anonymous Coward · · Score: 2, Informative

      The thing that is hard about GPU programming isn't getting code that works, it's getting code that is fast. One of the most significant issues is how the data is arranged and accessed on the GPU. A big portion of this is going to be related to how the data is set up, transferred, and accessed over PCIe from/to main memory.

      Basically, you're going to want to access that data in a manner that is fairly low level on the CPU side as well. So, the advantages of Python etc. are nullified when you have some binary-blob-like format you're trying to access as a big pinned block of memory. This is the kind of programming that C/C++ specialize in and are really good at. Hence, notice how OpenCL and CUDA both are very similar to C.

      I'm not saying Python isn't going to work. What I am saying is that, much like C++ doesn't make a good batch/text manipulation language, Python doesn't make a good bit-banging language.

    24. Re:Learn OpenCL by Chaos+Incarnate · · Score: 3, Funny

      Just because we C# programmers can't do memory management worth a damn doesn't mean we're no better than VB programmers. We at least know what case sensitivity means. ;)

      --
      Benford's Corollary to Clarke's Law: "Any technology distinguishable from magic is insufficiently advanced."
    25. Re: Learn OpenCL by guruevi · · Score: 2, Informative

      I have written code for computational biology. CUDA is a lot easier to pick up if you're just converting from C. They have great examples and documentation and great plugins, but you're stuck on a single hardware platform. OpenCL, on the other hand, is a lot less 'nice' to begin with (poring over 250-page PDFs with minimal explanation), but it lets you leverage both CPU and GPU efficiently and is a lot more hardware independent. These days, though, it's just nVidia for serious GPU computing. Maybe Intel is starting to get into the game (don't know, haven't come across their hardware yet). AMD is a joke: not even all their GPUs (or drivers) have support for GPGPU yet, and their drivers just suck.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    26. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      but that means stopping now not even bothering with the GPU, since he already has code that works on the CPU. He's done. The project is complete. Next work order.

      No it doesn't. Unless you assume the programmer's time is infinitely more valuable, and the programmer will never run their own code (unlike a lot of projects like this). If a couple hours to get a simple, but heavily used algorithm going on the GPU saves him many more hours in the long run, it is certainly worth it. If it turns out rewriting some python script into C only saves him 10 seconds of cpu time over thousands of uses, even if it took him 5 minutes for the conversion, that is a waste of time (short of ancillary benefits like practicing a new tool).
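
The trade-off described above is just arithmetic, so here is a rough sketch of it. All the numbers are assumptions picked for illustration (two hours of GPU work saving 30 seconds per run, and a 5-minute rewrite saving about 10 seconds in total over 1000 runs); the function names are hypothetical too.

```python
import math

def net_seconds_saved(dev_seconds, saved_per_run, runs):
    """Seconds gained (positive) or lost (negative) by doing the optimization."""
    return saved_per_run * runs - dev_seconds

def break_even_runs(dev_seconds, saved_per_run):
    """Number of runs after which the development time has paid for itself."""
    return math.ceil(dev_seconds / saved_per_run)

# Assumed GPU case: 2 hours of work, 30 s saved per run, 1000 runs.
gpu_net = net_seconds_saved(2 * 3600, 30, 1000)
gpu_break_even = break_even_runs(2 * 3600, 30)

# Assumed script case: 5 minutes of work, about 10 s saved in total over 1000 runs.
script_net = net_seconds_saved(5 * 60, 0.01, 1000)

print(gpu_net, gpu_break_even, script_net < 0)
```

With these made-up figures the GPU port pays for itself after 240 runs, while the script rewrite never recovers its five minutes.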

    27. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      If you do the bit banging or arithmetic on the cpu, and your algorithm involves a non-trivial amount of operations that are straightforward to parallelize, switching to C isn't going to make a big difference. There are plenty of ways to move blobs of binary data around in python, and for many applications, the cost is insignificant. Depending on what you are trying to do, the shader code can actually be a lot easier to write than trying to get the code to run the first time without heavily using a library. And if you are heavily using a library, chances are it won't matter what language you call that library from.
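
As one hedged illustration of moving binary blobs around in Python, the sketch below uses only the standard library (`array`, `struct`, `memoryview`). The record layout, three 32-bit floats per point, is an assumption made up for the example; it stands in for whatever flat buffer a GPU library would want to be handed.

```python
import struct
from array import array

# Hypothetical record layout: three 32-bit floats (x, y, z) per point.
points = [(0.0, 1.0, 2.0), (3.5, 4.5, 5.5)]

# Flatten into one contiguous block of native-order float32 values.
flat = array("f", [coord for point in points for coord in point])
blob = flat.tobytes()

# Reinterpret the same bytes as floats without copying them.
view = memoryview(blob).cast("f")

# Pull one record back out by byte offset (12 bytes per point).
second = struct.unpack_from("=3f", blob, offset=12)

print(len(blob), list(view), second)
```

Nothing here is fast in itself; the point is that the host language only has to produce and hand over a flat block of bytes, and the heavy lifting happens elsewhere.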

    28. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      What exactly is VB-ish about the language? There's no serious developer that uses drag and drop or go-to or anything like that. Just because it's made by Microsoft and it's cool to bash them does not make you cool. I think your ignorance is shining through here.

      I suppose you assume that Java must be more of a language of men? Java is like a poor man's C#. Obviously MS had the benefits of improved technologies and retrospect. It's actually not at all a hard jump from C# to OpenCL. I was an ASM/C, and later C++ developer who then worked in Java, then C# since about .NET beta (using many other languages too in between) and I found C# to be an improvement on all of those. I still write C++, but only when I can't help it. I'm doing game programming and OpenGL, OpenCL, and CUDA all the time.

      FYI, even VB.net is actually quite a nice language once you get past the ugly reference to the past. There's not much you can't do in it unless you are an idiot.

      As far as C#, it's actually more like C++ than VB. First of all, you have a concept of structs and even memory aligning them. I just recently had to switch a customer over from Java/libgdx because there was no way they could fix their performance issues due to the lack of equivalent memory management constructs in Java. Secondly, calling to C++ in C# is stupidly easy, especially compared to Java. You have all kinds of pointer types in C#. You can also write unsafe code in C# which in games, people actually do.

      Overall, C# is quite a bit like C++ once you understand the lower-level parts of the language. It's more like if you used C++ with consistently good libraries that actually cooperated and weren't quite as annoying as things out of boost. For most people, they want to get things done, not write giant compiler optimizations, inline every last thing they can, and struggle writing brilliant template hacks. People in the real world use both C# and C++, often together to get things done, especially in games and especially if they don't have huge teams. This is today's reality. I suggest you grow up and learn a little more before you criticize whatever is not your flavor of the month programming language or anything that must suck because MS made it.

    29. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      Actually, have you heard about type inference?
      A programming language like Python can be compiled down to machine code.
      In fact there is a python compiler which produces native CUDA/OpenCL, it also figures out how to move the arrays of data around efficiently.

    30. Re:Learn OpenCL by Anonymous Coward · · Score: 1

      It depends on what you are making.

      I am a hardware engineer who currently works mostly with FPGAs. You program the chips in a language like VHDL, Verilog, SystemVerilog, SystemC. Certain functionality can be programmed in higher-level languages; used right, it can be as efficient as hand-rolled code, but it is limited to a very specific kind of functionality.

      It can take 30 minutes to compile (synthesize), which basically proves your program, that all indices will fit in arrays, that all calculations won't overflow, etc. It produces a netlist.

      Then it will take 5 hours of linking (place and route) to fit the netlist in the physical device, and make sure all the links and logic fit inside a time constraint (of several nanoseconds). Actually, it is trying, most often it doesn't succeed, we run 20 of those place and route jobs in parallel, each with a different seed (well it is a bit more complicated than a seed, but it is represented by a number between 1 and 100).

      For mass production, you can buy FPGA-like devices which can take your design and create fixed wiring on silicon. Since normal FPGAs have variable wiring, these devices can run your design at a slightly faster clock rate.

      For even higher mass production, you can make custom ASICs based on your design and automatically create the routing and electronics. Your design will run even faster, and you can also create larger designs.

      But if you need to make special high speed electronics, you will need to manually place some electronics. The highest speeds you will need to place individual transistors. And if you want to be really fancy you can even design your own electronics physically.

    31. Re: Learn OpenCL by jkflying · · Score: 1

      Troll much? All the bitcoin miners use AMD OpenCL for their GPGPU properties. Why? Because they outperform NVidia. You are making sweeping statements with nothing to back you up.

      --
      Help I am stuck in a signature factory!
    32. Re:Learn OpenCL by Anonymous Coward · · Score: 0

      But GPU programming is all about memory management. If you don't know exactly when to allocate/copy/free how many bytes in what kind of memory, you can't program a GPU.

    33. Re:Learn OpenCL by Midnight+Thunder · · Score: 1

      I can remember a demonstration that showed that below a certain amount of "work" it was better to use the CPU and then above that to use the GPU. I can't remember the details, but from what I remember it was because of the overhead of setting up the cores?
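
That crossover can be sketched with a toy cost model. This is an assumption-laden illustration, not a measurement: it supposes the GPU pays a fixed cost per job (launch and data transfer, say) and then a smaller cost per item than the CPU. All the nanosecond figures and function names below are invented.

```python
def cpu_time_ns(n, cpu_per_item_ns):
    """Toy model: the CPU has no fixed cost, only a per-item cost."""
    return n * cpu_per_item_ns

def gpu_time_ns(n, gpu_per_item_ns, fixed_overhead_ns):
    """Toy model: the GPU pays a fixed cost up front, then a smaller per-item cost."""
    return fixed_overhead_ns + n * gpu_per_item_ns

def crossover_items(cpu_per_item_ns, gpu_per_item_ns, fixed_overhead_ns):
    """Smallest item count at which the GPU is strictly faster, or None if never."""
    if gpu_per_item_ns >= cpu_per_item_ns:
        return None
    return fixed_overhead_ns // (cpu_per_item_ns - gpu_per_item_ns) + 1

# Invented figures: 20 ns/item on the CPU, 1 ns/item on the GPU,
# and 0.5 ms of fixed overhead per GPU job.
n_star = crossover_items(20, 1, 500_000)
print(n_star)   # 26316
```

Below `n_star` items the fixed overhead dominates and the CPU wins; above it the cheaper per-item cost takes over.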

      I agree with the OpenCL route, simply because of previous history of 3DFX vs OpenGL. Short term CUDA has the advantage.

      --
      Jumpstart the tartan drive.
    34. Re: Learn OpenCL by mathimus1863 · · Score: 1

      That's not because AMD is better. It's because AMD/ATI has some obscure instructions heavily optimized that happen to be the same instructions useful for Bitcoin mining. And they also tend to focus on a plethora of less-powerful cores, while NVIDIA focuses on using fewer-but-more-powerful cores. Again, that benefits SHA256 operations, but doesn't necessarily benefit other applications.

      In a way, AMD got lucky with this one. It's the same reason the AMD CPUs outperform Intel on hashing -- because their bit-rotates are optimized to one instruction instead of three, even though it's not all that useful for other tasks. Thus, if your reasoning was correct, you'd also conclude that a lot of the medium-range AMD CPUs are faster than the high-end Intel CPUs.
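
For anyone wondering where "one instruction instead of three" comes from: without a native rotate, a bit-rotate is typically built from two shifts and an OR. The sketch below shows that three-operation form in Python and uses it in SHA-256's "big sigma 0" function, which rotates by 2, 13 and 22 bits. The helper names are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit value right by n bits using two shifts and an OR."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def big_sigma0(x):
    """SHA-256 big sigma 0: XOR of three right-rotations of the same word."""
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)

print(hex(rotr32(0x12345678, 8)))   # 0x78123456
print(hex(big_sigma0(1)))           # 0x40080400
```

Hashing does this kind of rotation over and over, so hardware that does it in a single instruction saves work on every round.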

    35. Re:Learn OpenCL by burisch_research · · Score: 1

      While you are completely correct in what you say, you are not answering the question. The question is, how do I get into massively parallel GPU processing easily -- not, 'is my application suited to this?'. It's assumed that the OP might be an idiot, however he is asking a completely valid question, and you should be responding accordingly.

      Actually, this post has prompted me to re-evaluate my methodologies. I have an enormous image-processing project to complete in short order. I was previously intending to use CPU only, but this post made me think, and go and research OpenCL. What I've found is that not only is it cross-platform, very fast (in many cases), but is also very easy to use (if you already know C very well, as I do).

      The answer is, 'High level languages are not appropriate for this problem set'.

      So, while I must agree that the original question of how to easily use GPGPU techniques from a purely managed environment such as C# is actually pretty inane, the result has been fruitful in that it highlights that if you really want GPGPU performance then you really do need to invest the effort required into actually learning the specific languages necessary. That's a relevant answer, in my book.

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    36. Re: Learn OpenCL by jkflying · · Score: 1

      The GP was claiming that AMD stuff sucks in every situation. I was just pointing out that they were wrong.

      And hashing is really important, particularly for 'new-age' languages like Python and JavaScript.

      --
      Help I am stuck in a signature factory!
    37. Re: Learn OpenCL by Anonymous Coward · · Score: 0

      OpenCL is very doable with Python. You avoid a lot of crap and repetition by using a "shiny layer". You just write your kernels inline in the Python code and upload to GPU. Handle the supporting stuff with Python.

      Give it a shot, it might make you more productive.

    38. Re:Learn OpenCL by SuperTechnoNerd · · Score: 1

      It's also the kind of work. Some things, like fluid dynamics and discrete element analysis, lend themselves to very parallel computation.
      Playing solitaire not so much..
      Use the right tool for the right job.

    39. Re:Learn OpenCL by gregor-e · · Score: 1

      If you're learning this for your job, maybe you can persuade your boss to pay for an OpenCL course like this one.

    40. Re: Learn OpenCL by guruevi · · Score: 1

      Experience. I was talking about serious computations, not something you use gaming GPUs for and not a single optimized task either. You could probably take any random task and optimize the shit out of it on either platform; that doesn't mean said platform is good for a "general purpose" (the GP in GPGPU).

      Also, even for Bitcoin mining, a lot of rigs require Windows, not Linux, a lot of cards in the Bitcoin guides have disclaimers such as "don't use x with this AMD card" where x is some type of OpenCL instruction or computation type.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    41. Re: Learn OpenCL by Anonymous Coward · · Score: 0

      The GP was not saying that. They were saying that AMD is more difficult to start with, and that their support is not as universal as it seems.

      Further, I know many people that do GPU computing for scientific research where they would gladly optimize the hell out of anything, and I haven't met any that use OpenCL.

    42. Re: Learn OpenCL by slashdot_commentator · · Score: 1

      No, AMD hardware outperforms Nvidia hardware, computational units/$1, for bitcoin mining. That's why bitcoin miners use OpenCL; it's about the card. (When one cares about protein folding, programmers who need speed are going to CUDA and Nvidia cards.) Pot, meet Kettle.

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
    43. Re:Learn OpenCL by mcmonkey · · Score: 1

      Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of a shiny crap might seem superficially appealing to you.

      Learn OpenCL and do the job properly.

      "VBing?" "Pythoning?"

      Learn English and answer the question properly.

  4. CUDA by Anonymous Coward · · Score: 1

    CUDA is extremely easy to learn and use (if you know C and of course have an NVidia card) and is well worth the effort for some projects. Alternatively you could try skipping GPU programming and just using OpenMP, which would still greatly improve performance if you're not already multithreading.

    1. Re:CUDA by Anonymous Coward · · Score: 2, Insightful

      Never under any circumstances use cuda. We don't need anymore proprietary garbage floating around. Use opencl only.

    2. Re:CUDA by Anonymous Coward · · Score: 0

      Never under any circumstances use cuda. We don't need anymore proprietary garbage floating around. Use opencl only.

      Maybe when OpenCL is as easy and quick to write as CUDA I'll do that, until then no thanks.

    3. Re:CUDA by UnknownSoldier · · Score: 4, Informative

      Agreed 100% about CUDA and OpenMP! Already invented a new multi-core string searching algorithm and having a load of fun playing around with my GTX Titan combining CUDA + OpenMP. You can even do printf() from the GPU. :-)

      The most _painless_ way to learn CUDA is to install CUDA on a Linux (Ubuntu) box or Windows box.
      https://developer.nvidia.com/cuda-downloads

      On Linux, fire up 'nsight' at the command line, open the CUDA SDK samples and start exploring! And by exploring I mean single-stepping through the code. The NSight IDE is pretty darn good considering it is free.

      Another really good doc is the CUDA C Programming Guide.
      http://docs.nvidia.com/cuda/cuda-c-programming-guide/

      Oh and don't pay attention to the Intel Propaganda - there are numerous inaccuracies:
      Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU
      http://pcl.intel-research.net/publications/isca319-lee.pdf

    4. Re:CUDA by Anonymous Coward · · Score: 0

      Combine CLU (https://github.com/Computing-Language-Utility/CLU) with the C++ bindings and you get something pretty simple if you're willing to drop the single source feature of CUDA (and really single source is only a driver for template engines, beyond that programmers use multiple compilation units all over the place anyway). Basic CLU vector add sample is 20 lines of host code with simple tools that pull in the device code.

      No loss of ease of use, nice gain in portability.

    5. Re:CUDA by Anonymous Coward · · Score: 0

      I also recommend Udacity parallel programming class.
      https://www.udacity.com/course/cs344

      It really opened my eyes to parallel programming. You will want to learn some basic CUDA syntax before diving in. But you can complete the course without having your own development environment and do all the programming exercises directly from their web interface.

    6. Re:CUDA by Anonymous Coward · · Score: 0

      It's a lot easier to write an OpenCL program for an AMD or Intel GPU than it is to create a CUDA program for them.

    7. Re:CUDA by Anonymous Coward · · Score: 0

      CUDA is proprietary, but is so much further ahead of everybody else. Sometimes you just gotta get the job done.

      Also, skip OpenMP, go to MPI... can be used in distributed memory systems, as efficient these days on shared memory systems, and not really all that much more difficult to use.

    8. Re:CUDA by slashdot_commentator · · Score: 1

      If you don't care how long your programs take to solve a problem, avoid coding in cuda. If you want to keep your job, and your employer needs to run the app on nvidia cards as fast as possible, you're writing it in cuda.

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
  5. Re:XNA or Unity by Tr3vin · · Score: 2

    Those are game engines. They will do nothing to help him use the GPGPU capabilities of his graphics card.

  6. OpenACC by Anonymous Coward · · Score: 1

    Don't know what the status is on Windows, but for high-performance computing, OpenACC is an emerging standard, with support from the Cray and PGI compilers.

    1. Re:OpenACC by 140Mandak262Jamuna · · Score: 2
      It works in theory. In practice, unless you understand your code well, and the way the compiler built the instructions well, and understand what these directives do very well, you won't get any speed improvements. There are times when the overheads slow down the code and the simple-minded implementation has brain-dead locks, and you end up with slower code.

      We have come a long way since the days of assembly and assembly by another name, Fortran. But the overheads of the higher level languages have been masked a lot by the ever increasing speed and memory availability. Whole generations of programmers have come up using higher level languages with IDE and CASE tools from day one, and they fundamentally don't understand how the code actually works. They are continually stumped by the fact the code does what they tell it to do, not what they meant it to do.

      --
      sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    2. Re:OpenACC by SoftwareArtist · · Score: 2

      True, and this is even more true on GPUs than CPUs. They do a lot less to shield you from the low level details of how your code gets executed, so those details end up having a bigger impact on your performance. And to make it worse, those details change with every new hardware generation!

      But for a new user just getting into GPU programming, it's easier to learn those things in the context of a simple programming model like OpenACC than a complicated one like CUDA or OpenCL. That just forces them to deal with even more complexity and hardware details right from the very start. OpenACC can produce good results if used well. And once you've learned to do that, you're in a better position to tackle the harder technologies.

      --
      "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
    3. Re:OpenACC by EyeSavant · · Score: 1

      Yeah was wondering when someone would mention OpenACC, for sure it is the most painless way to start programming on GPUs, as it is compiler directives which means that the compiler deals with most of the heavy lifting for you. For general purpose machines you need to use the PGI or CAPS compilers, which are not free.

      Cuda is not too bad either as it assumes a GPGPU, and is a relatively straightforward extension of C. OpenCL is a mess in my opinion as it does not assume anything so you have to spend 10 complicated function calls explaining you have a GPU before you can actually do anything.

      OpenACC also has the advantage that it is compatible with CUDA so you can write important functions in CUDA for improved performance, and leave the less performance critical parts in OpenACC.

      The first step though really is to understand the hardware. Writing GPU code is very different from writing normal code if you want to get good performance, although assuming it is vector code with a SIMD size/vector length of 32 will get you a long way as well.

  7. Re:XNA or Unity by stewsters · · Score: 4, Informative

    I don't think he is looking at making a game, I think he is looking for some cheap parallel processing. I have done some cuda, it was a pain to set up a few years back. There probably are better tutorials now.

  8. It's easier than it sounds by Anonymous Coward · · Score: 1

    The heavy lifting has mostly already been done for you. There are CUDA wrappers out there that, with a few changes to your code, run it as close to optimally as possible using the card's cores. We had a Nvidia guy come by and give a talk just to show off how relatively painless it is (similar to OpenMPI, in my opinion). If you've got a couple extra people around consider reaching out to Nvidia to have someone show everyone a few of the options.

  9. Obsidian by jbolden · · Score: 4, Informative

    I get the impression that CUDA/OpenCL is still the best option. This thesis on Obsidian presents a set of Haskell bindings which might be easier, and it also covers the basics quite well. Haskell lends itself really well to this because the language is inherently designed for parallelism, thanks to purity and out-of-order computation. That being said, I think Obsidian is a bit rough around the edges, but if you are looking for a real alternative, this is one.

    1. Re:Obsidian by CoderBob · · Score: 1

      I've seen a few people mention Haskell, but no love for Erlang in here. Any particular reason?

    2. Re:Obsidian by jbolden · · Score: 4, Informative

      The big issue is that Haskell is lazy. Which means in particular the programmer by default doesn't determine order of execution. This makes Haskell a better counter example since order of execution is so key to so many languages.

      Erlang's type system is rather typical dynamic while Haskell has a Hindley–Milner type system which again shows off the plusses of functional better.

      Haskell has more of the most sophisticated ideas in computer science than any other language. It has become the standard for computer science, in particular language and compiler research. So when an idea is "news" there is very likely an implementation of that idea in Haskell. Erlang's community is more practical and less cutting edge.

      Haskell is easier to program in.

  10. http://hsafoundation.com/ by Anonymous Coward · · Score: 0

    It's new and might be a little rough around the edges, but everything else is hacks on top of OEM proprietary "solutions" on top of hardware hacks.

    1. Re:http://hsafoundation.com/ by Anonymous Coward · · Score: 0

      And it has no useful public software yet and only actually defines an intermediate language. What use would it be to direct someone who's asking for an easy route in straight to HSAIL?

  11. Re:XNA or Unity by i+kan+reed · · Score: 0

    XNA has easy, painless shader compilation. You can plug a C# image class into an XNA texture, pipe it through a vshs shader that you write by hand, and dump the output to a texture, back to an image. That process is highly interoperable with existing C# applications.

    But that ignores the fact that Microsoft abandoned XNA like an unwanted child.

  12. Jitter by handshake,+doctor · · Score: 1

    Check out Max/MSP/Jitter.

    As you describe, the interface is VPL - connecting boxes / nodes to access the GPU is one of the (many) things the program is capable of. Depending on what you're trying to do, you may also find Gen useful for generating GLSL shaders within the Max environment (although you can use other shaders as well).

    I'm currently neck-deep in a few Jitter projects using custom shaders, etc., and while it's great for rapid prototyping, getting good frame-rates and production stable code out is a whole black art unto itself. Fortunately, the support and forum community are very strong.

  13. GPU programming *is* pain, princess. by Chris+Mattern · · Score: 4, Informative

    Anyone who tells you differently is selling you something.

    1. Re:GPU programming *is* pain, princess. by Em+Adespoton · · Score: 1

      Anyone who tells you differently is selling you something.

      Works well for CUDA anyway....

  14. Udacity teaches CUDA by Arakageeta · · Score: 2

    Check out the Udacity class on parallel programming. It's mostly CUDA (I believe it's taught by NVIDIA engineers): https://www.udacity.com/course/cs344

    CUDA is generally easier to program than OpenCL. Of course, CUDA only runs on NVIDIA GPUs though.

    1. Re:Udacity teaches CUDA by Anonymous Coward · · Score: 0

      Agreed, I've been going through this course and I'm enjoying it.

  15. C++ AMP by Anonymous Coward · · Score: 0

    It is Microsoft, but have you looked at C++ AMP?

    http://en.wikipedia.org/wiki/C%2B%2B_AMP

  16. OpenACC by SoftwareArtist · · Score: 4, Interesting

    OpenACC is what you're looking for. It uses a directive based programming model similar to OpenMP, so you write ordinary looking code, then annotate it in ways that tell the compiler how to transform it into GPU code.

    You won't get as good performance as well written CUDA or OpenCL code, but it's much easier to learn. And once you get comfortable with it, you may find it easier to make the step from there into lower level programming.

    --
    "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
  17. Very Similar Story by Chaseshaw · · Score: 2

    VB.NET background. Wanted to get into GPGPU to accelerate some of my more complicated math calculations. Tried CLOO (open source .net GPU wrappers) and couldn't get it to work, tried AMD's OPENCL dev gui, couldn't get it to work. Eventually found the answer in python. GPGPU in pyopencl is well-documented thanks to the bitcoiners, and from .net you can either run the python in a shell, or write a little python kernel to listen for, and process commands. Only catch is the opencl abilities are limited, and you have to start dabbling in c++ to get it to do any real work (and even then it's a dumbed-down c++ and many existing extensions don't install or work quite right). All in all I found the entire thing very rewarding though. :) Best of luck.
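
    For the "little python kernel to listen for, and process commands" idea, something along these lines might do. This is only a sketch: the line-per-command JSON protocol, the "scale" operation and the function names are all invented here, and the list comprehension stands in for wherever the real pyopencl call would go. The .NET side would start the script as a child process and talk to it over redirected stdin/stdout.

```python
import json
import sys


def handle_command(line):
    """Parse one JSON command line and return a one-line JSON reply."""
    try:
        request = json.loads(line)
        if request["op"] == "scale":
            # Placeholder for the real GPU work (e.g. a pyopencl kernel launch).
            result = [x * request["factor"] for x in request["data"]]
            return json.dumps({"ok": True, "result": result})
        return json.dumps({"ok": False, "error": "unknown op"})
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or a wrongly shaped request.
        return json.dumps({"ok": False, "error": str(exc)})


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read commands line by line until 'quit' or end of input."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        if line == "quit":
            break
        stdout.write(handle_command(line) + "\n")
        stdout.flush()  # the parent process is waiting on this reply
```

    Calling serve() at the bottom of the script turns it into the listener; keeping it alive between commands avoids paying the Python and OpenCL startup cost on every call.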

  18. Learn OpenMP by Anonymous Coward · · Score: 0

    Learn about parallel programming with OpenMP, which you can run on your normal machine. If you take enough time to do that properly then the OpenMP standard will also support GPUs, and the move to such architectures will be easy.
     

  19. Coursera has a great course. by Anonymous Coward · · Score: 0

    Heterogeneous parallel programming. It cuts it. In a few lessons you will know where you are heading.

  20. Proper approach to GPU programming by godrik · · Score: 1, Insightful

    Like in all attempts at getting stuff faster, you should first wonder what kind of performance you are already getting out of your CPU implementation. Given that you seem to believe it is actually possible to get performance out of a VB-like language, I assume that your base implementation heavily sucks.

    The only goal of putting stuff on a GPU is to make things faster, but GPU code is mostly difficult to write and non-portable. Having a good CPU implementation might just be what you need. It also might be easier for you to write.

    If you really need a GPU, then you need to start learning how GPUs work, because a simple copy-paste is unlikely to give you any significant performance. A good start is at: https://developer.nvidia.com/cuda-education-training

    I never properly learned opencl, but it is essentially similar. Except you have access to less low level details on nvidia architecture. Of course, cuda is pretty much nvidia only.

    1. Re:Proper approach to GPU programming by godrik · · Score: 1

      Frankly, I don't know anything about C#. But I know quite a bit about High Performance Computing. What I can guarantee you is that I never saw a high performance routine written efficiently in anything other than C or C++. Sure, people claim to do great in Java. But that's only a claim; I have yet to see seriously complicated implementations in Java.

      Regarding C#, honestly, nobody writes C# for HPC in academia. I never met a single one. But I expect it to be similar. The main problem you get is that you are too far away from your architecture. Basically, if you cannot force memory placement (structure of arrays, array of structures), if you cannot choose between pointer and array indirection, if you cannot specify memory prefetching, or if you cannot force which vector instructions get emitted, then your code will be suboptimal. If you can specify all of that in C#, then it is a good candidate. In my experience, that's why people turn to C or C++, because the architecture of the processor is entirely exposed at that level.
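
      To make the "structure of array, array of structure" point concrete, here is a small sketch of the two layouts. It is written in Python with the standard array module purely to show the index patterns; the function names are made up, and Python itself will not show the performance difference that a C, C++ or GPU version would.

```python
from array import array


def make_aos(points):
    """Array of structures: x0, y0, z0, x1, y1, z1, ... in one flat buffer."""
    flat = array("f")
    for x, y, z in points:
        flat.extend((x, y, z))
    return flat


def make_soa(points):
    """Structure of arrays: all x values together, then all y, then all z."""
    xs = array("f", (p[0] for p in points))
    ys = array("f", (p[1] for p in points))
    zs = array("f", (p[2] for p in points))
    return xs, ys, zs


def sum_x_aos(flat):
    # Stride-3 walk: every x is separated by a y and a z it does not need.
    return sum(flat[0::3])


def sum_x_soa(soa):
    # Unit-stride walk: the x values sit next to each other in memory.
    xs, _ys, _zs = soa
    return sum(xs)
```

      Both functions return the same sum; what changes is the memory access pattern, which is exactly the kind of detail the comment says a language has to let you control.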

      Often programmers in other languages end up turning to the GPU for performance, where they could have gotten the same performance out of a CPU. But it is so much cooler to say that you use a GPU, instead of going low level on a CPU...

  21. C++ AMP by VertigoAce · · Score: 1

    Take a look at C++ AMP. It is a small language extension that lets you target the GPU using C++. The platform takes care of most of the mechanics of running code on the GPU. Also check out this blog post for links to tutorials and samples.

  22. it ain't by Anonymous Coward · · Score: 0

    barraCUDA because that'll eat your motherfucking ass alive man!

  23. Coursera by elashish14 · · Score: 2

    Coursera has some courses on GPU programming, like this one, and what's nice about them is that they're pretty slow, and I'm assuming that they explain things well. Other online courses probably offer the same, and I think the video lectures would be helpful in understanding the concepts.

    --
    I have left slashdot and am now on Soylent News. FUCK YOU DICE.
    1. Re:Coursera by Anonymous Coward · · Score: 0

      CUDA yuck. Last time I checked, it was lying about OpenCL

    2. Re:Coursera by jasax · · Score: 2

      I took that course: https://www.coursera.org/course/hetero

      I also took a course from Udacity: https://www.udacity.com/course/cs344 but this one I didn't finish, I've done perhaps 30% of it (I already had finished Coursera's). One of these days I'll go there to close matters :-)

      The courses in Udacity are "always online", so anyone can register anytime and finish the course at his/her own pace. Quizzes, exams and grading with certificate included have no fixed limits. On the other hand, the courses from Coursera have deadlines and run more or less in parallel with "snail" university schedules, with start and stop dates, with time limits in quizzes and exams, etc. (You can usually see videos and do quizzes anytime after they end, but no certificates and grading AFAIK).

      Both courses were good -- I recommend both, -- we did homeworks in Amazon's cloud transparently, and certainly both were "sponsored" by Nvidia, coz we learned only CUDA. (Perhaps there was a brief blah blah about competing alternatives.)

      But from what I've seen, if someone is afraid of CUDA, then it's better to run away very fast from the alternatives (OpenCL) :-)

    3. Re:Coursera by burisch_research · · Score: 1

      Having researched both, OpenCL is definitively better by far. Granted, CUDA has the native advantage, but that's not always going to be there, and I think most would agree that vendor tie-in is a Very Bad Thing (tm)

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    4. Re:Coursera by jasax · · Score: 1

      I didn't use OpenCL (alas, both courses are CUDA-directed), but compared some examples, written in CUDA and OpenCL, as given in http://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0123814723/ref=pd_sim_b_3 (written by one of the instructors of Coursera HPP). My conclusion was that CUDA is more "friendly" - but indeed it is Nvidia's proprietary technology.

      Since then I had no more time to explore OpenCL and other alternatives (some are proprietary and I don't intend to buy...). Eventually I would agree with you and would choose OpenCL if I was doing professional work, but that hasn't happened yet...

      Final word: CUDA is now in its 5.X incarnation, and AFAIK it kept improving its user-usability with the release of recent versions...

  24. Nitrous Oxide by Anonymous Coward · · Score: 0

    LOTS of it.

  25. OpenCV by SpinyNorman · · Score: 1

    Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.

    1. Re:OpenCV by Anonymous Coward · · Score: 0

      Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.

      Came here to say this. OpenCV has CUDA support (with the gpu:: module) and OpenCL support (with the ocl:: module). GPU support isn't complete (not everything is implemented) but it's actively being developed (it was only CUDA until recently).

      You might think that you are clever rolling your own image processing algorithms, but chances are, the OpenCV people have done a better job. OpenCL has bindings for Python, but I've only ever used it in C++ projects.

    2. Re:OpenCV by Anonymous Coward · · Score: 0

      "Quit your research"

      Yeah, real handy advice.

  26. Nothing easy but Udacity can help by Jthon · · Score: 5, Informative

    So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP or one of the other abstractions but you really need to understand how these massively parallel machines work. It's possible to write some perfectly valid code in any of these environments which will run SLOWER than on the CPU because you didn't understand fundamentally how GPUs excel at processing.

    Udacity currently has a fairly decent intro course on GPU programming at: https://www.udacity.com/course/cs344

    It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.

    I'd suggest running through that and then deciding on what API you want to end up using.

  27. Other option by Anonymous Coward · · Score: 0

    Consider the Intel image processing libraries. They have a broad range of routines that are highly optimized for their processors.

  28. not TOO hard by Anonymous Coward · · Score: 0

    If you know multithreading concepts, OpenCL isn't too hard to get into.
    Of course, start small, do tutorials, and do it right.

    Much much much easier than trying to do stuff in pixel shaders or, even worse, the assembly-like shading language that came before GLSL.

  29. DirectCompute intro by Anonymous Coward · · Score: 0
  30. Understand The Hardware by Anonymous Coward · · Score: 3, Informative

    If you are going to program a GPU, and you are looking for performance gains, you MUST understand the hardware. In particular, you must understand the complicated memory architecture, you must understand the mechanisms for moving data from one memory system to another, and you must understand how your application and algorithm can be transformed into that model.

    There is no shortcut. There is no magic. There is only hardware.

    If you do not believe me, you can hunt up the various Nvidia papers walking you through (in painful detail-- link below) the process of writing a simple matrix transpose operation for CUDA. The difference between a naive and a good implementation, as shown in that paper, is huge.

    That said, once you understand the principles, CUDA is relatively easy to learn as an extension of C, and the Nvidia profiler, NVVP, is good at identifying some of the pitfalls for you so that you can fix them.

    http://www.cs.colostate.edu/~cs675/MatrixTranspose.pdf
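
    The naive-versus-good transpose difference is mostly about access patterns, which the sketch below tries to show. It is a CPU-only emulation in plain Python, not the paper's code: the staged tile stands in for what a CUDA kernel would keep in on-chip shared memory, and the function names and default tile size are invented for illustration. Python will not reproduce the speedup; only the index mapping carries over.

```python
def transpose_naive(matrix):
    """Read along rows, write down columns: the strided writes are the problem."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            out[c][r] = matrix[r][c]
    return out


def transpose_tiled(matrix, tile=4):
    """Move one small tile at a time so both reads and writes are contiguous."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1 = min(r0 + tile, rows)   # clamp tiles at the matrix edge
            c1 = min(c0 + tile, cols)
            # Stage the tile: contiguous reads along input rows.
            staged = [matrix[r][c0:c1] for r in range(r0, r1)]
            # Drain the tile: contiguous writes along output rows.
            for c in range(c0, c1):
                out[c][r0:r1] = [staged[r - r0][c - c0] for r in range(r0, r1)]
    return out
```

    On a GPU the staging step is where the memory architecture knowledge comes in, since the tile has to fit the fast on-chip memory and be laid out to suit it.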

  31. OpenACC or OpenMP 4.0 are exactly what you want by John_The_Geek · · Score: 5, Informative

    I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.

    There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.

    I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.

    BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.

    1. Re:OpenACC or OpenMP 4.0 are exactly what you want by Anonymous Coward · · Score: 0

      >BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone.

      Can you elaborate on this? My understanding was that Fusion was just a marketing term to describe AMD's move to "APUs".

    2. Re:OpenACC or OpenMP 4.0 are exactly what you want by John_The_Geek · · Score: 1

      APUs (and Fusion is indeed AMD's marketing term for them), with their CPU/GPU shared memory, don't require the same programming model as normal GPUs. The data movement issue goes from being foremost to being non-existent. Hence OpenCL becomes moot. And AMD was really the last serious supporter of OpenCL.

    3. Re:OpenACC or OpenMP 4.0 are exactly what you want by Anonymous Coward · · Score: 0

      The purpose of OpenCL is to provide a standard for heterogeneous computation. I don't see how AMD's position is doing anything to diverge from OpenCL's applicability.

    4. Re:OpenACC or OpenMP 4.0 are exactly what you want by burisch_research · · Score: 1

      John,

      While I have to defer to your position as being a teacher of these things, I have to question what you say.

      a) OpenCL was intended as an open access API to GPGPU techniques. Has something changed to channel people into vendor-specific approaches?

      b) What advantages do OpenACC and OpenMP 4 offer over previous techniques? Are these standards-based?

      c) Which GPGPU language (if any) can one target in the sure knowledge that it is future-proof? In which ways is this superior to OpenCL?

      These are genuine questions that I really want answers to.

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    5. Re:OpenACC or OpenMP 4.0 are exactly what you want by John_The_Geek · · Score: 1

      Good questions, BR:

      a) As the three major accelerator vendors (NVIDIA, Intel and AMD) have diverged on their fundamental hardware approach (GPU, MIC and APU) the older, generic approach of OpenCL has devolved in relevance to only GPUs. And since even NVIDIA would prefer that you use CUDA at this level for GPUs, there are no commercial resources going into supporting it.

      b) OpenACC and OpenMP are both directive based. This means they do not disrupt your current code base and they are easy to try in an incremental (one loop at a time) way. They are both related, originating from the same OpenMP standards people, and are both open standards (openmp.org and openacc-standard.org). Almost all compilers support OpenMP 3, and will eventually support OpenMP 4, which is the relevant version for accelerators.

      c) OpenMP has been around since the 90s, and is well used. I wouldn't hesitate to suggest a project invest in it.

      As regards the original question, OpenACC/OpenMP are much, much easier to use than OpenCL. Also, there are many more successful use cases in the HPC world to point to and learn from.

      John

    6. Re:OpenACC or OpenMP 4.0 are exactly what you want by burisch_research · · Score: 1

      Thanks for your response GTG, I appreciate your answers.

      Let me give you my perspective: I am a very experienced dev, quite happy in ASM, C, C++, and C#, and other languages (but I hate Java with a passion!)

      I currently have a problem set which is extremely amenable to being solved with GPU development. I have had a look this evening at OpenCL, and I must say I was very impressed with how simple it appeared to me, coming from a C background. I was just about to plunge into the dev on OpenCL, when you [very inconsiderately!] disrupted my plans by suggesting that OpenMP might be a better plan.

      Since you were so rude as to disrupt my plans (!), may I please ask for your suggestions for toolchain + ide for OpenMP dev? (I'm on Windows generally, but happy with Linux also ...)

      All the best :)
      c

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    7. Re:OpenACC or OpenMP 4.0 are exactly what you want by burisch_research · · Score: 1

      JTG* sigh ... I'm so inattentive ...

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    8. Re:OpenACC or OpenMP 4.0 are exactly what you want by John_The_Geek · · Score: 1

      My focus, both as a developer and instructor, is HPC, so I have generally been using PGI compilers on large Linux machines. That is also the teaching environment that we provide for students. If you do not have a PGI, Cray or CAPS license available, then OpenACC is not yet implemented anywhere else and you are out of luck at the moment. PGI does have a very nice trial license program, and I'd encourage you to give it a try if it seems at all viable, but that will not answer a longer term access issue. BTW, the PGI compiler does integrate with both Visual Studio and Eclipse, although I personally have very little OpenACC dev hours on either of those.

      OpenMP 4.0 (and that is what you need) is currently only implemented in the Intel compiler. 4.0 was only finally approved this past month by the standards committee, so it may take a while for all the compilers to catch up. The big issue here is that Intel is currently only supporting the MIC/Phi architecture for compiler output at the moment. And one can imagine that they are in no rush to change that. OpenMP 3.x is supported very well by MS, Gnu, PGI, Intel and by all of the associated IDEs, so eventually this will all sort out like one would hope.

      So, if you want to start developing right now, you will either have to shell out for a compiler and/or limit yourself to certain architectures. Those may be complete non-issues for you. If they are not, you will have to either wait for the situation to improve, or drop back to OpenCL. If so, at least be aware that it has been "deprecated" in a practical sense by the only ones that matter (Intel, NVIDIA and AMD).

  32. CUDAfy.NET by Anonymous Coward · · Score: 0

    I've heard decent things about CUDAfy.NET.

  33. Learn to Program an Intel Phi instead by quarkie68 · · Score: 1

    The only painful thing you have to do is to decide how to increase threading in your code.

    1. Re:Learn to Program an Intel Phi instead by TechyImmigrant · · Score: 1

      Yes. This.

      60 independent cores with general purpose instruction set on the same die with fast interconnect. If you need to pack some parallel speed on and do real work, using a GPU is pissing in the wind. An Intel Phi lets you get the job done.

      GPUs do certain things very well, but the odds of your problem mapping well to GPUs are slight.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    2. Re:Learn to Program an Intel Phi instead by EyeSavant · · Score: 1

      Huh?

      Are you seriously suggesting that some stuff will work well on Xeon Phi but badly on a Kepler? To get good performance out of a GPU you need thousands of threads. To get good performance out of a Xeon Phi you need thousands of SIMD instructions. A Phi has 60 cores, to get maximum performance you need 120 threads minimum because it needs to alternate between at least two threads to get peak performance, IIRC it can't schedule the same thread consecutively, so if you don't have at least 120 threads it is not even possible to fill the machine theoretically. It is recommended to have at least 240 threads to hide the memory latency (like a GPU it has very low overhead thread swapping so you can just move to another thread while you are waiting for your data from memory). Then it has 512 bit vector instructions (which is enough for 16 floats or 8 double precision numbers), so you are looking at around 4 thousand SIMD floats at LEAST in flight just to even get close to filling the machine.
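
      The arithmetic above can be checked with a few lines of Python. This is only a sketch of the reasoning, using the figures quoted in this comment (60 cores, 240 threads, 512-bit vectors); the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the comment above.
CORES = 60                # Xeon Phi cores, as stated in the comment
THREADS_PER_CORE = 4      # 240 recommended threads / 60 cores
VECTOR_BITS = 512         # width of one vector register


def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def elements_in_flight(cores, threads_per_core, vector_bits, element_bits):
    """Elements touched when every thread issues one full-width vector op."""
    return cores * threads_per_core * lanes(vector_bits, element_bits)


if __name__ == "__main__":
    print(lanes(VECTOR_BITS, 32))   # single-precision lanes per register
    print(lanes(VECTOR_BITS, 64))   # double-precision lanes per register
    print(elements_in_flight(CORES, THREADS_PER_CORE, VECTOR_BITS, 32))
```

      With 4 threads per core and 16 float lanes this gives 3840 elements, which is the "around 4 thousand" figure.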

      Getting stuff to work on a Phi is easier than getting it to work on a GPU, but anything that works close to peak on a Phi will also work well on a GPU.

    3. Re:Learn to Program an Intel Phi instead by TechyImmigrant · · Score: 1

      Getting stuff to work is one rather important aspect of getting stuff done. Also those Phi threads are big honking general purpose threads with lots of cache and ALU resources, not a highly strung state machine hanging off a matrix multiplier.

      There is a small subset of problems that map to parallel threads of SIMD operations. Try optimizing IC layout on a GPU, or evaluating the biases in a crypto function by running the probabilities backwards through the gates. Those are not problems for GPUs, but they are real world problems that need addressing and take a bucket load of CPU.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    4. Re:Learn to Program an Intel Phi instead by EyeSavant · · Score: 1

      Getting stuff to work is one rather important aspect of getting stuff done.

      True, but "porting" something to the Phi so that it runs at roughly the same speed as the host CPU is hardly progress. The Phis have a lower clock speed than the host CPUs and in-order execution.

      Also those Phi threads are big honking general purpose threads with lots of cache and ALU resources, not a highly strung state machine hanging off a matrix multiplier.

      There is a small subset of problems that map to parallel threads of SIMD operations. Try optimizing IC layout on a GPU, or evaluating the biases in a crypto function by running the probabilities backwards through the gates. Those are not problems for GPUs, but they are real world problems that need addressing and take a bucket load of CPU.

      I fully accept that there is a lot of stuff that needs a lot of CPU that will run badly on a GPU. What I am not convinced about is that there is much that does not map to a lot of SIMD threads that WILL run well on a Phi. You need 240 threads, you need to fill the 512 bit vector registers with SIMD operations, to get peak performance you need huge amounts of parallelism. The same as you need on a GPU.
      You have the same problem with the PCIe bus being horrendously slow as well. You can offload the whole thing (which is coming to GPUs as well, with ARM processors for the more general purpose stuff), but you are still limited to one card, and getting from one card to another is a PITA, particularly if they are on different nodes.
      For general purpose stuff needing bucket loads of CPU you use normal CPUs with over twice the clock speed, out of order execution, a shorter vector length, and much more memory per core. Potentially together with MPI and fast inter-node communication (e.g. InfiniBand, Cray Aries etc).
      For some things GPUs are great, for others they are horrible. My gut feeling is pretty much the same things will be great or horrible for Phis as well. I am not really familiar with the algorithms for the stuff you are talking about, but sure, if they do not come back to a lot of SIMD threads they will work badly on a GPU. What remains to be shown is if they can work well on a Phi.

    5. Re:Learn to Program an Intel Phi instead by godrik · · Score: 1

      I was one of the first Phi users outside of Intel (not in the first batch, but in the second one). And programming Phi can be quite painful as well. People always try to make you believe that performance is easy. But frankly, it is not. You need to understand how the architecture works and many people are not trained like that nowadays. Throwing a GPU or a Phi at it will only bring more problems.

      From what the OP says, it is not even clear he used all the processing power available on his CPU. And since he tries to get performance out of a "visual" language, I assume he is far from what is possible.

  34. Do you need the GPU? by jones_supa · · Score: 2

    You would probably see a multi-fold increase in performance by simply converting your project from C# to C++.

    1. Re:Do you need the GPU? by greg1104 · · Score: 1

      Possibly, but there are a lot of tasks that only see about a doubling of speed. A C++ port is only likely to speed things up, while a GPU one is certain to. (Presuming the assumption about parallel execution is correct)

    2. Re:Do you need the GPU? by godrik · · Score: 1

      That's a buggy claim. There is nothing in GPUs that ensures you will get performance. Many algorithms are very difficult to write for GPUs. You have (essentially) no cache, which makes non-trivial memory access slow. You have thread divergence issues which can kill your performance even if the code contains significant parallelism. There is no inter-warp synchronisation, which is quite painful for fine synchronisation.

      Clearly the picture is more complicated than "parallel execution" => performance on GPU. If you have lots of tasks to do but they all have different code, then a GPU is useless despite massive parallelism.
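
      The thread divergence point can be sketched with a toy cost model. This is plain Python, not GPU code, and it assumes a deliberately simplified rule: a warp of 32 threads runs in lockstep, so every distinct branch taken inside a warp is executed one after the other. The branch names and cycle counts are hypothetical.

```python
WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA hardware


def warp_cycles(branch_ids, branch_cost):
    """Cycles for one warp: each distinct branch taken is executed in turn."""
    return sum(branch_cost[b] for b in set(branch_ids))


def total_cycles(thread_branches, branch_cost, warp_size=WARP_SIZE):
    """Add up the cost of consecutive warps of `warp_size` threads."""
    total = 0
    for start in range(0, len(thread_branches), warp_size):
        warp = thread_branches[start:start + warp_size]
        total += warp_cycles(warp, branch_cost)
    return total


# Hypothetical costs: both branches take 100 cycles.
cost = {"a": 100, "b": 100}
uniform = ["a"] * 32 + ["b"] * 32   # each warp takes a single branch
interleaved = ["a", "b"] * 32       # every warp has to run both branches

print(total_cycles(uniform, cost))      # 200
print(total_cycles(interleaved, cost))  # 400
```

      Same amount of work, same number of threads, but the interleaved layout costs twice as much in this model. With many different code paths per warp the penalty grows accordingly.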

  35. GPU Maven Plugin by Anonymous Coward · · Score: 1

    Closest to painless I know of is https://bitbucket.org/bradjcox/gpu-maven-plugin

    The GPU Maven Plugin compiles Java code with hand-selected Java kernels to CUDA that can run on NVIDIA GPUs of compatibility level 2.0 or higher. It encapsulates the build process so that GPU code is as easy to build with maven as ordinary Java code. The plugin relies on the NVidia CUDA SDK being installed which must be done separately.

  36. Sorry but people here are full of crap by Anonymous Coward · · Score: 0

    Use c# and Microsoft Accelerator.

    It's very easy to use, and since the VAST majority of your processing is going to occur on the GPU, the language you use is mostly irrelevant.

    The main thing you need to be aware of is that the bus to the video card is very, very, very slow. So in order to get any speedup from the GPU, you'll need to send as much stuff to be processed to the video card as you can. Round-trips hurt you a lot, so minimize them any way you can get away with doing so.
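
    A rough way to see why round-trips hurt is a simple cost model: every transfer pays a fixed latency plus the time to move the bytes. The sketch below is plain Python with made-up numbers (the latency and bandwidth values are assumptions for illustration, not measurements of any real card or of Accelerator).

```python
def transfer_seconds(total_bytes, num_transfers, latency_s, bandwidth_bytes_per_s):
    """Fixed per-transfer latency plus the time to move the payload."""
    return num_transfers * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical figures, chosen only to illustrate the shape of the cost.
LATENCY_S = 10e-6            # assumed overhead per transfer
BANDWIDTH = 6e9              # assumed bytes per second over the bus
FRAME_BYTES = 1920 * 1080 * 4  # one 1080p RGBA frame

one_copy = transfer_seconds(FRAME_BYTES, 1, LATENCY_S, BANDWIDTH)
per_row = transfer_seconds(FRAME_BYTES, 1080, LATENCY_S, BANDWIDTH)

print(f"whole frame in one transfer: {one_copy * 1e3:.3f} ms")
print(f"one transfer per row:        {per_row * 1e3:.3f} ms")
```

    The payload is identical in both cases; only the number of trips changes. Under these assumptions the per-row version spends most of its time on latency rather than on moving data.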

    1. Re:Sorry but people here are full of crap by burisch_research · · Score: 1

      Deprecated, and quite a while back at that. Do not use.

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  37. OpenSceneGraph or OGRE by bzipitidoo · · Score: 1, Interesting

    I went with OpenSceneGraph.

    Long ago, I tried xlib only, because at that time Motif was the only higher layer available, and it was proprietary. It was horrible. xlib has been superseded by XCB, but I wouldn't use that, not with all the other options out there today. XCB is a very low level graphics library, for drawing lines and letters in 2D. 3D graphics can be done with that, but your code would have to have all the math to transform 3D representations in your data into 2D window coordinates for XCB. LessTif is a free replacement for Motif, but by the time it was complete enough to be usable, the world was already moving on. With Wayland likely pushing X aside in the near future, XCB and xlib may not perform so well. They will continue to be supported for a while through a compatibility layer, but I think they're on the way out. Motif is not much good these days either. For one, Motif rests on top of xlib, and if xlib goes, so does Motif. Today, we have many better libraries for interfacing with GUIs.

    When OpenGL became available, I tried it. OpenGL is great for drawing simple 3D graphics, but it lacks intelligence. The easy part is that you just pass x,y,z coordinates to the library routines, and OpenGL does the rest. The bad part is that if you want to draw a fairly complicated scene, containing many objects that may be partly or completely hidden behind other objects, OpenGL has no intelligence to deal with that. It just dumbly draws everything your code tells it to draw. To speed that up, your code has to have the smarts to figure out what not to draw, so it can skip calling on OpenGL for invisible objects.

    That's where a library like OpenSceneGraph comes in. Your code feeds all the info to OSG. OSG figures out visibility, then calls OpenGL accordingly.

    You may need still other libraries for window management, something like FLTK. Yes, FLTK and OSG can work together.

    You will also most likely be working in C/C++. OpenGL has many language bindings. But OSG is C++ and doesn't have so many. FLTK is also C++, and has even fewer bindings. Trouble with picking a language like Python for this work is that it can be difficult to find bindings for all the libraries. Even when bindings to a particular language exist, they tend to be incomplete, and don't always perfectly work around differences in data representation. Pick libraries first, then see what language bindings they all have in common, then code in one of those common languages. It's possible C/C++ will turn out to be the only language common to all the libraries.

    --
    Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    1. Re:OpenSceneGraph or OGRE by bzipitidoo · · Score: 1

      Gah, should have read the summary more carefully. I was talking about 3D graphics, not general programming on the GPU.

      --
      Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    2. Re:OpenSceneGraph or OGRE by Anonymous Coward · · Score: 0

      Huh?

    3. Re:OpenSceneGraph or OGRE by Anonymous Coward · · Score: 0

      FLTK is great, unless:

      - You want an architecture designed after 1990

      - You want to do anything remotely complicated. (Auto-hide menu for instance)

    4. Re:OpenSceneGraph or OGRE by burisch_research · · Score: 1

      Whew, I was just about to launch into a tirade on how wrong you were! As it stands, I'm going to be a lot less tired than I'd thought I'd be!

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
  38. try Theano by Anonymous Coward · · Score: 1

    You could give Theano a try. It's a Python-based symbolic expression compiler whose interface is very much like numpy. I use it on Linux but I've heard mention of support for Windows.

    http://deeplearning.net/software/theano/

  39. Re:XNA or Unity by Anonymous Coward · · Score: 2, Informative

    Incorrect. That is certainly a valid approach and the GP should be modded up.

    Using textures and shaders you can very easily do massively parallel floating point operations in XNA on the GPU, and it's a language the asker is familiar with.

    Think outside the box a little bit.

  40. Rootbeer by Anonymous Coward · · Score: 0

    I admit I don't know much about GPU programming.
    But if I were you, I'd take a good look at the rootbeer compiler, which translates Java code into CUDA or OpenCL

    http://rbcompiler.com/
    https://github.com/pcpratts/rootbeer1

    It sure looks simple and Java is just a small step from C#.

  41. Image Processing DSL by Anonymous Coward · · Score: 1

    Look at MIT's Halide; it's a domain-specific language for image processing. http://halide-lang.org/

    The alternative is OpenCL/CUDA, which require in-depth knowledge of the H/W to get the best from the GPU. It doesn't matter whether you use Python or whatever bindings you choose for a GPU native language. The hardest part is mapping the algorithm to the H/W model of a GPU. PyCUDA does NOT solve that issue.

    You can get plenty of help from Stackoverflow.

  42. Mary Hall at The University of Utah by TwineLogic · · Score: 1

    I wouldn't call her advanced coursework easy, but a resource that belongs on this thread: http://www.cs.utah.edu/~mhall/cs6963s09/

    Mary Hall is a professor of Computer Science. Her recent work is related to compilers and parallel programming on GPUs. Her professional web page is something like an on-line open course, or the framework of one.

  43. Take it from someone who's done a lot of CUDA by mathimus1863 · · Score: 1

    There isn't really a painless way. Like a lot of skills in life, the only way to learn is through pain, suffering and frustration. But it makes the prize all the more enjoyable. You need to be experienced at regular, serial programming in C/C++, then mangle all of it to figure out how to program in parallel. I literally read the CUDA Programming Guide 5 times. And I felt like I gained as much on the fifth time as I did the first time. And don't expect your debugger to save you -- if it's like it was a year ago, you're going to struggle a bit with that.

    Luckily, once you do get it, it all seems to make sense in hindsight. And when you do achieve that 10x-300x speedup, you'll feel like a superhero. You just have to be patient and expect some frustration. It's not like learning a new programming language. It's like a whole new programming paradigm.

  44. Re:XNA or Unity by gl4ss · · Score: 1

    ms has a habit of abandoning one product and then other guys in the same fucking company forcing you to use xna.* libs on their brand spanking new hardware.

    but actually that sounds like a possible solution for the guy, the pain being writing the shader.

    silverlight abandoned? what the fuck are you doing shipping sdk with silverlight libs on almost the same fucking day?! I see though where elop learnt his trade.

    --
    world was created 5 seconds before this post as it is.
  45. Re:XNA or Unity by vlueboy · · Score: 1

    Yeah, I know the feeling.
    It would be one more tool under my belt. For instance, most non-financial people hear of unemployment numbers and a few know where to view the official data. For some bizarre reason the government offers no graphs at dol.gov alongside their statistics, even though they let you download years worth of raw data. Enter us geeks, who easily put together a spreadsheet to make sense of official unemployment trends and zoom into the data all we want and run our own analysis.

    One day knowing OpenCL might let me do similar processing that would otherwise be out of my reach. The potential alone has merit. Executing basic parallel programming without fear will yield a better accomplishment than the last multi-day experiment I ran on my GPU: mining up to one bit cent.

  46. Ask a neck-beard... by OhSoLaMeow · · Score: 1

    ... to code it in COBOL for you.

    --
    They can take my LifeAlert pendant when they pry it from my cold dead fingers.
  47. I recommend CUDA if... by Anonymous Coward · · Score: 1

    I recommend CUDA if you can deploy requiring NVIDIA hardware. CUDA allows for pre-compiled kernels, CUDA has a debugger for your kernels, CUDA has a tool chain. CUDA has far richer options. Indeed, NVIDIA uses LLVM for its CUDA compilers so in theory different programming languages can be used to write CUDA kernels. Take a gander at: https://developer.nvidia.com/cuda-llvm-compiler

    In contrast, OpenCL is somewhat barbaric. It is an API and there are very few tools for it. Worse, OpenCL implementation can be all over the map.

    You do NOT need to know, or for that matter use, OpenGL to use CUDA or OpenCL. The interop APIs between OpenGL and OpenCL or CUDA are there to make buffer transfers efficient between the two (so that one can compute something with CUDA or OpenCL and have it drawn with OpenGL).

    1. Re:I recommend CUDA if... by Anonymous Coward · · Score: 0

      Open CL allows for pre compiled kernels too. Nvidia uses llvm for opencl too. Never ever use cuda. Always use opencl. Cuda is just proprietary garbage.

  48. CUDA by Anonymous Coward · · Score: 0

    was going to suggest openGL or DirectX but i think the poster wanted a general programming language.

    not sure if this is helpful but i found a website about CUDA for video cards at: http://docs.nvidia.com/cuda/index.html

    i don't think CUDA programs will work on my AMD video card though. lol it'll be cool if i could create a program that uses the 800MHz GPU and DDR3 VRAM just for fun.

  49. Try Wolfram Mathematica by gdelfino · · Score: 1

    You could download a 30 day free demo of Wolfram Mathematica and play with its GPU support. They have done a good job of automating a big part of the complex GPU programming process. http://www.wolfram.com/products/cuda-opencl-programming-mathematica.html

  50. Re:XNA or Unity by Anonymous Coward · · Score: 0

    Even GLSL or HLSL are fine for an introduction to GPU processing. You won't be doing GPU bitcoin mining or any serious data tasks with it in the end, but it's fine for spreading out some of the work from your CPU.

  51. C or C++ with vectors by gnasher719 · · Score: 1

    OpenCL or CUDA is a real pain, and a lot to learn. But any modern Intel quad core processor can deliver 50 billion floating point operations per second if you treat it right.

    Use C or C++ with the Clang compiler (gcc will do fine as well probably) and vector extensions. Newer Intel processors have 256 bit vector registers, so you can define vector types with 32 8-bit integers, 16 16-bit integers, 8 32-bit integers or 8 single precision floating point numbers, or 4 double precision floating point numbers. You can do two operations with such vectors per cycle if you take care about latency. And on a more expensive processor, you can run 8 threads in parallel.

    If 50 billion floating point operations per second is enough, then you're fine. And if you can't manage to produce 50 billion FLOPS in C or C++, then you don't even need to try OpenCL.
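
    The lane counts above are just the register width divided by the element width, and a theoretical peak follows from cores, lanes, operations per cycle and clock. Here is a small Python sketch of that arithmetic; the 3.0 GHz clock is an assumed figure, not something stated in this comment, and real code rarely reaches the theoretical peak.

```python
def lanes(register_bits, element_bits):
    """How many elements of a given width fit in one vector register."""
    return register_bits // element_bits


def peak_flops(cores, register_bits, element_bits, ops_per_cycle, clock_hz):
    """Theoretical peak: every core issues full-width vector ops every cycle."""
    return cores * lanes(register_bits, element_bits) * ops_per_cycle * clock_hz


# Lane counts for a 256-bit register, matching the list in the comment.
for bits in (8, 16, 32, 64):
    print(f"{bits:2d}-bit elements: {lanes(256, bits)} lanes")

CLOCK_HZ = 3.0e9  # hypothetical clock speed, for illustration only

# Quad core, single precision, two vector operations per cycle.
print(peak_flops(4, 256, 32, 2, CLOCK_HZ) / 1e9, "GFLOPS (theoretical)")
```

    Under those assumptions the theoretical ceiling sits well above 50 billion, so the 50 billion figure presumably reflects what is realistically sustainable rather than the hardware limit.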

    1. Re:C or C++ with vectors by Anonymous Coward · · Score: 0

      I've done that. I got much better performance in asm than C intrinsics. High end GPUs can hit 1 TFLOP double precision so that's where I'm heading. Going to give MS AMP a shot.

  52. C# by TechyImmigrant · · Score: 0

    >I am an intermediate-level programmer who works mostly in C# NET.

    I am so very, very sorry. I hope you find a better job soon.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    1. Re:C# by Anonymous Coward · · Score: 0

      C# and its toolchain are decent.
      Come to the dark side.
      We have cookies.

    2. Re:C# by TechyImmigrant · · Score: 1

      I program in gates. The dark side runs on my gates.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  53. Write some graphics shaders and multithreaded prog by fatgraham · · Score: 1

    I've just started with opencl and love it, it's fast, easy, debuggable (codel) and -with stable drivers- not too much of a pain when it goes wrong.

    I've been writing hlsl, glsl and arb vertex shaders for years and to me, opencl kernels are basically the same thing (language and limitation wise). Convert some full screen graphics effects to opencl for a first example, then make it do other stuff (maybe with buffers instead of images).

    Once you're used to making/debugging kernels, start splitting code/algorithms into smaller chunks, and start parallelising!

    Once it works, start digging into specific opencl/cuda stuff (local vs global memory etc) to start optimising

  54. Use CUDA with Thrust by Anonymous Coward · · Score: 0

    Check out nVidia's Thrust (https://developer.nvidia.com/thrust). It uses STL-like containers and algorithms to allow you to do many common GPU operations quite easily from a C++ environment. I've implemented entire image processing algorithms using Thrust. They also have fairly good documentation and examples (http://docs.nvidia.com/cuda/thrust/).

  55. NPP by dsouth · · Score: 1

    The easiest on-ramp to speeding up image/video processing is probably the npp library https://developer.nvidia.com/npp [nvidia.com] It has functionality and syntax similar to Intel's ipp library but uses an NVIDIA cuda-capable GPU to accelerate the operations.

    If you want to dig in deeper you could explore OpenACC http://www.openacc-standard.org/ [openacc-standard.org] OpenACC is a directives based approach to accelerator programming. You comment or mark up your code with OpenACC directives that provide additional information that the compiler can use to generate parallel code.

    Finally, you can learn CUDA C, or OpenCL, or CUDA Fortran, or NumbaPro, or one of the other programming languages that are supported on the GPU hardware of your choice. NVIDIA's CUDA C compiler is based on LLVM and the IR changes have been upstreamed to LLVM.org. There are several languages and projects in development that are leveraging the LLVM infrastructure to add GPU/parallel support.

    [disclaimer: I work for NVIDIA, but the words above are my own.]

  56. R/gputools, PyCUDA, PyOpenCL, MATLAB... by Anonymous Coward · · Score: 0

    ...some more ideas here:
    http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-99.html

  57. Re:XNA or Unity by Tr3vin · · Score: 2

    How many boxes do you want to go through before you get to the solution? Sure, he could write it as a shader, but that hardly requires pulling in something like Unity or XNA to build the project.

  58. Cudafy.net by rkcth · · Score: 1

    I was in the same boat. I have an image processing algorithm that can take up to 10 seconds on an older mid-range CPU; it's for the processing of product photos into high quality "perfect" production ready photos. I am also a C# programmer, and when looking into options I came across CUDAfy.net. It lets you code in C# and uses ILSpy to take your compiled C# and turn it into CUDA C, which is then compiled. This is then cached so production machines only need to include the cache. I just spent all day today recoding my algorithm, and while I found it a little complicated to get started (mostly since I didn't understand how threads and "blocks" work initially), I got my algorithm ported in a day (well, the main part; some of the little cleanup is left, probably another day or two to be 100% ported). I think that's pretty dang good, especially since my original algorithm was not even run in parallel. Also I timed it and it's taking 0.3 seconds, so that's about a 33X speedup so far; I figure the remaining code will bring that down to about 20X. I'm using a GTX 650 TI Boost card which cost under $200. CUDAfy.net can also work with OpenCL, though I haven't tested that aspect out yet. Overall, if you want the most painless shift from C# to GPU coding I would recommend checking out CUDAfy.net. It's free and licensed under LGPL so you can use it in commercial code.
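
    For anyone stuck on the same threads-and-blocks hurdle, here is a serial Python emulation of how a 1D kernel launch maps threads to data. It is only a sketch of the indexing idea; it is not CUDAfy.net or CUDA code, and `launch` and `invert_pixel` are hypothetical names.

```python
def global_index(block_idx, block_dim, thread_idx):
    """CUDA-style flat index: blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def blocks_needed(n, block_dim):
    """Ceiling division: enough blocks to cover n elements."""
    return (n + block_dim - 1) // block_dim


def launch(kernel, n, block_dim, *args):
    """Serially emulate a 1D launch; a real GPU runs these threads in parallel."""
    for block_idx in range(blocks_needed(n, block_dim)):
        for thread_idx in range(block_dim):
            kernel(global_index(block_idx, block_dim, thread_idx), n, *args)


def invert_pixel(i, n, src, dst):
    """Per-thread work: one thread handles one pixel."""
    if i < n:  # guard: the last block may have more threads than pixels left
        dst[i] = 255 - src[i]


src = [0, 10, 200, 255, 128]
dst = [0] * len(src)
launch(invert_pixel, len(src), 4, src, dst)  # 2 blocks of 4 threads for 5 pixels
print(dst)  # [255, 245, 55, 0, 127]
```

    The bounds check in the kernel is the part that tends to bite first: the grid is rounded up to whole blocks, so some threads have no pixel to work on.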

  59. Re:XNA or Unity by Cammi · · Score: 1

    Can't ignore something when it has been: 1. Discontinued 2. No longer running on the next big version of Windows (9). 3. There is no replacement, thus Microsoft does not want ANYONE to develop for windows/xbox.

  60. GPU Programming Requires a Different Mindset by ImprovOmega · · Score: 1

    I took some parallel processing classes in the last couple of years as part of my Master's program. CUDA was one of those tricky little beasts that basically takes a few minutes to learn (assuming a rock solid C/C++ background) but a lifetime to master the nuances.

    We were building little throw-away matrix multiply programs - for which we were given horribly inefficient and barely functional source to start with. The challenge was to make it run as fast as possible, with extra credit going to the fastest implementation. It turns out to accomplish this you basically need to understand every tier of the memory architecture of CUDA, the process by which it reads in cache lines to avoid collisions, how to optimize the read/write patterns, how the job would be split up among the GPU's (and the parameters used for the splitting), and basically every nit-picking detail of how the hardware actually runs.

    This runs counter to the level of abstraction that most CS majors are used to dealing with - if we wanted to do hardware we would've gone the EE or CE route - but if you truly want to grok CUDA, you have to become a hardware wiz. Otherwise you'll always be stuck wondering why you can never seem to get the level of speedup that the benchmarks suggest should be possible.
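
    One of the standard memory-hierarchy tricks for matrix multiply is tiling: work on small blocks so the same data gets reused while it is still in fast memory. The sketch below shows only the loop structure in plain Python; it is not the course code, and on a GPU each tile would typically be staged in shared memory by a thread block rather than walked by nested loops. The tile size of 2 is arbitrary.

```python
def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same result, but computed tile by tile to improve data reuse."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, p, tile):      # tile column of C
            for k0 in range(0, m, tile):  # slice of the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # [[58, 64], [139, 154]]
```

    In Python the tiled version is no faster; the point is the access pattern, which is what decides performance once the tiles live in a small, fast memory.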

    1. Re:GPU Programming Requires a Different Mindset by Skapare · · Score: 1

      You should already know how to do a matrix multiply by now, and not need someone else's source code. The task is to figure out how to partition the work most effectively for the GPU. Classic matrix multiply source code would be misleading at best.

      Or switch to an embarrassingly parallel project like Mandelbrot/Julia set calculation. Now the challenge is to make it do multi precision arithmetic so you can go deep.
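
      To show why Mandelbrot counts as embarrassingly parallel, here is a minimal escape-time sketch in plain Python. Each pixel depends only on its own coordinate, so on a GPU the body of the inner loop would become one thread. It uses ordinary double precision, not the multi precision arithmetic mentioned above, and the viewport bounds are arbitrary choices.

```python
def escape_time(c, max_iter=50):
    """Iterations of z = z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width, height, max_iter=50):
    """Escape times over [-2, 1] x [-1, 1]; needs width and height of at least 2."""
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            # No pixel reads another pixel's result, so all of them are independent.
            c = complex(-2.0 + 3.0 * px / (width - 1),
                        -1.0 + 2.0 * py / (height - 1))
            row.append(escape_time(c, max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in render(60, 20):
        print("".join("#" if n == 50 else "." for n in row))
```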

      --
      now we need to go OSS in diesel cars
    2. Re:GPU Programming Requires a Different Mindset by mpfife · · Score: 1
      | This runs counter to the level of abstraction that most CS majors are used to dealing with

      That's very unfortunate to hear. I know when I studied CS in the 90's, the foundation was always based on understanding the underlying hardware. My OS class focused on hardware interrupts, protected mode operation, cache and memory hierarchies. The whole basis for strategies and methods of making fast algorithms depends on knowing how the underlying hardware works.

      How can you call yourself a computer scientist if you don't understand the different fundamental architectures you run on?

    3. Re:GPU Programming Requires a Different Mindset by Anonymous Coward · · Score: 1

      You apparently took Computer Engineering classes relabeled as Computer Science classes.

      Computer Science doesn't deal with interrupts and protected mode operation. Computer Science does Science, i.e. studies in algorithms, a field that is closer to Math than Engineering. Knuth's books, for example, are Computer Science, not Computer Engineering. He doesn't once talk about "interrupts" or "protected mode operation".

    4. Re:GPU Programming Requires a Different Mindset by Anonymous Coward · · Score: 0

      I think the whole point was that they were given horrible source code and they were supposed to turn it into good code.

  61. See what libraries offer by Anonymous Coward · · Score: 0

    Before jumping in, do see what's available. CUDA particularly has a very rich set of libraries and OpenCL offerings might have just what you need as well (image and video processing).

  62. Intel Xeon Phi by chewie2010 · · Score: 1

    Look into Intel Xeon Phi. It is Intel's version of an NVIDIA Tesla. It does not require any special language and is made to be programmed like a normal Intel processor. http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html

    1. Re: Intel Xeon Phi by Anonymous Coward · · Score: 0

      If you want to make the most out of a Phi without reinventing the entire parallelism model yourself, you *have* to use a special language - check out ISPC (Intel SPMD Program Compiler) - a C-like compiler exploiting the Single Program Multiple Data paradigm.

      So GPUs or Phi - either way you have to learn new programming paradigms if you want to get the job done well. No free lunches here.

  63. Make a successful PC game then get a publisher by tepples · · Score: 1

    The replacement is native code.

    I'm certain that experienced developers of mouse-driven games for Windows on PCs can still obtain Xbox One devkits through an accredited disc game publisher. Of course this requires you to conceive, implement, ship, and market a game in a mouse-driven genre to demonstrate your competence. And you'll need certain professional social networking skills, which don't come easily to people with some disabilities that correlate with programming skill, to negotiate with a publisher. But as another Slashdot user has repeated to me over the years: "them's the breaks."

  64. Easy and GPU programming by mpfife · · Score: 1

    Simply don't go in the same sentence. You inherently need to know a lot about the underlying hardware and programming models to take advantage of that hardware - and none of that is easy. Best advice? Maybe use C# and start with a good sample tutorial. After that, you're going to learn a lot more about image algorithms/etc. That's why I can still make amazing amounts of money knowing how to program for GPU's.

  65. This might help by Anonymous Coward · · Score: 0

    http://hackage.haskell.org/package/accelerate

  66. Harnessing GPU vs Learning GPU by Anonymous Coward · · Score: 2, Interesting

    Writing GPU programs is hard. Not only do you have to learn a new set of APIs, you also have to understand the underlying architecture to extract decent performance. It requires a different approach to problem solving that takes months if not years to develop.

    Fortunately you don't need to read the entire CUDA programming guide to program on the GPU. There are several excellent libraries out there which hide the complexities of the GPU architecture. Since you are doing image processing, I would recommend ArrayFire (http://www.accelereyes.com/products/arrayfire). It is a free library which provides several image processing functions which have been optimized for the GPU. You should also look into Thrust and NPP (included with the CUDA toolkit), although these libraries are more verbose and require a greater understanding of C++ and the GPU to program.

  67. Relatively Painless = GPU Libraries by melonakos · · Score: 1
    Always awesome to see GPU computing getting Slashdot love!

    Do I have to learn CUDA/OpenCL — which seems a daunting task to me — or is there a simpler way?

    You do NOT have to learn CUDA or OpenCL. You can use libraries or compilers. GPU libraries tend to give better performance than GPU compilers (e.g. OpenACC) and tend to be able to handle more algorithms. That is because compilers are simply not smart enough to do things as well as expert programmers who meticulously hand-tune kernels and put them in libraries. Any number of libraries are available. There are many poorly supported libraries out there, so you may have to search around to find good ones. I suggest one below.

    What, currently, is the most painless way to start playing with GPU programming? Surely there must a be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"

    My colleagues and I at AccelerEyes have dedicated the last 6 years of our lives to trying to help people find exactly what you're looking for - "a relatively painless" way to harness the GPU. The result is our ArrayFire library for CUDA or OpenCL. I know it's uncool to toot one's horn, but the GPU computing community is small enough that people know each other and we're all working together to build out the ecosystem. There are many different contributions to GPU computing by many different groups. Our group's specialty in the ecosystem has always been the "relatively painless" contribution coupled with great performance. The reason people like our stuff is because we do nothing but work on squeezing out the most performance possible. Then we wrap up those kernels into convenient library calls that can be plugged in like math functions to your code with much less burden than writing the CUDA or OpenCL from scratch.

    Happy to answer any further questions you may have about specific libraries, compilers, or GPU programming approaches. We eat, drink, and breathe everything CUDA/OpenCL.

    BTW, we also encourage learning expert CUDA/OpenCL development. It is tough, no doubt about that. It is time-consuming and for many developers is not worth the added development complexity and lengthened development time. It sounds like you are probably in the boat of not caring about becoming an expert in low-level details, rather just wanting to get better performance to achieve a goal and be done with it. Is that correct?

    Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply?

    LabVIEW does not have good support for GPUs. Many ArrayFire users are building custom LabVIEW blocks so that they can program the GPUs more simply. I can connect you to some of those users if you wish (just shoot me a note to john@accelereyes.com).

    I'm unaware of another graphical box/nodes package that supports GPUs.

    ---

    While I'm at it, I know this post is going to be read by many expert CUDA/OpenCL developers out there. If you're interested in writing CUDA/OpenCL code daily, we're hiring (see my email above) :)

  68. OpenCL not obsolete. OpenACC generates CUDA/OpenCL by keneng · · Score: 2

    OpenACC may be higher-level (easier to use), but it still generates CUDA/OpenCL code. Your wording sounded like "OpenCL support is gone," and I want to correct you on that. OpenCL is the future and also runs on CUDA hardware. If you code for CUDA, you can only target CUDA hardware. If you code for OpenCL, you can target not only AMD hardware but also CUDA hardware. That was the point of the OpenCL spec in the first place. OpenCL can also transparently take advantage of the local CPU cores. OpenCL has one drawback: it does not support all types. It is largely constrained to the kinds of types relevant to graphics/3D. There have been some kludge patches to make CUDA/OpenCL work with string types (e.g. parallel grep with CUDA), but these aren't well suited, because the hardware was not intended for that and it requires a lot of copying between main system memory and graphics card memory, which wastes a lot of time. Parallel string processing is better done with mechanisms like OpenMP. OpenMP can crunch any kind of type, and it is designed to co-exist with MPI (RPC-like parallelism across many computers).
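
    The point about memory copies eating the gains can be put into a back-of-the-envelope model. The sketch below is plain Python with hypothetical function names, and every number you feed it is an assumption, not a measurement: total GPU time is the time to move the data across the bus plus the kernel time, so a fast kernel only pays off when the copy is small relative to the work.

```python
# Back-of-the-envelope model; every number passed in is an assumption.

def gpu_total_seconds(bytes_moved, bus_bytes_per_sec, gpu_compute_seconds):
    """Copy time (bytes_moved counts both directions) plus kernel time."""
    return bytes_moved / bus_bytes_per_sec + gpu_compute_seconds


def effective_speedup(cpu_seconds, bytes_moved, bus_bytes_per_sec, kernel_speedup):
    """Overall speedup once the host<->device copies are included."""
    gpu_compute = cpu_seconds / kernel_speedup
    total = gpu_total_seconds(bytes_moved, bus_bytes_per_sec, gpu_compute)
    return cpu_seconds / total
```

    With made-up inputs of 1 s of CPU work, 2 GB moved over an 8 GB/s bus, and a kernel that is 20x faster, effective_speedup(1.0, 2e9, 8e9, 20) comes out at roughly 3.3x rather than 20x. Give the same transfer ten times more compute (10 s of CPU work) and the result climbs to roughly 13x, which is why compute-heavy image filters tend to fare better than grep-style string scans.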

    Start learning OpenCL, OpenMP, MPI, GNU & Boost library parallelism. To make it easier, try running the Go OpenCL examples:
    apt-get install mercurial meld
    hg clone -u release https://code.google.com/p/go
    cd go
    cd src
    ./all.bash
    #Put this into your ~/.bashrc:
    export GOROOT=/home/youruser/yourgo
    export PATH=$PATH:$GOROOT/bin
    mkdir -p ~/goopencl
    cd ~/goopencl
    mkdir -p ~/goopencl/pkg
    mkdir -p ~/goopencl/src
    export GOPATH=/home/youruser/goopencl
    go get github.com/tones111/go-opencl/cl
    go get github.com/tones111/raw
    cd /home/youruser/goopencl/src/github.com/tones111/go-opencl/cl/demo/rotate/
    go run rotate.go -i="i.png" -o="o.png" -a=15

  69. Microsoft AMP by Anonymous Coward · · Score: 0

    I am really surprised that nobody has mentioned Microsoft's C++ AMP (not sure how good it is, but from some benchmarks it seems to perform really well):
    http://msdn.microsoft.com/en-us/library/vstudio/hh265137.aspx
    It's basically an API that lets you program the GPU (or whatever accelerator) in a painless way.
    That or CUDA. Some people have written .NET bindings, it seems.
      CUDA = 3 lines of C to run a kernel; OpenCL has the steepest learning curve of the three.

  70. no pants by Anonymous Coward · · Score: 0

    I read the headline as "the most pantsless intro".

  71. Re:XNA or Unity by polymeris · · Score: 1

    <quote>... and it's a language the asker is familiar with.</quote>

    The asker is familiar with HLSL?

  72. OpenCL wrapper to C# by Anonymous Coward · · Score: 0

    At my company, we built a lightweight OpenCL wrapper for C# to hide most of the setup and code overhead. The OpenCL code wasn't that hard to write, since the algorithm was essentially some mathematics that's easy to implement in C. The most difficult part was debugging and profiling the OpenCL code, but I think there are some quite nice tools for that now.

  73. GL shaders are easy by Ptolemy+Too · · Score: 1

    Actual OpenGL shaders are pretty easy to write. They're C-like, and there are only a handful of library functions.

    The complexities of OpenGL programming all come in the glSoMany() calls - if you can find a 2D framework that can render quads for you, using shaders you supply, you're home free.

    Since you have literal image processing needs, I think it may make sense to stick to actual, raw GL. Using a more general purpose vector programming language that compiles to GL code, you may have a lot more boilerplate to deal with. My guess is it's the boilerplate that makes CUDA/OpenCL seem daunting.
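
    To show why the shader part itself is the easy bit, here is a CPU-only sketch in plain Python of the fragment-shader model. It is not GLSL and uses no GL calls; invert_shader and render are illustrative names. The "shader" is a pure function of one pixel's coordinates, and the framework's whole job is to run it once per output pixel (in parallel on a real GPU, in a loop here).

```python
# CPU-only sketch of the fragment-shader model: a pure function of one
# pixel's coordinates, run once per output pixel. Names are illustrative.

def invert_shader(x, y, src):
    """'Shader' body: read the source texel and return the inverted color."""
    r, g, b = src[y][x]
    return (255 - r, 255 - g, 255 - b)


def render(shader, src):
    """Stand-in for the framework that draws a full-screen quad.

    On a GPU every (x, y) would be processed in parallel; here we loop.
    """
    height, width = len(src), len(src[0])
    return [[shader(x, y, src) for x in range(width)] for y in range(height)]


# A tiny 2x2 RGB image as nested lists of (r, g, b) tuples.
image = [[(0, 0, 0), (255, 128, 0)],
         [(10, 20, 30), (255, 255, 255)]]
inverted = render(invert_shader, image)
```

    Swapping in a different filter means writing another small function like invert_shader; the render side stays the same, which is the boilerplate a 2D framework would take off your hands.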

  74. Re:XNA or Unity by Anonymous Coward · · Score: 0

    I'm still posting AC, because I lost my 5 digit slashdot ID a long time ago, but try out Microsoft Accelerator. It makes all this easy. Very, very easy.

    Google is your friend. Google "C# GPU" and you'll find Accelerator; it's not exactly well hidden.

  75. lego by Anonymous Coward · · Score: 0

    If you want to connect boxes, you are lazy and not suited to the task. Go play with Lego then; you can even build a PC.

  76. just learn cuda? by SkunkPussy · · Score: 1

    If you are an intermediate-level programmer, as you say, then you can easily learn a new programming paradigm. There is a Coursera course, https://www.coursera.org/course/hetero, which is OK and should do for your purposes.

    --
    SURELY NOT!!!!!
  77. high-level GPU APIs and languages by Anonymous Coward · · Score: 0

    There's probably no need to reinvent the wheel. A number of high-level APIs are available for this purpose.
    OpenCV does image processing and has GPU support.

    A more general tool is Theano, which is a meta-programming tool. You state your computations symbolically and Theano generates a computation graph. The graph gets simplified, and then Theano generates CPU/GPU code for your equations.
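
    The build-simplify-run idea can be sketched in a few lines of plain Python. To be clear, this is NOT Theano's API: the classes and functions below are invented for illustration, the only simplifications are x * 1 -> x and x + 0 -> x, and the last stage interprets the graph on the CPU where a real tool would emit CPU or GPU code.

```python
# Toy symbolic-expression sketch. This is NOT Theano's API; the class and
# function names are made up to illustrate "build a graph, simplify, run".

class Var:
    def __init__(self, name):
        self.name = name


class Const:
    def __init__(self, value):
        self.value = value


class Op:
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right


def simplify(node):
    """Apply two algebraic rewrites: x * 1 -> x and x + 0 -> x."""
    if not isinstance(node, Op):
        return node
    left, right = simplify(node.left), simplify(node.right)
    if node.kind == "mul" and isinstance(right, Const) and right.value == 1:
        return left
    if node.kind == "add" and isinstance(right, Const) and right.value == 0:
        return left
    return Op(node.kind, left, right)


def evaluate(node, env):
    """Interpret the graph; a real tool would generate code instead."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    a, b = evaluate(node.left, env), evaluate(node.right, env)
    return a * b if node.kind == "mul" else a + b


def count_ops(node):
    """Number of operation nodes left in the graph."""
    if not isinstance(node, Op):
        return 0
    return 1 + count_ops(node.left) + count_ops(node.right)


x, y = Var("x"), Var("y")
# (x * 1) + (y + 0), stated symbolically before any numbers exist.
graph = Op("add", Op("mul", x, Const(1)), Op("add", y, Const(0)))
optimized = simplify(graph)
```

    Here the original graph has three operations and the simplified one has a single add, yet both evaluate to the same value for any x and y. The payoff of the symbolic approach is that such rewrites happen before any code is generated.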

    --Beau

  78. Intermediate-level programmer who works C# by Anonymous Coward · · Score: 0

    So in other words, a clueless Micro$oft's bitch.

  79. GPU with .Net by Yakust · · Score: 1

    The easiest way to do GPU work on .NET is Cudafy:
    http://cudafy.codeplex.com/
    It allows you to write .NET code that runs on an NVIDIA GPU
    (an OpenCL version is in the works).

    For more advanced work, CMSoft has a great, complete tutorial on using OpenCL with .NET:
    http://www.cmsoft.com.br/?option=com_content&view=category&layout=blog&id=41&Itemid=75

  80. Real Programmers.. by yu.alvinray · · Score: 1
  81. Free Class on Coursera by Anonymous Coward · · Score: 0

    Heterogeneous Parallel Programming Class on Coursera. Free and 5 stars in my book.

  82. Re:XNA or Unity by Anonymous Coward · · Score: 0

    what in the name of holy fuck are you talking about?