Ask Slashdot: What Is the Most Painless Intro To GPU Programming?
dryriver writes "I am an intermediate-level programmer who works mostly in C# .NET. I have a couple of image/video processing algorithms that are highly parallelizable, so running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x to perhaps 30x or 40x, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL, which seems a daunting task to me, or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"
I tried it out once a while ago just to see what it does. It looks 'dead' from a support POV, but it is still out there;
Release notes for MC# 3.0:
a) GPU support both for Windows and Linux,
b) integration with Microsoft Visual Studio 2010,
c) bunch of sample programs for running on GPU (including multi-GPU versions),
d) "GPU programming with MC#" tutorial.
GPU programming is painful. A painless introduction doesn't capture the flavor of it.
Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of a shiny crap might seem superficially appealing to you.
Learn OpenCL and do the job properly.
When all you have is a hammer, every problem starts to look like a thumb.
CUDA is extremely easy to learn and use (if you know C and of course have an NVidia card) and is well worth the effort for some projects. Alternatively you could try skipping GPU programming and just using OpenMP, which would still greatly improve performance if you're not already multithreading.
Those are game engines. They will do nothing to help him use the GPGPU capabilities of his graphics card.
Don't know what the status is on Windows, but for high-performance computing, OpenACC is an emerging standard, with support from the Cray and PGI compilers.
I don't think he is looking at making a game; I think he is looking for some cheap parallel processing. I have done some CUDA, and it was a pain to set up a few years back. There are probably better tutorials now.
The heavy lifting has mostly already been done for you. There are CUDA wrappers out there that, with a few changes to your code, run it as close to optimally as possible using the card's cores. We had a Nvidia guy come by and give a talk just to show off how relatively painless it is (similar to OpenMPI, in my opinion). If you've got a couple extra people around consider reaching out to Nvidia to have someone show everyone a few of the options.
I get the impression that CUDA/OpenCL is still the best option. This thesis presents Obsidian, a set of Haskell bindings that might be easier, and it also covers the basics quite well. Haskell lends itself really well to this because the language is inherently designed for parallelism, thanks to purity and out-of-order computation. That being said, I think Obsidian is a bit rough around the edges, but if you are looking for a real alternative, this is one.
It's new and might be a little rough around the edges, but everything else is hacks on top of OEM proprietary "solutions" on top of hardware hacks.
XNA has easy, painless shader compilation. You can plug a C# image class into an XNA texture, pipe it through a vshs shader that you write by hand, and dump the output to a texture, back to an image. That process is highly interoperable with existing C# applications.
But that ignores the fact that Microsoft abandoned XNA like an unwanted child.
Check out Max/MSP/Jitter.
As you describe, the interface is VPL: connecting boxes/nodes to access the GPU is one of the (many) things the program is capable of. Depending on what you're trying to do, you may also find Gen useful for generating GLSL shaders within the Max environment (although you can use other shaders as well).
I'm currently neck-deep in a few Jitter projects using custom shaders, etc., and while it's great for rapid prototyping, getting good frame-rates and production stable code out is a whole black art unto itself. Fortunately, the support and forum community are very strong.
Anyone who tells you differently is selling you something.
Check out the Udacity class on parallel programming. It's mostly CUDA (I believe it's taught by NVIDIA engineers): https://www.udacity.com/course/cs344
CUDA is generally easier to program than OpenCL. Of course, CUDA only runs on NVIDIA GPUs though.
It is Microsoft, but have you looked at C++ AMP?
http://en.wikipedia.org/wiki/C%2B%2B_AMP
OpenACC is what you're looking for. It uses a directive based programming model similar to OpenMP, so you write ordinary looking code, then annotate it in ways that tell the compiler how to transform it into GPU code.
You won't get as good performance as well written CUDA or OpenCL code, but it's much easier to learn. And once you get comfortable with it, you may find it easier to make the step from there into lower level programming.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
VB.NET background. Wanted to get into GPGPU to accelerate some of my more complicated math calculations. Tried CLOO (open source .net GPU wrappers) and couldn't get it to work, tried AMD's OPENCL dev gui, couldn't get it to work. Eventually found the answer in python. GPGPU in pyopencl is well-documented thanks to the bitcoiners, and from .net you can either run the python in a shell, or write a little python kernel to listen for, and process commands. Only catch is the opencl abilities are limited, and you have to start dabbling in c++ to get it to do any real work (and even then it's a dumbed-down c++ and many existing extensions don't install or work quite right). All in all I found the entire thing very rewarding though. :) Best of luck.
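For what it's worth, the "little Python process that listens for and processes commands" idea can be sketched roughly like this. Everything specific here is an assumption made for illustration: the one-JSON-object-per-line protocol, the `scale` command, and the names `handle_command` and `serve` are all made up, and the list comprehension is a plain-Python stand-in for the spot where a pyopencl kernel launch would go.

```python
import json
import sys


def handle_command(line):
    """Parse one JSON command and return a JSON reply string.

    Hypothetical protocol: {"op": "scale", "data": [...], "factor": x}.
    A real worker would hand `data` to a GPU kernel here instead of
    looping in Python.
    """
    try:
        cmd = json.loads(line)
        if cmd.get("op") != "scale":
            return json.dumps({"ok": False, "error": "unknown op"})
        result = [x * cmd["factor"] for x in cmd["data"]]  # stand-in for GPU work
        return json.dumps({"ok": True, "result": result})
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        # Malformed JSON or a command missing fields: report it, keep serving.
        return json.dumps({"ok": False, "error": str(exc)})


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one command per line until EOF, writing one reply per line."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_command(line) + "\n")
            stdout.flush()  # the parent process is blocked waiting on this reply
```

The .NET side would start this script as a child process, call `serve()` in it, and talk to it over redirected stdin/stdout. Keeping the process alive between commands avoids paying interpreter and driver start-up costs on every call.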
Learn about parallel programming with OpenMP, which you can run on your normal machine. If you take enough time to do that properly then the OpenMP standard will also support GPUs, and the move to such architectures will be easy.
Heterogeneous parallel programming. It cuts it. In a few lessons you will know where you are heading.
As with all attempts at making things faster, you should first ask what kind of performance you are already getting out of your CPU implementation. Given that you seem to believe it is actually possible to get performance out of a VB-like language, I assume that your base implementation heavily sucks.
The only goal of putting stuff on a GPU is to make things faster, but GPU code is mostly difficult to write and non-portable. Having a good CPU implementation might just be what you need. It also might be easier for you to write.
If you really need a GPU, then you need to start learning how GPU works, because a simple copy paste is unlikely to give you any significant performance. A good start at: https://developer.nvidia.com/cuda-education-training
I never properly learned opencl, but it is essentially similar. Except you have access to less low level details on nvidia architecture. Of course, cuda is pretty much nvidia only.
Take a look at C++ AMP. It is a small language extension that lets you target the GPU using C++. The platform takes care of most of the mechanics of running code on the GPU. Also check out this blog post for links to tutorials and samples.
barraCUDA because that'll eat your motherfucking ass alive man!
Coursera has some courses on GPU programming, like this one, and what's nice about them is that they're pretty slow-paced, and I'm assuming that they explain things well. Other online courses probably offer the same, and I think the video lectures would be helpful in understanding the concepts.
I have left slashdot and am now on Soylent News. FUCK YOU DICE.
LOTS of it.
Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.
So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP or one of the other abstractions but you really need to understand how these massively parallel machines work. It's possible to write some perfectly valid code in any of these environments which will run SLOWER than on the CPU because you didn't understand fundamentally how GPUs excel at processing.
Udacity currently has a fairly decent intro course on GPU programming at: https://www.udacity.com/course/cs344
It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.
I'd suggest running through that and then deciding on what API you want to end up using.
Consider the Intel image processing libraries. They have a broad range of routines that are highly optimized for their processors.
If you know multithreading concepts, OpenCL isn't too hard to get into.
Of course, start small, do tutorials, and do it right.
Much, much, much easier than trying to do stuff in a pixel shader or, even worse, the assembly-like shading language that came before GLSL.
DirectCompute intro
Some good samples on the MSDN forums
If you are going to program a GPU, and you are looking for performance gains, you MUST understand the hardware. In particular, you must understand the complicated memory architecture, you must understand the mechanisms for moving data from one memory system to another, and you must understand how your application and algorithm can be transformed into that model.
There is no shortcut. There is no magic. There is only hardware.
If you do not believe me, you can hunt up the various Nvidia papers walking you through (in painful detail-- link below) the process of writing a simple matrix transpose operation for CUDA. The difference between a naive and a good implementation, as shown in that paper, is huge.
That said, once you understand the principles, CUDA is relatively easy to learn as an extension of C, and the Nvidia profiler, NVVP, is good at identifying some of the pitfalls for you so that you can fix them.
http://www.cs.colostate.edu/~cs675/MatrixTranspose.pdf
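To give a feel for what "naive versus good" means for a transpose, here is a rough sketch of the access patterns only. It is plain Python, so it says nothing about speed, and it is not the code from the paper; the function names and the tile size are my own choices for illustration. The idea it models is that a tiled version stages a small block first (on a GPU that staging buffer would be shared memory), so that both the reads from and the writes to main memory can walk along rows.

```python
def transpose_naive(matrix):
    """Element-by-element transpose: reads walk rows, writes walk columns."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            out[c][r] = matrix[r][c]
    return out


def transpose_tiled(matrix, tile=2):
    """Same result, but processed one tile-by-tile block at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Stage the block in a small scratch buffer
            # (the stand-in for GPU shared memory).
            block = [matrix[r][c0:c0 + tile]
                     for r in range(r0, min(r0 + tile, rows))]
            # Write the staged block out transposed.
            for dr, row in enumerate(block):
                for dc, value in enumerate(row):
                    out[c0 + dc][r0 + dr] = value
    return out
```

Both functions return the same matrix; the difference that matters on real hardware is the order in which memory is touched.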
I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.
There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.
I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.
BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.
I've heard decent things about CUDAfy.NET.
The only painful thing you have to do is to decide how to increase threading in your code.
You would probably see a multi-fold increase in performance by simply converting your project from C# to C++.
Closest to painless I know of is https://bitbucket.org/bradjcox/gpu-maven-plugin
The GPU Maven Plugin compiles Java code with hand-selected Java kernels to CUDA that can run on NVIDIA GPUs of compatibility level 2.0 or higher. It encapsulates the build process so that GPU code is as easy to build with maven as ordinary Java code. The plugin relies on the NVidia CUDA SDK being installed which must be done separately.
Use c# and Microsoft Accelerator.
It's very easy to use, and since the VAST majority of your processing is going to occur on the GPU, the language you use is mostly irrelevant.
The main thing you need to be aware of is that the bus to the video card is very, very, very slow. So in order to get any speedup from the GPU, you'll need to send as much stuff to be processed to the video card as you can. Round-trips hurt you a lot, so minimize them any way you can get away with doing so.
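A back-of-envelope model shows why batching matters. The latency and bandwidth figures below are placeholders I picked for illustration, not measurements of any real bus, and `round_trip_seconds` is a made-up helper; the only point is that the fixed per-transfer cost gets paid once per round-trip, while the payload cost is the same either way.

```python
import math


def round_trip_seconds(num_items, item_bytes, items_per_transfer,
                       latency_s=10e-6, bandwidth_bytes_per_s=8e9):
    """Estimated time to send all items to the card and read them back.

    latency_s and bandwidth_bytes_per_s are assumed values, not measured ones.
    """
    transfers = math.ceil(num_items / items_per_transfer)
    payload = num_items * item_bytes
    # Each batch goes to the card and comes back: two latencies per batch.
    # The total number of bytes moved does not depend on the batch size.
    return 2 * transfers * latency_s + 2 * payload / bandwidth_bytes_per_s


# 1000 image tiles of 4 KiB each, sent one at a time versus in a single batch.
one_by_one = round_trip_seconds(1000, 4096, items_per_transfer=1)
all_at_once = round_trip_seconds(1000, 4096, items_per_transfer=1000)
print(f"one by one: {one_by_one * 1e3:.3f} ms, batched: {all_at_once * 1e3:.3f} ms")
```

Under these assumed numbers the one-at-a-time version is dominated almost entirely by latency, which is the round-trip penalty described above.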
I went with OpenSceneGraph.
Long ago, I tried xlib only, because at that time Motif was the only higher layer available, and it was proprietary. It was horrible. xlib has been superseded by XCB, but I wouldn't use that, not with all the other options out there today. XCB is a very low level graphics library, for drawing lines and letters in 2D. 3D graphics can be done with that, but your code would have to have all the math to transform 3D representations in your data into 2D window coordinates for XCB. LessTif is a free replacement for Motif, but by the time it was complete enough to be usable, the world was already moving on. With Wayland likely pushing X aside in the near future, XCB and xlib may not perform so well. They will continue to be supported for a while through a compatibility layer, but I think they're on the way out. Motif is also not much good these days either. For one, Motif rests on top of xlib, and if xlib goes, so does Motif. Today, we have many better libraries for interfacing with GUIs.
When OpenGL became available, I tried it. OpenGL is great for drawing simple 3D graphics, but it lacks intelligence. The easy part is that you just pass x,y,z coordinates to the library routines, and OpenGL does the rest. The bad part is that if you want to draw a fairly complicated scene, containing many objects that may be partly or completely hidden behind other objects, OpenGL has no intelligence to deal with that. It just dumbly draws everything your code tells it to draw. To speed that up, your code has to have the smarts to figure out what not to draw, so it can skip calling on OpenGL for invisible objects.
That's where a library like OpenSceneGraph comes in. Your code feeds all the info to OSG. OSG figures out visibility, then calls OpenGL accordingly.
You may need still other libraries for window management, something like FLTK. Yes, FLTK and OSG can work together.
You will also most likely be working in C/C++. OpenGL has many language bindings. But OSG is C++ and doesn't have so many. FLTK is also C++, and has even fewer bindings. Trouble with picking a language like Python for this work is that it can be difficult to find bindings for all the libraries. Even when bindings to a particular language exist, they tend to be incomplete, and don't always perfectly work around differences in data representation. Pick libraries first, then see what language bindings they all have in common, then code in one of those common languages. It's possible C/C++ will turn out to be the only language common to all the libraries.
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
You could give Theano a try. It's a Python-based symbolic expression compiler whose interface is very much like numpy. I use it on Linux but I've heard mention of support for Windows.
http://deeplearning.net/software/theano/
Incorrect. That is certainly a valid approach and the GP should be modded up.
Using textures and shaders you can very easily do massively parallel floating point operations in XNA on the GPU, and it's a language the asker is familiar with.
Think outside the box a little bit.
I admit I don't know much about GPU programming.
But if I were you, I'd take a good look at the rootbeer compiler, which translates Java code into CUDA or OpenCL
http://rbcompiler.com/
https://github.com/pcpratts/rootbeer1
It sure looks simple and Java is just a small step from C#.
Look at MIT's Halide; it's a domain-specific language for image processing. http://halide-lang.org/
The alternative is OpenCL/CUDA, which require in-depth knowledge of the H/W to get the best from the GPU. It doesn't matter whether you use Python or whatever bindings you choose for a GPU native language. The hardest part is mapping the algorithm to the H/W model of a GPU. PyCUDA does NOT solve that issue.
You can get plenty of help from Stackoverflow.
I wouldn't call her advanced coursework easy, but a resource that belongs on this thread: http://www.cs.utah.edu/~mhall/cs6963s09/
Mary Hall is a professor of Computer Science. Her recent work is related to compilers and parallel programming on GPUs. Her professional web page is something like an on-line open course, or the framework of one.
There isn't really a painless way. Like a lot of skills in life, the only way to learn is through pain, suffering and frustration. But it makes the prize all the more enjoyable. You need to be experienced at regular, serial programming in C/C++, then mangle all of it to figure out how to program in parallel. I literally read the CUDA programming guide 5 times. And I felt like I gained as much on the fifth time as I did the first time. And don't expect your debugger to save you -- if it's like it was a year ago, you're going to struggle a bit with that.
Luckily, once you do get it, it all seems to make sense in hindsight. And when you do achieve that 10x-300x speedup, you'll feel like a superhero. You just have to be patient and expect some frustration. It's not like learning a new programming language. It's like a whole new programming paradigm.
ms has a habit of abandoning one product and then other guys in the same fucking company forcing you to use xna.* libs on their brand spanking new hardware.
but actually that sounds like a possible solution for the guy, the pain being writing the shader.
silverlight abandoned? what the fuck are you doing shipping sdk with silverlight libs on almost the same fucking day?! I see though where elop learnt his trade.
world was created 5 seconds before this post as it is.
Yeah, I know the feeling.
It would be one more tool under my belt. For instance, most non-financial people hear of unemployment numbers and a few know where to view the official data. For some bizarre reason the government offers no graphs at dol.gov alongside their statistics, even though they let you download years' worth of raw data. Enter us geeks, who easily put together a spreadsheet to make sense of official unemployment trends and zoom into the data all we want and run our own analysis.
One day knowing OpenCL might let me do similar processing that would otherwise be out of my reach. The potential alone has merit. Executing basic parallel programming without fear will yield a better accomplishment than the last multi-day experiment I ran on my GPU: mining up to one bit cent.
... to code it in COBOL for you.
They can take my LifeAlert pendant when they pry it from my cold dead fingers.
I recommend CUDA if you can deploy requiring NVIDIA hardware. CUDA allows for pre-compiled kernels, CUDA has a debugger for your kernels, CUDA has a tool chain. CUDA has far richer options. Indeed, NVIDIA uses LLVM for its CUDA compilers, so in theory different programming languages can be used to write CUDA kernels. Take a gander at: https://developer.nvidia.com/cuda-llvm-compiler
In contrast, OpenCL is somewhat barbaric. It is an API and there are very few tools for it. Worse, OpenCL implementation can be all over the map.
You do NOT need to use OpenGL to use CUDA or OpenCL. The interop APIs between OpenGL and OpenCL or CUDA are there to make buffer transfers efficient between the two (so that one can compute something with CUDA or OpenCL and have it drawn with OpenGL).
was going to suggest openGL or DirectX but i think the poster wanted a general programming language.
not sure if this is helpful but i found a website about CUDA for video cards at: http://docs.nvidia.com/cuda/index.html
i don't think CUDA programs will work on my AMD video card though. lol it'll be cool if i could create a program that uses the 800MHz GPU and DDR3 VRAM just for fun.
You could download a 30 day free demo of Wolfram Mathematica and play with its GPU support. They have done a good job of automating a big part of the complex GPU programming process. http://www.wolfram.com/products/cuda-opencl-programming-mathematica.html
Even GLSL or HLSL are fine for an introduction to GPU processing. You won't be doing GPU bitcoin mining or any serious data tasks with it in the end, but it's fine for spreading out some of the work from your CPU.
OpenCL or CUDA is a real pain, and a lot to learn. But any modern Intel quad core processor can deliver 50 billion floating point operations per second if you treat it right.
Use C or C++ with the Clang compiler (gcc will do fine as well probably) and vector extensions. Newer Intel processors have 256 bit vector registers, so you can define vector types with 32 8-bit integers, 16 16-bit integers, 8 32-bit integers or 8 single precision floating point numbers, or 4 double precision floating point numbers. You can do two operations with such vectors per cycle if you take care about latency. And on a more expensive processor, you can run 8 threads in parallel.
If 50 billion floating point operations per second is enough, then you're fine. And if you can't manage to produce 50 billion FLOPS/sec in C or C++, then you don't even need to try OpenCL.
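The arithmetic behind that figure can be checked in a few lines. The core count, the 256-bit registers and the two vector operations per cycle come from the description above; the 3 GHz clock is an assumption added here for illustration, and `peak_flops` is just a made-up helper name.

```python
def peak_flops(cores, lanes, ops_per_cycle, clock_hz):
    """Theoretical peak: every lane of every core busy on every cycle."""
    return cores * lanes * ops_per_cycle * clock_hz


REGISTER_BITS = 256
FLOAT_BITS = 32
lanes = REGISTER_BITS // FLOAT_BITS  # 8 single-precision floats per register

# Quad core, two vector operations per cycle, assumed 3 GHz clock.
peak = peak_flops(cores=4, lanes=lanes, ops_per_cycle=2, clock_hz=3.0e9)
fraction_needed = 50e9 / peak

print(f"peak: {peak / 1e9:.0f} GFLOPS, 50 GFLOPS is {fraction_needed:.0%} of that")
```

Under these assumptions, 50 billion operations per second is roughly a quarter of the theoretical peak, so the target leaves a fair amount of room for imperfect code.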
>I am an intermediate-level programmer who works mostly in C# NET.
I am so very, very sorry. I hope you find a better job soon.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
I've just started with opencl and love it, it's fast, easy, debuggable (codel) and -with stable drivers- not too much of a pain when it goes wrong.
I've been writing hlsl, glsl and arb vertex shaders for years and to me, opencl kernels are basically the same thing (language and limitation wise). Convert some full screen graphics effects to opencl for a first example, then make it do other stuff (maybe with buffers instead of images).
Once you're used to making/debugging kernels, start splitting code/algorithms into smaller chunks, and start parallelising!
Once it works, start digging into specific opencl/cuda stuff (local vs global memory etc) to start optimising
Check out nVidia's Thrust (https://developer.nvidia.com/thrust). It uses STL-like containers and algorithms to allow you to do many common GPU operations quite easily from a C++ environment. I've implemented entire image processing algorithms using Thrust. They also have fairly good documentation and examples (http://docs.nvidia.com/cuda/thrust/).
The easiest on-ramp to speeding up image/video processing is probably the NPP library: https://developer.nvidia.com/npp It has functionality and syntax similar to Intel's IPP library but uses an NVIDIA CUDA-capable GPU to accelerate the operations.
If you want to dig in deeper you could explore OpenACC: http://www.openacc-standard.org/ OpenACC is a directives-based approach to accelerator programming. You comment or mark up your code with OpenACC directives that provide additional information that the compiler can use to generate parallel code.
Finally, you can learn CUDA C, or OpenCL, or CUDA Fortran, or NumbaPro, or one of the other programming languages that are supported on the GPU hardware of your choice. NVIDIA's CUDA C compiler is based on LLVM and the IR changes have been upstreamed to LLVM.org. There are several languages and projects in development that are leveraging the LLVM infrastructure to add GPU/parallel support.
[disclaimer: I work for NVIDIA, but the words above are my own.]
...some more ideas here:
http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-99.html
How many boxes do you want to go through before you get to the solution? Sure, he could write it as a shader, but that hardly requires pulling in something like Unity or XNA to build the project.
I was in the same boat. I have an image processing algorithm that can take up to 10 seconds on an older mid-range CPU; it's for processing product photos into high-quality "perfect" production-ready photos. I am also a C# programmer, and when looking into options I came across CUDAfy.net. It lets you code in C# and uses ILSpy to take your compiled C# and turn it into CUDA C, which is then compiled. This is then cached, so production machines only need to include the cache. I just spent all day today recoding my algorithm, and while I found it a little complicated to get started (mostly since I didn't understand how threads and "blocks" work initially), I got my algorithm ported in a day (well, the main part; some of the little cleanup is probably another day or two to be 100% ported). I think that's pretty dang good, especially since my original algorithm was not even run in parallel. Also, I timed it and it's taking 0.3 seconds, so that's about a 33X speedup so far; I figure the remaining code will bring that down to about 20X. I'm using a GTX 650 Ti Boost card, which cost under $200. CUDAfy.net can also work with OpenCL, though I haven't tested that aspect out yet. Overall, if you want the most painless shift from C# to GPU coding, I would recommend checking out CUDAfy.net. It's free and licensed under LGPL, so you can use it in commercial code.
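Since threads and "blocks" are the part that trips people up at first, here is a rough simulation of the indexing scheme in plain Python. It is not CUDAfy.NET or CUDA code: `launch` and `brighten_kernel` are made-up names, and the loop runs the "threads" one after another, whereas a GPU would run them in parallel.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per (block, thread) pair, sequentially.

    A GPU would execute these invocations in parallel.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def brighten_kernel(block_idx, block_dim, thread_idx, pixels, out, amount):
    # Each thread works out which single pixel it is responsible for.
    i = block_idx * block_dim + thread_idx
    # The last block may contain threads past the end of the data.
    if i < len(pixels):
        out[i] = min(255, pixels[i] + amount)


pixels = [0, 50, 100, 150, 200, 250, 255]
out = [0] * len(pixels)

block_dim = 4                                          # threads per block
grid_dim = (len(pixels) + block_dim - 1) // block_dim  # enough blocks to cover all pixels

launch(brighten_kernel, grid_dim, block_dim, pixels, out, 10)
print(out)
```

The two things to take away are the global index computation (block index times block size plus thread index) and the bounds check, which is needed because the number of threads launched is rounded up to a whole number of blocks.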
Can't ignore something when it has been: 1. Discontinued 2. No longer running on the next big version of Windows (9). 3. There is no replacement, thus Microsoft does not want ANYONE to develop for windows/xbox.
I took some parallel processing classes in the last couple of years as part of my Master's program. CUDA was one of those tricky little beasts that basically takes a few minutes to learn (assuming a rock solid C/C++ background) but a lifetime to master the nuances.
We were building little throw-away matrix multiply programs - for which we were given horribly inefficient and barely functional source to start with. The challenge was to make it run as fast as possible, with extra credit going to the fastest implementation. It turns out to accomplish this you basically need to understand every tier of the memory architecture of CUDA, the process by which it reads in cache lines to avoid collisions, how to optimize the read/write patterns, how the job would be split up among the GPU's (and the parameters used for the splitting), and basically every nit-picking detail of how the hardware actually runs.
This runs counter to the level of abstraction that most CS majors are used to dealing with - if we wanted to do hardware we would've gone the EE or CE route - but if you truly want to grok CUDA, you have to become a hardware wiz. Otherwise you'll always be stuck wondering why you can never seem to get the level of speedup that the benchmarks suggest should be possible.
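One concrete example of the kind of hardware detail that matters is how many memory lines a group of threads touches in a single step. The sketch below just counts them. The group size of 32 threads and the 128-byte line are assumptions (typical figures for NVIDIA hardware of this era, but check your own card), and the matrix width and helper name are made up for illustration.

```python
def lines_touched(addresses, line_bytes=128):
    """Count the distinct memory lines covered by a set of byte addresses."""
    return len({addr // line_bytes for addr in addresses})


WARP = 32          # threads that issue a memory request together (assumed)
FLOAT_BYTES = 4
ROW_FLOATS = 1024  # hypothetical matrix width, in floats

# Adjacent threads read adjacent floats of one row.
row_access = [t * FLOAT_BYTES for t in range(WARP)]

# Adjacent threads read one float from each of 32 different rows (a column walk).
col_access = [t * ROW_FLOATS * FLOAT_BYTES for t in range(WARP)]

coalesced = lines_touched(row_access)
strided = lines_touched(col_access)
print(f"row walk: {coalesced} line(s), column walk: {strided} line(s)")
```

The same 32 floats cost one line fetch in the first pattern and one fetch per thread in the second, which is a large part of why a naive matrix kernel can fall so far short of the benchmark numbers.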
Before jumping in, do see what's available. CUDA particularly has a very rich set of libraries and OpenCL offerings might have just what you need as well (image and video processing).
Look into Intel Xeon Phi. It is Intel's version of an NVIDIA Tesla. It does not require any special language and is made to be programmed like a normal Intel processor. http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
The replacement is native code.
I'm certain that experienced developers of mouse-driven games for Windows on PCs can still obtain Xbox One devkits through an accredited disc game publisher. Of course this requires you to conceive, implement, ship, and market a game in a mouse-driven genre to demonstrate your competence. And you'll need certain professional social networking skills, which don't come easily to people with some disabilities that correlate with programming skill, to negotiate with a publisher. But as another Slashdot user has repeated to me over the years: "them's the breaks."
Simply don't go in the same sentence. You inherently need to know a lot about the underlying hardware and programming models to take advantage of that hardware - and none of that is easy. Best advice? Maybe use C# and start with a good sample tutorial. After that, you're going to learn a lot more about image algorithms/etc. That's why I can still make amazing amounts of money knowing how to program for GPU's.
http://hackage.haskell.org/package/accelerate
Writing GPU programs is hard. Not only do you have to learn a new set of APIs, you also have to understand the underlying architecture to extract decent performance. It requires a different approach to problem solving that takes months if not years to develop.
Fortunately you don't need to read the entire CUDA programming guide to program on the GPU. There are several excellent libraries out there which hide the complexities of the GPU architecture. Since you are doing image processing, I would recommend ArrayFire (http://www.accelereyes.com/products/arrayfire). It is a free library which provides several image processing functions which have been optimized for the GPU. You should also look into Thrust and NPP (included with the CUDA toolkit), although these libraries are more verbose and require greater understanding of C++ and the GPU to program.
>Do I have to learn CUDA/OpenCL, which seems a daunting task to me, or is there a simpler way?
You do NOT have to learn CUDA or OpenCL. You can use libraries or compilers. GPU libraries tend to give better performance than GPU compilers (e.g. OpenACC) and tend to be able to handle more algorithms. That is because compilers are simply not smart enough to do things as well as expert programmers who meticulously hand-tune kernels and put them in libraries. Any number of libraries are available. There are many poorly supported libraries out there, so you may have to search around to find good ones. I suggest one below.
>What, currently, is the most painless way to start playing with GPU programming? Surely there must be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?
My colleagues and I at AccelerEyes have dedicated the last 6 years of our lives to trying to help people find exactly what you're looking for - "a relatively painless" way to harness the GPU. The result is our ArrayFire library for CUDA or OpenCL. I know it's uncool to toot one's horn, but the GPU computing community is small enough that people know each other and we're all working together to build out the ecosystem. There are many different contributions to GPU computing by many different groups. Our group's specialty in the ecosystem has always been the "relatively painless" contribution coupled with great performance. The reason people like our stuff is because we do nothing but work on squeezing out the most performance possible. Then we wrap up those kernels into convenient library calls that can be plugged in like math functions to your code with much less burden than writing the CUDA or OpenCL from scratch.
Happy to answer any further questions you may have about specific libraries, compilers, or GPU programming approaches. We eat, drink, and breathe everything CUDA/OpenCL.
BTW, we also encourage learning expert CUDA/OpenCL development. It is tough, no doubt about that. It is time-consuming and for many developers is not worth the added development complexity and lengthened development time. It sounds like you are probably in the boat of not caring about becoming an expert in low-level details, rather just wanting to get better performance to achieve a goal and be done with it. Is that correct?
>Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply?
Labview does not have good support for GPUs. Many ArrayFire users are building custom Labview blocks so that they can program the GPUs more simply. I can connect you to some of those users if you wish (just shoot me a note to john@accelereyes.com).
I'm unaware of another graphical box/nodes package that supports GPUs.
---
While I'm at it, I know this post is going to be read by many expert CUDA/OpenCL developers out there. If you're interested in writing CUDA/OpenCL code daily, we're hiring (see my email above) :)
OpenACC may be higher-level (easier to use), but it still generates CUDA/OpenCL code. Your wording sounded like "OpenCL support is gone." I want to correct you on that. OpenCL is the future and also runs on CUDA-capable hardware. If you code for CUDA, you can only target CUDA hardware. If you code for OpenCL, you can target not only AMD but also CUDA hardware. That was the point of the OpenCL spec in the first place. OpenCL can also transparently take advantage of the local CPU cores. OpenCL has one drawback: it does not support all types. It is highly constrained to the kinds of types relevant to graphics/3D. There have been some kludge patches to make CUDA/OpenCL work with string types (e.g. parallel grep with CUDA), but these aren't well suited because the hardware was not intended for that, and it requires a lot of copying between main system memory and graphics card memory, which wastes a lot of time. String parallelizing is better done with mechanisms like OpenMP. OpenMP can support any kind of type, and it is designed to co-exist with MPI (RPC-like parallelism across many computers).
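To put rough numbers on the memory-copy point above, here is a back-of-the-envelope sketch. The function name `effective_speedup` and all the timings are made up for illustration; they are not measurements of any real card. The model simply assumes the GPU run costs the kernel time plus the host-to-device and device-to-host copy time.

```python
def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Overall speedup once memory copies to and from the card are counted.

    Assumed model: GPU time = CPU time / kernel speedup + copy time.
    """
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Hypothetical workloads, both with a kernel that is 20x faster than the CPU.
# Heavy math on little data: the copy cost is small next to the compute.
compute_heavy = effective_speedup(cpu_seconds=10.0, kernel_speedup=20.0,
                                  transfer_seconds=0.5)
# Little math on lots of data (grep-like): the copy cost dominates.
transfer_heavy = effective_speedup(cpu_seconds=1.0, kernel_speedup=20.0,
                                   transfer_seconds=0.95)

print("compute-heavy speedup:", compute_heavy)
print("transfer-heavy speedup:", transfer_heavy)
```

Under these assumed numbers the first workload still comes out around 10x faster overall, while the second ends up at roughly 1x, i.e. no better than staying on the CPU.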
Start learning OpenCL, OpenMP, MPI, GNU & Boost library parallelism. To make it easier, try running the golang OpenCL examples:
apt-get install mercurial meld
hg clone -u release https://code.google.com/p/go
cd go
cd src
./all.bash
#Put this into your ~/.bashrc:
export GOROOT=/home/youruser/yourgo
export PATH=$PATH:$GOROOT/bin
mkdir -p ~/goopencl
cd ~/goopencl
mkdir -p ~/goopencl/pkg
mkdir -p ~/goopencl/src
export GOPATH=/home/youruser/goopencl
go get github.com/tones111/go-opencl/cl
go get github.com/tones111/raw
cd /home/youruser/goopencl/src/github.com/tones111/go-opencl/cl/demo/rotate/
go run rotate.go -i="i.png" -o="o.png" -a=15
I am really surprised that nobody has mentioned Microsoft C++ AMP (not sure how good it is, but from some benchmarks it seems to perform really well). There seems to be a .NET binding for it, too.
http://msdn.microsoft.com/en-us/library/vstudio/hh265137.aspx
It's basically an API for programming the GPU (or whatever) in a painless way.
That or CUDA. Some people have done some
CUDA = 3 lines of C to run a kernel; OpenCL has the steepest learning curve of the three.
I read the headline as "the most pantsless intro".
... and it's a language the asker is familiar with.
The asker is familiar with HLSL?
At my company, we built a light-weight OpenCL-wrapper for C# to hide most of the setup and code overhead. The OpenCL-code wasn't that hard to do since the algorithm was essentially some mathematics that's easy to implement in C. The most difficult thing was to debug and profile the OpenCL code, but I think there are some quite nice tools for that now.
Actual OpenGL shaders are pretty easy to write. They're C-like, and there is only a handful of library functions.
The complexities of OpenGL programming all come in the glSoMany() calls - if you can find a 2D framework that can render quads for you, using shaders you supply, you're home free.
Since you have literal image processing needs, I think it may make sense to stick to actual, raw GL. Using a more general purpose vector programming language that compiles to GL code, you may have a lot more boilerplate to deal with. My guess is it's the boilerplate that makes CUDA/OpenCL seem daunting.
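For anyone who hasn't seen the shader model before, here is a plain-CPU sketch of the idea: you write one small function that only looks at its own pixel, and the framework runs it for every pixel. The names `invert_kernel` and `run_kernel` are invented for this example, and it is ordinary Python, not GLSL; on a GPU the per-pixel calls would run in parallel instead of in a loop.

```python
def invert_kernel(pixel):
    """Per-pixel 'shader': the output depends only on this one input pixel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def run_kernel(kernel, image):
    """Stand-in for the framework: apply the kernel to every pixel.

    Because no call depends on another, a GPU is free to run them all at once.
    Here it is just a nested loop on the CPU.
    """
    return [[kernel(pixel) for pixel in row] for row in image]


# A tiny 2x2 RGB image as nested lists of (r, g, b) tuples.
image = [[(0, 0, 0), (255, 128, 0)],
         [(10, 20, 30), (255, 255, 255)]]

inverted = run_kernel(invert_kernel, image)
print(inverted)
```

The part you would actually write for the GPU corresponds to `invert_kernel`; everything in `run_kernel` and the image setup is the boilerplate a quad-rendering framework would handle for you.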
I'm still posting AC, because I lost my 5 digit slashdot ID a long time ago, but try out Microsoft Accelerator. It makes all this easy. Very, very easy.
Google is your friend. Google "C# GPU" and you'll find Accelerator; it's not exactly well-hidden.
If you want to connect boxes, you are lazy and not suitable for the task. Go play with Lego then; you can even build a PC.
If you are an intermediate level programmer as you say then you can easily learn to use a new programming paradigm. There is a coursera course https://www.coursera.org/course/hetero which is ok and should do for your purposes.
SURELY NOT!!!!!
There's probably no need to reinvent the wheel. A number of high-level APIs are available for this purpose.
OpenCV does image processing and has GPU support.
A more general tool is Theano, which is a meta-programming tool. You state your computations symbolically and Theano generates a computation graph. The graph gets simplified and then Theano generates CPU/GPU code for your equations.
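To give a feel for that symbolic workflow, here is a toy sketch of the same three steps: build a graph, simplify it, generate code. It is not Theano's API. The classes `Var`, `Const`, `Op` and the functions `simplify`, `emit`, `compile_expr` are all invented for this example, and it emits a Python lambda rather than real CPU/GPU code.

```python
class Var:
    def __init__(self, name):
        self.name = name


class Const:
    def __init__(self, value):
        self.value = value


class Op:
    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right


def is_const(node, value):
    return isinstance(node, Const) and node.value == value


def simplify(node):
    """Apply a few algebraic rewrites: a*1 -> a, a*0 -> 0, a+0 -> a."""
    if not isinstance(node, Op):
        return node
    left, right = simplify(node.left), simplify(node.right)
    for a, b in ((left, right), (right, left)):
        if node.symbol == "*" and is_const(a, 1):
            return b
        if node.symbol == "*" and is_const(a, 0):
            return Const(0)
        if node.symbol == "+" and is_const(a, 0):
            return b
    return Op(node.symbol, left, right)


def emit(node):
    """Turn the graph back into source text."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return repr(node.value)
    return "(" + emit(node.left) + " " + node.symbol + " " + emit(node.right) + ")"


def compile_expr(node, arg_names):
    """Simplify the graph, generate source for it, and build a callable."""
    source = "lambda " + ", ".join(arg_names) + ": " + emit(simplify(node))
    return source, eval(source)


# Symbolic statement of x*1 + y*0; nothing is computed at this point.
x, y = Var("x"), Var("y")
expr = Op("+", Op("*", x, Const(1)), Op("*", y, Const(0)))

source, f = compile_expr(expr, ["x", "y"])
print(source)
print(f(3, 4))
```

The point of the detour through a graph is that the simplification happens once, before any data is touched, so the generated function does less work than the expression as originally written.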
--Beau
So in other words, a clueless Micro$oft's bitch.
Easiest way to do GPU on .Net is Cudafy:
http://cudafy.codeplex.com/
Allows you to write .Net code that runs on an nVidia GPU
(an OpenCL version is in the works)
For more advanced work, CMSoft have a great complete tutorial on using OpenCL with .Net:
http://www.cmsoft.com.br/?option=com_content&view=category&layout=blog&id=41&Itemid=75
Use butterflies.
Heterogeneous Parallel Programming Class on Coursera. Free and 5 stars in my book.
http://managedcuda.codeplex.com/
what in the name of holy fuck are you talking about?