Ask Slashdot: GPU of Choice For OpenCL On Linux?
Bram Stolk writes: So, I am running GNU/Linux on a modern Haswell CPU, with an old Radeon HD5xxx from 2009. I'm pretty happy with the open source Gallium driver for 3D acceleration. But now I want to do some GPGPU development using OpenCL on this box, and the old GPU will no longer cut it. What do my fellow technophiles from Slashdot recommend as a replacement GPU? Go NVIDIA, go AMD, or just use the integrated Intel GPU instead? Bonus points for open sourced solutions. Performance not really important, but OpenCL driver maturity is.
AMD with the proprietary drivers is the OpenCL of choice for buttcoin miners.
Using the binary driver has been fine for me.
Not much more to say on the matter. ffmpeg + x264 make use of it nicely.
They're too busy with CUDA to give two shits about decent OpenCL performance.
That's why the HD Radeon series was the mining GPU of choice for Bitcoin.
Intel is your best bet for a mature open sourced OpenCL compatible GPU, if performance doesn't matter, that is.
The future of GPUs is open standards. GPUs won't take off until all major vendors support the latest (OpenCL 2.0) standards.
Here is the list of conformant products
https://www.khronos.org/conformance/adopters/conformant-products#opencl
I would go with an nVidia consumer card. They may be more expensive than the AMD ones. On the other hand, they offer CUDA and OpenCL support and are much faster.
For the newer ones (GTX9xxx) you will need to wait a little bit until the driver shipped with CUDA actually supports the cards though.
Nvidia only supports OpenCL as an afterthought, preferring as always to offer up their proprietary CUDA shit instead. So go for an AMD card.
At least at the current time it's not very complicated:
For games, CAD, visualization, and other 3D type stuff, go nVidia all the way. They are much less buggy than AMD/ATI in OpenGL and whatever Microsoft uses these days.
For GPU computation, AMD/ATI has the performance advantage and the OpenCL advantage (i.e. OpenCL 2.0 support, which nVidia does not have). Although the very latest nVidia 900 stuff is catching up in performance, they still don't support OpenCL 2.0 AFAIK.
So if you want to do 3D and visual stuff then go nVidia. If you just want to compute stuff in the background then AMD/ATI.
I find old Intel 4xxxx chips slow for 3D graphics. I can't say anything about the newer integrated video chips though. Good luck trying to find a compatible video card. Any reason why you can't buy a cheap Windows computer with a 750 watt power supply and put in a $400 video card?
So, why did you answer?
If performance is not important, e.g. you only want to play with OpenCL for learning purposes. You can get some more speed with the Intel HD graphics, especially in the recent CPUs, or similarly with AMD APUs. Add-on cards only get interesting when you need higher performance than that.
Anyway, a comment is not always an answer.
I work in a lab that does CT image reconstruction (all gpgpu computing) as part of what we do. I've been the one to program it using OpenCL under Ubuntu (I insisted I use linux; windows was too infuriating) so I'll share my experience.
I have two Nvidia 780 GPUs in my machine (an Alienware Aurora R4) and getting everything running under linux was actually much smoother than my initial attempt to get OpenCL running under Windows 8, so I don't think you'll have too much trouble there. I use the binary blob from Nvidia and it has been pretty stable with the occasional driver crash for whatever reason (maybe once in a six month period, but things just restart and it's fine. It's usually my fault for writing shitty code). I personally really like this setup and the only thing that could make it better would be more GPUs and a great, solid open source driver.
I would say that if you're going to use Nvidia GPUs for GPGPU computing, consider learning CUDA. Syntactically it's very similar to OpenCL but the tools you have access to for debugging, profiling, and increasing performance as well as the overall stability of the programs seem to be much much better. I suppose we should expect that though from a proprietary language, on proprietary hardware, using a proprietary driver. I've heard that you can get better performance (read: speedups) using CUDA over OpenCL, but I've never tested that for myself, or seen proof firsthand.
I've learned OpenCL, and I like its portability and openness, but I look at some of the stuff my friends can do with CUDA and I can't say that I'm not envious. Mainly what I'm referring to is Nvidia's NSight program, which can do OpenCL if you're willing to pay for the "pro" edition. Also, Nvidia GPUs are scalar based, so if much of your speedup would come from using OpenCL's vector structures, that won't happen on Nvidia GPUs the same way that it would on AMD. Programming might be more convenient, but performance will stay the same.
Hope that helps. Feel free to ask more questions.
Integrated graphics in your CPU will have a modest performance but stable and open source OpenCL driver. If it proves too slow for your particular project, you will be able to compare benchmarks and get the cheapest card that is fast enough to, say, run your animation at 60fps. If you are planning to distribute your code, you will anyway need several GPUs to test with.
I'll bet your a smash hit at parties.
All right, with the disclaimer that I haven't touched this in a year, I was doing OpenCL dev with a 5850 and/or 6870 in Linux, and I was pretty happy with it. I'll probably never go back to Nvidia now. I think, and I might be out of date here, that AMD is a bigger bang for the buck when it comes to OpenCL. And I don't remember any serious problems with the AMD drivers. Nvidia was a real poor choice for bitcoin mining, which is why I went with the AMD, but I also did so at the time because I knew I was going to be doing a lot of OpenCL dev and didn't need crazy double floating point performance per se. I think the balance might have shifted a bit now, but since I was more interested in integer/etc. operations per second than double flops, it was a no-brainer.
So I'm guessing your 5xxx is not a 58xx, otherwise you wouldn't have asked this question. I say AMD all the way, unless you absolutely need double performance above all else and Nvidia is still king of that... or there's some other dimension I'm not thinking of right now. But still, I think my newer AMD card has a better double ratio than the previous, so AMD might have caught up by now. I haven't shopped for these in years.
My experience with AMD's Linux OpenCL drivers has been that they're very fast when they work, but that they're very buggy. The bugs also tend to be things that are not that obvious and are difficult to work around, like data corruption when using 3-element types, mis-compilation of code, random X server hangs and so on. I pretty much expect any large code-base that hasn't been tested on AMD before to hit some driver bug. And when it does break, it frequently doesn't die cleanly, but rather freezes up the X server and creates unkillable CPU-hogging processes.
The NVIDIA OpenCL drivers are much, much more solid. I can only recall finding one bug in them over 3-4 years (a thread safety issue with subbuffers). However, it's the unloved sibling of CUDA and they don't put any work into new features - the drivers are still at OpenCL 1.1, they implement a very minimal set of extensions, and newer hardware features (such as intra-warp shuffle and floating-point atomics) are not accessible. They've also taken out support for OpenCL from their development tools (profiler etc), and it seems they only have a 32-bit GPU address space (it's impossible to allocate more than 4GB even on a K40).
My current approach is to write code that targets both OpenCL and CUDA, using a bunch of macros and wrappers and an abstraction layer on top of PyCUDA and PyOpenCL, so that we can run essentially the same code on NVIDIA via CUDA or AMD via OpenCL. Surprisingly, OpenCL on NVIDIA is fractionally faster than CUDA (maybe due to the 32-bit address space). The CUDA path is by far the easiest to debug and optimise: the NVIDIA profiler is generations ahead of AMD's in terms of providing insight into bottlenecks.
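One way to picture the macro layer described above is a small preamble per backend placed in front of a shared kernel body. This is only a sketch: the macro names (KERNEL, GLOBAL_MEM, GLOBAL_ID_X), the `scale` kernel and the `render_kernel` helper are made up for illustration and are not the poster's actual code.

```python
# Hypothetical macro names; each preamble maps them onto one backend's dialect.
OPENCL_PREAMBLE = """\
#define KERNEL __kernel
#define GLOBAL_MEM __global
#define GLOBAL_ID_X get_global_id(0)
"""

CUDA_PREAMBLE = """\
#define KERNEL extern "C" __global__
#define GLOBAL_MEM
#define GLOBAL_ID_X (blockIdx.x * blockDim.x + threadIdx.x)
"""

# Backend-neutral kernel body, written only in terms of the macros above.
SCALE_KERNEL = """\
KERNEL void scale(GLOBAL_MEM float *data, const float factor, const int n)
{
    int i = GLOBAL_ID_X;
    if (i < n)
        data[i] *= factor;
}
"""

PREAMBLES = {"opencl": OPENCL_PREAMBLE, "cuda": CUDA_PREAMBLE}


def render_kernel(backend, body=SCALE_KERNEL):
    """Return complete kernel source text for the requested backend."""
    if backend not in PREAMBLES:
        raise ValueError(f"unknown backend: {backend!r}")
    return PREAMBLES[backend] + "\n" + body


if __name__ == "__main__":
    for name in PREAMBLES:
        print(f"--- {name} ---")
        print(render_kernel(name))
```

The resulting string is what would then be handed to the backend's compiler wrapper (PyOpenCL's `Program` or PyCUDA's `SourceModule`); that step is left out here so the sketch runs without a GPU.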
Just curious. No flame bait. What do you need the 3D for? I have 3 screens and don't do anything 3D related. I just have the cheapest cards that I can find. I use Nvidia as I like the nvidia-settings software.
I use three (different) cards. Each connected to a different monitor running in standard mode (no Xinerama, no mirrors). Cheaper to have three cheap cards than one that could do all three easily. And by cheap I mean the cheapest that is available and for the connection that I need. Under 50EUR for 3 cards.
So I am curious what you need the 3D for.
There are pros and cons to all architectures.
Last time I used OpenCL on AMD, it had assorted little bugs and quirks, sometimes resulting in horrifyingly inefficient low-level code, and I had to spend much time tweaking my source to extract decent performance. Integer performance in particular was pretty bad, and AMD developers were pretty much outright saying that it was not their high priority area.
The biggest strength of AMD is double precision floating point performance. NVIDIA's gaming cards are crippled in terms of DP performance, and you need a Titan to get around that. (AMD has a couple of cards in the sub $300 range which approach 1 teraflop of theoretical DP throughput. The best NVIDIA can offer for less than $1000 is 0.2 teraflop.)
My personal all-around favorite is NVIDIA CUDA: it's old, stable, easy to use, and all-around fast (though, as mentioned by other commenters, their OpenCL may be their red headed step child).
Intel's OpenCL for the integrated GPU is worth looking into as well, as it may be superior for many tasks. While it lacks the raw horsepower of gaming chips, it has more sophisticated execution units and a major advantage is that you're working directly with the L3 cache. With AMD or NVIDIA, you have to exchange data between the system memory and the graphics card via PCIe bus, which is not particularly fast, even in the 3.0 incarnation - we're talking somewhere on the order of 10 GB/s, or 1/20th of the speed with which the GPU can access its own onboard memory. It's not uncommon to have tasks where sending results back from the GPU takes longer than the actual computation. (Sometimes you can overlap compute and data transfer, sometimes you can't, and it's a hassle to do anyway.)
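A rough back-of-the-envelope version of the PCIe argument above, using the 10 GB/s figure from the comment and taking "1/20th" to mean roughly 200 GB/s of onboard bandwidth. The buffer sizes and the assumption that the kernel is purely memory-bound (it just streams over its data a fixed number of times) are illustrative, not measurements.

```python
PCIE_GB_PER_S = 10.0                    # figure quoted in the comment above
ONBOARD_GB_PER_S = PCIE_GB_PER_S * 20   # "1/20th" implies roughly 200 GB/s


def seconds_to_move(num_bytes, gb_per_s):
    """Time to move num_bytes at the given bandwidth (1 GB = 1e9 bytes)."""
    return num_bytes / (gb_per_s * 1e9)


def offload_estimate(input_bytes, output_bytes, passes_over_data=1):
    """Estimate where the time goes for a memory-bound kernel on a discrete GPU.

    Assumes (hypothetically) that the kernel's run time is just the time to
    stream its input and output through onboard memory passes_over_data times.
    """
    upload = seconds_to_move(input_bytes, PCIE_GB_PER_S)
    download = seconds_to_move(output_bytes, PCIE_GB_PER_S)
    on_device = passes_over_data * seconds_to_move(
        input_bytes + output_bytes, ONBOARD_GB_PER_S
    )
    total = upload + download + on_device
    return {
        "upload_s": upload,
        "download_s": download,
        "on_device_s": on_device,
        "transfer_share": (upload + download) / total,
    }


if __name__ == "__main__":
    # Hypothetical workload: 1 GB in, 1 GB out, one pass over the data.
    for key, value in offload_estimate(1e9, 1e9).items():
        print(f"{key}: {value:.3f}")
```

Raising `passes_over_data` shows the other side of the trade-off: the more work done per byte transferred, the less the bus matters.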
Don't know about pure OpenCL. But... Intel has good support overall, and its drivers are not too bad compared to their Windows counterparts. It lacks OpenGL 4 features; it only supports OpenGL 3. AMD has open source drivers, which mostly suck. Performance in games is rather bad, and the same goes for synthetic tests. Many people report rather good bitcoin performance. The proprietary drivers are a bit better on performance but worse on stability. Nvidia has 2 types of drivers: reverse engineered ones, which suck and blow when it comes to performance, and binary ones. Nvidia with binary drivers has the highest performance + stability in games. It's on par with AMD on bitcoin stuff. In reality no one can say what is good for OpenCL on Linux, because there are no OpenCL apps on Linux. I am no Nvidia fanboy, but I have been an Nvidia user for a while. They are the safest bet on quality/performance, but AMD may well be worth your while, especially if you are a dev and don't mind reporting issues to kernel devs, etc.
I recommend the ocl-icd package to make it easy to switch OpenCL implementations on the fly. Also, download the Intel and AMD OpenCL runtimes which support CPU-based computation using SIMD instructions and multicore parallelism, and try them out as well as GPUs. You can then micro-benchmark your own algorithms on different vendor runtimes quite easily. I have found that the Intel OpenCL does a very decent job of auto-vectorization, so my scalar-based OpenCL code ran almost as fast as my hand-vectorized version that uses OpenCL vector intrinsics.
In my case, my image processing algorithms are more memory-bound and a recent 2.4 GHz mobile Intel quad-core outperforms my desktop NVIDIA GTX 760 on the same OpenCL code. Both of these trounce my c. 2010 Xeon E5530. I had no idea how much Intel SIMD performance has improved until I tried this and saw for myself. I think a big advantage is that the CPU doesn't have to transfer the large N-dimensional arrays back and forth over the PCIe bus, but can just get to computing immediately. This may not hold true for some algorithms that crank much longer on a small input or output array.
It is also important to realize that OpenCL parallelism won't save you from poor algorithm choices. You need to be open to experimentation and reevaluating your assumptions as you explore new problems. I work with Python, Numpy, and PyOpenCL so that I can focus on the math first, and then selectively replace the underlying algorithms with different implementations as needed. Being able to work at a high level of abstraction makes it so much easier to explore the math you want to perform, without writing a lot of low-level code that gets thrown away.
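The kind of micro-benchmarking suggested in this comment can be sketched with nothing but the standard library. The two `sum_squares_*` functions are stand-ins for "the same algorithm on two different runtimes"; the helper names `best_time` and `compare` are invented for this example.

```python
import time


def best_time(fn, *args, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def compare(candidates, *args, repeats=5):
    """Time each named callable; return (name, seconds) pairs, fastest first."""
    results = [
        (name, best_time(fn, *args, repeats=repeats))
        for name, fn in candidates.items()
    ]
    return sorted(results, key=lambda item: item[1])


# Stand-in implementations of one algorithm (hypothetical examples).
def sum_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    return sum(v * v for v in values)


if __name__ == "__main__":
    data = list(range(100_000))
    candidates = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}
    for name, seconds in compare(candidates, data):
        print(f"{name}: {seconds * 1e3:.2f} ms")
```

When the callables launch real OpenCL kernels, each one should wait for its command queue to finish before returning (in PyOpenCL, `queue.finish()`), otherwise only the asynchronous launch is timed. Taking the minimum over several runs also keeps one-off kernel compilation out of the result.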
I have used 2 AMD cards programming OpenCL on Linux, a HD4650 and a HD7770. My 4650 card was obsoleted by the AMD proprietary drivers in 18 months, my HD7770 is being obsoleted (for new Linux and OpenCL support) by AMD as I write this, after about 2 years. This means if I want to keep doing OpenCL development, I have to use the old driver and old kernels, old xservers, and current version of OpenCL, etc.
I don't think I will buy AMD again for this reason. Nvidia doesn't obsolete their cards anywhere near as quickly on Linux. If you buy a top of the line AMD card, you will probably get more than 2 years of support, but you lose even more money when it is obsoleted.
I will buy Nvidia next time, but only one of their new Maxwell chipsets or newer, but admittedly, that won't be a perfect solution either. Both solutions have major flaws. If you can afford to buy new GPU cards every 2 or 3 years, go with AMD, otherwise, get Nvidia.
If you want to write modern OpenCL code and run it on a GPU, AMD is your only option.
In terms of performance, NVIDIA is actually the best. But they've been stuck at OpenCL 1.1 for years, while everyone else has long since moved to newer versions. Until (if) they add OpenCL 2.0 support, they'll be a bad choice.
Intel doesn't support running OpenCL on the GPU under Linux. See the chart at the end of https://software.intel.com/en-.... You can still write OpenCL programs, but you'll just be running them on your CPU.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
Go with an R9 285 (GCN 1.2) or R9 290 (GCN 1.1) for a cost-effective card that still gets all the features. If you want double floating point performance go with a 7970 (GCN 1.0) for now. The wiki page http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units is an excellent differentiator of card performances.
Use CodeXL too. Debugging is still pretty poor but even on nvidia this was painful last I recalled - it's always difficult to debug parallel programs - you just have printf or targeting the CPU (which is something nvidia code cannot do) and running under gdb. Supposedly visual studio has some slightly better integration or did at one point in time so you might check that out for debugging but profiling under CodeXL is fairly usable / comparable to what NSight offers.
Surprisingly enough AMD's drivers for OpenCL don't really have too many suck points. I actually find it far easier to set up an AMD GPU processor dev box or deployment than Nvidia's and easier to bring into a program as well.
Go check what you suggest first. Intel only supports OpenCL on their GPUs under Windows, not Linux.
Hate to break this to you but Nvidia does the same thing only a lot faster. Look up "Compute Capability". AMD's has been far more stable and Nvidia's approach has been PITA * many more iterations.
Have a look at this talk, namely 8 min 30 seconds into the talk:
https://www.youtube.com/watch?...
The talk was given at the recent Linux Conf Australia (in New Zealand). It shows that AMD supports OpenCL 2.0, while Nvidia only support version 1.1 (released in 2010). I spoke to the speaker after his talk and he said Nvidia are basically dragging their heels with regard to supporting more recent versions. Nvidia also request unconventional features be put into the spec, and then never implement those features. Obviously Nvidia are doing well with their own CUDA language and seem to be trying to create a walled garden. It sounds like if you are going for openness and not for speed, then you could look at Intel or AMD (both support version 2.0).
The Gallium drivers for AMD have experimental OpenCL support. Southern Islands and Sea Islands GPUs are the best supported: Gallium Compute
I'm no serious parallel programmer, so I don't know how helpful this will be to you, but this might be of interest: 22-Way AMD/NVIDIA OpenCL Linux Benchmarks To Start Off 2015. Michael Larabel does some great work.
Lol.
As for a particular model, if double-precision performance is important, go with a 7970 or 280x on the AMD side (or 7990 if you need dual-gpu in one slot). They did double-precision at 1/4th their single-precision rate, which is the best you're going to find at consumer-grade pricing -- even more-modern or more powerful cards have backed off on double-precision, so something like a 290x has almost 50% more shader ALUs than a 280x, and will perform better at single-precision workloads, but only does double-precision at a rate of 1/8th, so it's actually slower in purely double-precision workloads. All of nVidia's consumer cards are in the ballpark of 1/8th to 1/16th rate too, except the GTX Titan Black, which did 1/3rd rate, but at $1500 is nearly Quadro pricing anyways.
If money is no object an AMD FirePro W9100 is the workstation version of the 290x, and does double-precision at 1/2 single precision rate, and is the current best-of-both worlds, and will probably remain so for the remainder of the year, but it's a 3-grand price tag or so.
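The ratio argument above can be worked through with exact fractions. This is a sketch under loud assumptions: equal clock speeds, ALU counts normalised to the 280x, "almost 50% more" rounded up to exactly 3/2, and the FirePro card assumed to have the same ALU count as the 290x. Real cards differ in clocks, so treat the output as relative ordering only.

```python
from fractions import Fraction

# (relative ALU count, double-precision rate as a fraction of single-precision)
# All values are taken from the comment above or assumed as described.
CARDS = {
    "280x": (Fraction(1), Fraction(1, 4)),
    "290x": (Fraction(3, 2), Fraction(1, 8)),
    "FirePro": (Fraction(3, 2), Fraction(1, 2)),
}


def relative_dp_throughput(relative_alus, dp_rate):
    """Double-precision throughput in arbitrary units, assuming equal clocks."""
    return relative_alus * dp_rate


def dp_versus(card, baseline="280x"):
    """Double-precision throughput of `card` as a multiple of `baseline`."""
    return relative_dp_throughput(*CARDS[card]) / relative_dp_throughput(
        *CARDS[baseline]
    )


if __name__ == "__main__":
    for name in CARDS:
        print(f"{name}: {dp_versus(name)} x the 280x in double precision")
```

Under these assumptions the 290x comes out at three quarters of the 280x's double-precision throughput despite having more ALUs, which matches the claim that it is slower in purely double-precision workloads.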
Which company do you work for just out of curiosity?
I don't think this is true, as someone who's been working with CUDA since 2.x and OpenCL since 1.x and someone who's shipped safety critical, production software running in tens of millions of cars and aircraft running both CUDA and OpenCL... your assertions are frankly ridiculous and clearly a result of not having any technical, let alone practical understanding of the hardware nor software you're referring to.
OpenCL is frankly, a poor substitute for CUDA sadly - from a high level language perspective, we do support OpenCL however the development overhead is 4-8x more than that of CUDA (we often have to write device-specific kernels since OpenCL implementations are fragmented at best, if even compliant - often optimizing around very specific types like float3 or float4, or float) - sadly this is the cost of creating a high level abstraction around dozens of underlying hardware architectures.
CUDA is different in that it's tailored specifically towards nVidia's hardware architecture, we write maybe 3 CUDA kernels for various ranges of hardware that support dozens of devices each, it's more efficient, you get better access to specific hardware (specialised instructions, specialised memory access, advanced texture sampling, cache control, etc).
The general consensus I tend to get from my peers in the industry is we need more APIs like CUDA for specific hardware, as OpenCL is frankly a dud - many larger companies actually end up working with custom toolchains that compile their own language or C++ via clang/LLVM to SPIR/PTX/HSAIL - assuming they're not hand-writing SPIR/PTX/HSAIL themselves already due to the poor design of OpenCL.
Between a bit better language design and superior support and tools, CUDA is way easier to do your work in. We've 4 labs that use CUDA in one fashion or another, none that use OpenCL. A number have tried it (also tried lines like the Cell cards that IBM sold for a while) but settled on CUDA as being the easiest in terms of development. Open standards are nice and all but they've got shit to do and never enough time to do it, so whatever works the easiest is a win for them.
On a different side of things, I've seen fewer issues out of nVidia on CUDA than AMD on OpenCL for video editing. Sony Vegas supports both for accelerating video effects and encoding. When I had an AMD card, it crashed all the time with acceleration on. Sony had to disable acceleration on a number of effects with it. I had to turn it off to have a usable setup. With nVidia, I find problems are very infrequent.
Obviously this is only one data point and I don't know the details of development. However it is one of the few examples I know of a product that supports both APIs.
You say outright performance isn't really important. Have you considered just using the Intel HD Graphics built into your Haswell? There is an OpenCL implementation for Intel (beignet) and while I haven't particularly dabbled with software that needs OpenCL I've been very impressed with the stability and Just Works factor of the graphics side of the Intel graphics driver support. (I have an Ivy Bridge, so a bit older than your hardware; your silicon should be a fair bit faster.)
What are you betting their a?
That's really the question. Are you using the GPU for heavy-duty computing, or graphics, or...?
We've got money around here (we're a civilian-sector US gov't agency) using Nvidia Tesla cards - in several servers, *two* of 'em - for heavy lifting with things like R. We do use the installable proprietary drivers, and they work.
mark