Matlab Integrates GPU Support For UberMath Computation
An anonymous reader writes "Matlab now comes with GPU native support in the 2010b version. This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA. Pretty sweet for the HPC community."
But can't Octave do that already?
Matlab is great, but my god the language is cumbersome. More so than R, though less than SAS. Also, it costs money, and I'm cheap. So, I'm wondering if this could be worked into R somehow. Since R seems to execute code in a single-threaded sort of manner (I say, knowing just enough to be dangerous about these matters), each wee bit of speed is a godsend.
Note the "R2010b" version number. That means that this capability has been out since the second half of 2010.
There is competition from Jacket:
http://www.accelereyes.com/products/jacket
This product is more expensive but more effective than Matlab.
I tried the free trial and found it much more effective than Matlab.
Alas, the cost is too high to justify Jacket in my case; I would rather
buy more hardware.
You fail, responding to an obvious AC troll.
who actually uses MATLAB for real HPC?
At first glance, I thought the subject said "Matlab integrates FPU support...". I was like, "Damn! I can break out the 486 DX again!".
There may be no "I" in team, but there's also no "F" in way.
I haven't used the software other than a few times for an assignment, but if I didn't already know that it is not a particularly nice piece of software, I would be surprised that it took them this long to do the obvious.
Troll is not a replacement for I disagree.
For those interested in Python libraries, there are PyCuda and gnumpy. I have not used either; I'm still learning how to use parallel python.
Yup, SciLab and Octave can both do that already.
I'm CEO of AccelerEyes and have been submitting Slashdot articles referencing updates about using GPUs with MATLAB for several years now. It's great to see it finally getting through, albeit via a reference to the "fake" GPU support which the MathWorks threw into PCT in an attempt to curtail the great success we continue to have with Jacket.
For a full explanation of why I say "fake", read, http://www.accelereyes.com/products/compare
For a brief explanation of why I say "fake" GPU support consider the question, what does supporting GPUs mean? If you can run an FFT are you content? Or do you want to use INV, SVD, EIG, RAND, and the list goes on and on. Jacket has 10X the functionality of PCT-GPU.
Why else is the PCT-GPU implementation weak? Well, it is so poorly constructed (shoehorned into their legacy Java system), that it is rarely more beneficial to use the GPU than the CPU with the PCT-GPU implementation. It takes 600 cycles to load-then-store global memory on the GPU (required in each kernel call). The main innovation that led us to build Jacket is the ability to generate as few kernels as possible to eliminate as many 600 cycle roundtrip transfers as possible. For example, Jacket's runtime system may only launch one kernel for every 20 lines of code. PCT-GPU on the other hand is limited to launching a GPU kernel for every basic function call.
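To make the kernel-count argument concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not a description of how Jacket or PCT actually work internally: it assumes each kernel launch pays one 600-cycle global-memory round trip (the figure quoted above), and the 5-cycle arithmetic cost per operation and the function name `cycles` are made up for illustration. Real GPUs overlap memory latency across many threads, so treat the numbers as relative only.

```python
# Toy cost model: global-memory round trips paid per kernel launch.
# Assumptions (illustrative only): 600 cycles per load-then-store round trip,
# as quoted in the comment above, plus a hypothetical 5 cycles of arithmetic
# per operation. This ignores latency hiding on real hardware.

ROUNDTRIP_CYCLES = 600
COMPUTE_CYCLES_PER_OP = 5  # hypothetical value


def cycles(num_ops, ops_per_kernel):
    """Estimated cycles in this toy model for num_ops operations,
    batched ops_per_kernel at a time into each kernel."""
    kernels = -(-num_ops // ops_per_kernel)  # ceiling division
    return kernels * ROUNDTRIP_CYCLES + num_ops * COMPUTE_CYCLES_PER_OP


unfused = cycles(20, 1)   # one kernel per basic function call
fused = cycles(20, 20)    # one kernel covering all 20 lines
print("unfused:", unfused, "cycles")
print("fused:  ", fused, "cycles")
print("ratio:   %.1fx" % (unfused / fused))
```

Under these assumptions the round trips dominate the unfused total, which is the point being made about launching one kernel per function call.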
Jacket also has a GFOR loop which is the only parallel FOR-loop for GPUs, http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage
I'm not aware of any MATLAB programmer that has had a good experience with PCT-GPU.
Finally, because I'm so thrilled at this getting slashdotted (despite it being a link promoting PCT-GPU), I'd be happy to offer free 3-month Jacket subscriptions to anyone who emails me in the next 48 hours with the word "slashdot" in the subject, at john.melonakos@accelereyes.com
Cheers!
PS: Roblimo, if we can get some blurb love in your summary on the main slashdot.org page, it would really mean a ton to all our guys that have worked on this project for the last 4 years!
Is there a particular Octave package or function set? I saw some parallel algorithm and MPI libraries, but is there something I missed relating to GPU parallelization?
The article talks about R2010b, which isn't out yet. R2010a (which *is* out), supports parallel processing pretty well (I use it constantly), but not exactly "natively" - you have to pay extra for an option called the "Parallel Computing Toolbox" which also gives you sweet stuff like multicore, HPC and so on.
Now we have a vendor of an overpriced add-on battling it out with the vendor of the mother of all overpriced and badly designed pieces of scientific software. As someone who actually uses numerical scripting languages, let me tell you: I'm not impressed.
My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open-source alternatives to MATLAB already. I'll just wait, thank you very much.
For those with a bent toward Mathematica, GPU computing is baked into Version 8.
There's more information at http://reference.wolfram.com/mathematica/guide/GPUComputing.html
In the spirit of full disclosure, I'm solely a long-time user, not a Wolfram employee.
For those interested in an open-source alternative, there's MAGMA, which provides a bunch of linear algebra routines implemented in CUDA. I haven't tried it myself yet, but it looks promising.
Why restrict the accelerated computation to Nvidia GPUs when one could use OpenCL to also exploit the power of AMD GPUs, multi-core CPUs and much more?
Interesting. I wonder if the GPU could be used to perform functions on large sets in a constant database.
Having to work for a living is the root of all evil.
In your benchmarks, you list "1.26 hours" for Canny edge detection on a 4 Mpixel image in Matlab without GPU computing, and you miraculously speed that up to 8 seconds using your GPU tools:
http://www.accelereyes.com/products/benchmarks
On my three year old desktop, using just 1 CPU from a Core 2 Duo, I can do Canny edge detection on a 4 Mpixel PGM image in about 1.7 seconds with straightforward C code (no pointer tricks), including I/O, parsing the PGM, and god knows what else. It's about the same in Python.
So, my conclusions from this are that your company appears to have trouble writing efficient Matlab code (not a good sign for a company making a Matlab extension library), and that instead of spending a lot of time and money on Tesla boards and your GPU extensions to Matlab, people are better off writing their code in Python and using a little bit of C if they have to.
This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA
But you will have to re-code soon when a new version of Matlab is released and functions have changed over and over again! Yes, talking from personal experience...
There are tons of other CUDA accelerated numerical packages besides Matlab -- Mathematica, LabView, plugins / wrappers / libraries for Python, R, IDL. Some of these are linked from NVIDIA's website:
http://www.nvidia.com/object/numerical-packages.html
Others from:
http://www.nvidia.com/object/data_mining_analytics_database.html
I've been using Jacket for months now to do image and signal processing. It's way ahead of both MATLAB's PCT copy and gpumat.
A lot of misinformation among the comments, so rolled up my sleeves to get some measurements. Wanted to re-evaluate which to use in my work since R2011a just introduced GPU convolution. A quick benchmark of MATLAB R2011a with/without IPP compared to Jacket 1.7.1 for the various convolution functions getting mentioned.
- IMFILTER uses Intel Performance Primitives
- CONV2 non-separable
- CONV2 separable kernels
- FILTER2 tests separability and calls CONV2
Numbers and script below. I used Jacket's TIMEIT function but made a copy and removed GEVAL/GSYNC so you can time MATLAB's stuff too. Have to run Jacket and MATLAB separately, toggle if-statement. Hardware: cpu i7 920, gpu C2070. This is just testing one image (1024x1024) and kernel size (9x9) ... obviously there's a thousand different convolution cases I could benchmark but this here's enough to get a feel.
Some tentative thoughts:
- For MATLAB CPU, enabling IPP had a 10x difference for IMFILTER, but CONV2 was unaffected. I found their CONV2 performance always better than their IMFILTER with IPP enabled. So CONV2 appears to embody their best, most optimized convolution code.
- Neither MATLAB nor Jacket is detecting separability in CONV2 -- wanted to sanity check the benchmark as you suggested.
- For Jacket, separable stuff is 100x faster than IPP as we would hope. But it's even more interesting that it's 6x faster than MATLAB's GPU implementation -- what is MATLAB doing wrong?!?
- For Jacket, IMFILTER was slightly slower than CONV2 -- something they should fix -- a bug? Also IMFILTER interface was a little more cumbersome than CONV2 -- please fix :)
Bottom line: Jacket's GPU slaughters both MATLAB's CPU and GPU implementations. I didn't find any case where MATLAB's stuff came close.
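For anyone wondering why the separable case can be so much cheaper, here is a minimal pure-Python sketch. It is not what MATLAB or Jacket do internally; it only shows that a k-by-k kernel built as the outer product of two length-k vectors (like kern_sep * kern_sep' in the script) gives the same result when applied as two 1-D passes, at roughly 2*k multiplies per pixel instead of k*k. The helper names and the tiny 16x16 test image are made up for illustration, and only the 'valid' region is computed to keep the code short.

```python
# Separable vs. direct 2-D convolution on a tiny image (pure Python sketch).

def conv2_valid(img, kern):
    """Direct 2-D convolution, 'valid' region only (no padding)."""
    kh, kw = len(kern), len(kern[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # kernel is flipped: this is convolution, not correlation
                    acc += img[r + i][c + j] * kern[kh - 1 - i][kw - 1 - j]
            out[r][c] = acc
    return out


def conv2_sep_valid(img, col, row):
    """Same result via a vertical 1-D pass followed by a horizontal one."""
    kh, kw = len(col), len(row)
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    # vertical pass: convolve every column with col
    tmp = [[sum(img[r + i][c] * col[kh - 1 - i] for i in range(kh))
            for c in range(len(img[0]))] for r in range(oh)]
    # horizontal pass: convolve every row of the intermediate with row
    return [[sum(tmp[r][c + j] * row[kw - 1 - j] for j in range(kw))
             for c in range(ow)] for r in range(oh)]


n, k = 16, 9                      # small stand-in for the 1024x1024 image
img = [[float((7 * r + 3 * c) % 11) for c in range(n)] for r in range(n)]
kern_sep = [1.0 / k] * k          # box filter, like ones(k,1) / k
kern = [[a * b for b in kern_sep] for a in kern_sep]  # outer product

direct = conv2_valid(img, kern)
separable = conv2_sep_valid(img, kern_sep, kern_sep)
max_diff = max(abs(x - y)
               for row_d, row_s in zip(direct, separable)
               for x, y in zip(row_d, row_s))
print("max abs difference:", max_diff)
print("multiplies per output pixel: direct =", k * k, " separable ~", 2 * k)
```

For the 9x9 kernel in the benchmark that is 81 versus about 18 multiplies per pixel, so an implementation that detected separability inside plain CONV2 should be able to close much of the gap to the explicit conv2-sep call.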
----/begin timings/----
[matlab cpu]
matlab imfilter: 0.015595 sec
matlab conv2: 0.014741 sec
matlab conv2-sep: 0.056838 sec
matlab filter2: 0.057271 sec
[jacket gpu]
jacket imfilter: 0.004392 sec
jacket conv2: 0.004197 sec
jacket conv2-sep: 0.000584 sec
jacket filter2: 0.000647 sec
[matlab gpu]
pct imfilter:
pct conv2: 0.006034 sec
pct conv2-sep: 0.003306 sec
pct filter2: 0.003912 sec
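If you want the ratios rather than the raw numbers, a quick Python sketch like the one below computes them from the timings listed above. The dictionary layout and the `speedup` helper are just for illustration; the pct imfilter entry is left out because no value is listed for it.

```python
# Speedup ratios computed from the timings posted above (seconds).
timings = {
    "matlab cpu": {"imfilter": 0.015595, "conv2": 0.014741,
                   "conv2-sep": 0.056838, "filter2": 0.057271},
    "jacket gpu": {"imfilter": 0.004392, "conv2": 0.004197,
                   "conv2-sep": 0.000584, "filter2": 0.000647},
    # no timing is listed for pct imfilter, so it is omitted here
    "matlab gpu": {"conv2": 0.006034, "conv2-sep": 0.003306,
                   "filter2": 0.003912},
}


def speedup(baseline, candidate, op):
    """How many times faster candidate is than baseline for one operation."""
    return timings[baseline][op] / timings[candidate][op]


for op in ("conv2", "conv2-sep", "filter2"):
    print("%-10s jacket vs matlab cpu: %6.1fx   jacket vs matlab gpu: %4.1fx"
          % (op, speedup("matlab cpu", "jacket gpu", op),
             speedup("matlab gpu", "jacket gpu", op)))
```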
----/begin script/----
n = 1024;
k = 9;
a = rand(n,n,'single');
kern_sep = ones(k,1) / k;
kern = kern_sep * kern_sep';
% functions to benchmark
imfilter_fn = @(a) @() imfilter(a, kern, 'same');
filter2_fn = @(a) @() filter2(kern, a, 'same');
conv2_fn = @(a) @() conv2(a, kern, 'same');
conv2_sep_fn = @(a) @() conv2(kern_sep, kern_sep, a, 'same');
fprintf('[disable IPP]\n'); iptsetpref('UseIPPL', false)
fprintf('matlab imfilter: %.6f sec\n', timeit_matlab(imfilter_fn(a)))
fprintf('matlab conv2: %.6f sec\n', timeit_matlab(conv2_fn(a)))
fprintf('matlab conv2-sep: %.6f sec\n', timeit_matlab(conv2_sep_fn(a)))
fprintf('matlab filter2: %.6f sec\n', timeit_matlab(filter2_fn(a)))
fprintf('\n[enable IPP]\n'); iptsetpref('UseIPPL', true)
fprintf('matlab imfilter: %.6f sec\n', timeit_matlab(imfilter_fn(a)))
fprintf('matlab conv2: %.6f sec\n', timeit_matlab(conv2_fn(a)))
fprintf('matlab conv2-sep: %.6f sec\n', timeit_matlab(conv2_sep_fn(a)))
fprintf('matlab filter2: %.6f sec\n', timeit_matlab(filter2_fn(a)))
if false % can't run same time .. toggle in separate matlab instances
fprintf('\n[jacket gpu]\n')
a = gsingle(a);
kern = gsingle(kern);
imfilter_fn = @(a) @() imfilter(a, kern, 'same');
fprintf('jacket imfilter: %.6f sec\n', ti
It's interesting that you find that Jacket's CONV2 doesn't detect separability, because the guy who runs the company claimed that it did; he was talking about all the "advanced algorithms" that CONV2 contains.
Of course, GPU computing speeds up convolutions; that's what GPUs were designed to do. The questions we have been discussing are the following.
First, is GPU cost-effective for most applications right now and are people going to see the speedups they hope for? In my experience, the answer is "no", because most applications need to do a lot more than convolutions and speedup on other operations is much less. GPU computing is tricky enough that there is a lot more to worry about than whether one inner loop runs faster on the GPU.
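One way to put numbers on this point is Amdahl's law (my framing here, not something cited earlier in the thread): if only a fraction of the runtime is spent in the part the GPU accelerates, the overall speedup is capped by everything else. The sketch below is a minimal Python illustration; the fractions and the 100x kernel speedup are hypothetical inputs, not measurements.

```python
# Amdahl's law: overall speedup when only part of a program is accelerated.
# The fractions and the 100x figure below are hypothetical examples.

def overall_speedup(fraction_accelerated, kernel_speedup):
    """Whole-program speedup when fraction_accelerated of the original
    runtime runs kernel_speedup times faster and the rest is unchanged."""
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / kernel_speedup)


for fraction in (0.3, 0.6, 0.9):
    print("%.0f%% of runtime in convolutions, 100x faster on GPU -> %.2fx overall"
          % (fraction * 100, overall_speedup(fraction, 100.0)))
```

Even with a 100x faster convolution, an application that spends only 30% of its time convolving ends up well under 2x faster overall in this model, before counting any host-to-GPU transfer costs.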
Second, should you be using MATLAB for this? The answer there is "no" as well: MATLAB does some things well, but overall it's a badly designed language and environment. There are much better choices available, many of them open source and with good support for GPU computing.
Third, we were talking about who was taking the lead and who was following, with the Jacket CEO trying to portray his company as leading the way and unpaid open source developers eventually copying their stuff. That is totally wrong. This style of computing was developed in academia long ago. There have been tons of implementations, both commercial and open source, over the years. Jacket is filling one particular market niche for one particular platform.
Having said all that, it sounds to me that Jacket is probably the best GPU computing solution for MATLAB and probably better than anything MATLAB provides. I just don't think that means all that much: most people likely won't see much of a real-world speedup from it, and most people would be far better off kicking the MATLAB habit altogether and moving on to better platforms, open source or otherwise.
ANSYS is also going to be able to use GPUs for parallel processing. The crappy part is they charge you 1500 bucks for each HPC license for each processor.