Matlab Integrates GPU Support For UberMath Computation

Nice by Anonymous Coward · 2011-05-23 04:25 · Score: 2, Interesting

But can't Octave do that already?

Re:Nice by jank1887 · 2011-05-23 04:52 · Score: 1

a quick google search turned up one or two discussion threads where the focus of the debate was whether certain GPU processing libraries were license compatible with Octave. After a few pages of discussions about GPL, BSD, linking libraries and system libraries, Brook and CUDA and something else, my eyes glazed over.
so, maybe?

How long until R supports this? by MaXintosh · 2011-05-23 04:29 · Score: 1

Matlab is great, but my god the language is cumbersome. More so than R, though less than SAS. Also, it costs money, and I'm cheap. So, I'm wondering if this could be worked into R somehow. Since R seems to execute code in a single-threaded sort of manner (I say, knowing just enough to be dangerous about these matters), each wee bit of speed is a godsend.

Re:How long until R supports this? by Anonymous Coward · 2011-05-23 04:33 · Score: 2, Interesting

http://cran.r-project.org/web/views/HighPerformanceComputing.html
See "Parallel computing: GPUs"
Re:How long until R supports this? by DarkSkiez · 2011-05-23 04:33 · Score: 2

Support exists now:
http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/
Re:How long until R supports this? by Garble Snarky · 2011-05-23 04:43 · Score: 1

Can you elaborate on how Matlab is cumbersome (or give a link)? I started using Matlab after using C for a few years and "cumbersome" is the last word I would use to describe Matlab. I haven't really used any of the other scientific computing packages, though.
Re:How long until R supports this? by Rufty · 2011-05-23 04:58 · Score: 1

As to the cost, try octave - mostly matlab-ish, but GPL.

--
Red to red, black to black. Switch it on, but stand well back.
Re:How long until R supports this? by Missing.Matter · 2011-05-23 05:07 · Score: 1

I think it depends on what you want to do. Matlab is great for reading and working with log files. It's also great if your tasks can be vectorized; your code will be fast and require very few statements.
However, if your project requires iteration, it's going to be slow as hell in Matlab.
The biggest complaints I have about Matlab (besides the cost) are the way it handles memory management, and the way it handles pointers. I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory available on my 8GB machine, because I ran out of what it had allocated for me. As far as pointers go.... well there is no such thing. Things can get very complicated because of this. It seems that many Matlab programmers consider using global variables a common practice because of this, which makes maintaining code an absolute hell.
But overall, we use Matlab because of how fast it is to code something up compared to C, especially for our work (robotics), which is very linear-algebra oriented. These days, computers are fast enough that proper Matlab code is a viable alternative to C in terms of performance, and for the things that really need to be fast there's always MEX.
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 05:15 · Score: 0

You have to create a new .m file for every function you want to create. How is that not cumbersome?
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 05:16 · Score: 0

Buy a new version of Matlab. There are pointers. It's called the handle class.
Re:How long until R supports this? by pz · 2011-05-23 05:23 · Score: 1

I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory available on my 8GB machine, because I ran out of what it had allocated for me.
Verne, I think you're doing something wrong there. The only time I see that sort of error is when I've done something worthy of a palm-in-face like trying to pre-allocate a 7-D array with 1000 elements per dimension. Yes, you have to be careful how many times copies of large arrays are made, but that's true of any language.
Also, with the newer versions of Matlab, iterations aren't that slow, at least compared with the older versions from a decade ago. You do, however, need to be very careful about accurately pre-allocating arrays to avoid the built-in automatic reallocation that can silently turn your nice O(n) algorithm into O(n^2) or worse.
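To make the reallocation point concrete, here is a minimal cost model in Python. It is only a sketch: it assumes the worst case, where every one-element growth copies the whole array into a new buffer, and it does not measure what Matlab's allocator actually does. The function names are made up for the illustration.

```python
def copies_when_growing_by_one(n):
    """Count element copies if every append reallocates to exactly size + 1."""
    copies = 0
    size = 0
    for _ in range(n):
        copies += size  # existing elements are copied into the new buffer
        size += 1
    return copies


def copies_when_preallocated(n):
    """With the full buffer allocated up front, nothing is ever moved."""
    return 0


for n in (1_000, 2_000, 4_000):
    print(n, copies_when_growing_by_one(n), copies_when_preallocated(n))
```

Under this assumption, doubling n roughly quadruples the number of copies, which is the O(n^2) behaviour described above; the preallocated version does no copying at all.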

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 05:48 · Score: 0

Have to agree on that. Matlab is nice for prototyping, not for serious computational work. Memory management stinks. I have seen systems with 128 GB RAM and 256 GB swap come to a crawl because of bad Matlab programming (doubling used RAM every second or so). I guess it is getting worse with every new version: I have seen examples where Matlab 5.3 performance is about 15 to 20 times that of 2010b, and simple Fortran routines outperforming Matlab by a factor of 200.
Guess we should teach students to do their own programming again in Fortran or C/C++.
Re:How long until R supports this? by stewbee · 2011-05-23 06:07 · Score: 3, Informative

I can't speak for 10 years ago, but Matlab is still slow when using 'for' loops. Just recently I was updating a grid search algorithm that was originally done in VBA. I originally ported it to Matlab using all of the vectorization tricks that were available, but I still needed for loops. For smaller data sets, it was tolerable but when our input data grew to a certain size, it would take over 70 minutes for the computation to complete (as a side note, this is with Matlab 2010b).

To speed up the computation, I at first just wrote a Java class to be called from Matlab. This showed considerable speed improvement when compared to the Matlab code. I then decided that I could multithread the application in Java for even more throughput. In this particular machine, I have 12 cores, so I used 10 threads and reduced the computation from over 70 minutes to less than a minute by using a Java class plus Java's concurrent libraries.
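The structure described here (split the grid across a fixed pool of workers, then pick the best point) can be sketched in Python. This is not the poster's Java code: the `cost` function is a hypothetical stand-in for the real per-point computation, and the pool size of 10 simply mirrors the thread count mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def cost(point):
    """Hypothetical objective; stands in for the real per-point computation."""
    x, y = point
    return (x - 3) ** 2 + (y + 1) ** 2


def grid_search(xs, ys, workers=10):
    """Evaluate cost() on every grid point using a pool of worker threads."""
    points = list(product(xs, ys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(cost, points))  # results come back in input order
    best = min(range(len(points)), key=costs.__getitem__)
    return points[best], costs[best]


best_point, best_cost = grid_search(range(-5, 6), range(-5, 6))
print(best_point, best_cost)
```

One caveat: Java threads run truly in parallel, whereas CPython threads share the interpreter lock, so for CPU-bound pure-Python work a process pool (`ProcessPoolExecutor`) would be the closer equivalent. The thread pool is used here only to keep the sketch short.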

Now, in general I prefer to code in Matlab, because you can do more with fewer lines of code, but there are certain times where strictly Matlab is not fast enough. What is nice with Matlab 2010b (I don't remember how far back this capability goes) is that you can seamlessly use Java .jar files and create Java objects in your Matlab code. As an added bonus, Matlab creates 'double' arrays by default for numeric values. These can be passed in directly as an argument to your method without casting types like you might need to when using a .dll file.
Re:How long until R supports this? by pz · 2011-05-23 06:41 · Score: 1

I've had similar luck staying within Matlab by using their profiler. Although I've used C callouts for some high-performance computations (like implementing a fast 2D histogram), I try and stay within Matlab whenever possible as mostly, not always, but mostly, the time spent optimizing a computation would far, far outweigh the time gained from a faster algorithm. If we know from the get-go that a given algorithm will be run many times, or is performance critical, it might be coded up in Matlab to prove correctness, but will ultimately be implemented in C.
Which reminds me about Matlab libraries. They're one of the very best things about writing in Matlab, especially the relative scarcity of bugs and generally high level of algorithmic correctness. But Matlab library routines are often written to be highly forgiving, and if you can specialize them for given data types (eg, by knowing your input is always a row vector), remove the bounds checking, etc., you can typically get a factor of 2 speedup. The Matlab profiler is critical for doing that.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Re:How long until R supports this? by fph il quozientatore · 2011-05-23 08:03 · Score: 1

Shame on Matlab for this, though. Self-growing arrays with better amortized overhead have been around for ages (see any decent C++ vector implementation, for instance).
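For reference, the amortized growth strategy being referred to can be modelled in a few lines of Python. This is a sketch of the general capacity-doubling idea, not of any particular C++ standard library (real implementations choose their own growth factor), and the function name is invented for the example.

```python
def copies_with_doubling(n):
    """Count element copies when capacity doubles each time it runs out."""
    copies = 0
    size = 0
    capacity = 1
    for _ in range(n):
        if size == capacity:
            copies += size  # move existing elements to the bigger buffer
            capacity *= 2
        size += 1
    return copies


for n in (1_000, 2_000, 4_000):
    print(n, copies_with_doubling(n))
```

In this model the total number of copies stays below 2n, so appending n elements costs O(n) overall even though individual appends occasionally trigger a full copy.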

--
My first program:
Hell Segmentation fault
Re:How long until R supports this? by Garble Snarky · 2011-05-23 09:37 · Score: 1

That's never bothered me, but personally I suppose it's partly offset by how much more terse Matlab code can be than, say, C. Can you do this in R, or Octave? Like I said I have no experience with those.
To me, this seems like more of a personal workflow/environment preference, rather than a problem with the language itself. Of course there are a few changes I would like to be made in the Matlab environment (and having this option would be a good thing), and the speed and memory issues certainly prevent using Matlab for any kind of serious production system. But I don't see how the language itself can be called cumbersome in comparison to lower level languages.
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 10:24 · Score: 0

You mean, like Java?
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 13:35 · Score: 0

Agreed. In my 20 years of programming as an engineer in Fortran, C, Basic, Mathcad, Octave, Python, and Matlab, I have found no faster way for prototyping than in Matlab. I was dragged kicking and screaming into learning Matlab 10 years ago and quickly realized the benefits. It is worth the price you pay. If you have a bottleneck once the code works, write a MEX file in C or fortran.
Re:How long until R supports this? by Anonymous Coward · 2011-05-23 19:45 · Score: 0

I've barely ever touched Java (I'm an embedded systems/signals guy), but I'm pretty sure you can throw all the methods you want in a single file for a class. I think it is only separate classes that require separate files.
In MATLAB, all your helper functions either need to be manually-inlined (so they're no longer functions) or declared in separate .m files. In Java terms, all your private methods aren't private and they each need to be in their own file.

Old news by Anonymous Coward · 2011-05-23 04:37 · Score: 2, Insightful

Note the "R2010b" version number. That means that this capability has been out since the second half of 2010.

Re:Old news by guruevi · 2011-05-23 05:04 · Score: 3, Interesting

I think they meant 2011b which is not out yet (it's in beta). GPUmat as well as nVIDIA had toolboxes for MATLAB for a while now (although the CUDA toolboxes require manual code edits and compiling to get it to work) but there is only limited function support (eg. FFT on large arrays works wonders on CUDA) and even those had limited support (only single floating point precision for example). There is also the commercial Accelereyes with Jacket.
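Why the single-precision limitation matters can be shown with a short Python sketch. It uses the standard `struct` module to round ordinary double-precision floats to single precision; the helper name `to_float32` is made up for the illustration and has nothing to do with any of the toolboxes mentioned.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(to_float32(0.1) == 0.1)                # False: 0.1 rounds differently in single precision
print(to_float32(16777217.0) == 16777216.0)  # True: 2**24 + 1 is not representable
```

Single precision carries a 24-bit significand, so integers above 2**24 and most decimal fractions lose accuracy that double precision would keep. That can be acceptable for an FFT on image data and unacceptable for an ill-conditioned linear solve.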

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:Old news by Anonymous Coward · 2011-05-23 05:05 · Score: 0

Yup. Agreed. Plus, they should mention this is available in Matlab only with the parallel computing toolbox, not in the base matlab package.
That being said, I think Mathworks has created some mighty slick ways to integrate parallel computing into an otherwise typically single-threaded language and workflow. I especially like parfor, the parallel for loop.
Re:Old news by Anonymous Coward · 2011-05-23 05:18 · Score: 0

Nah, "b" in matlab version numbers doesn't mean beta. They have, for the last several years, released two versions a year, an 'a' and 'b'. (2008a and 2008b, etc, etc) Just their way of delineating between the two yearly versions. 2011a is already out which is what the previous commenter was alluding to.
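Since the release names keep getting mixed up in this thread, here is a small Python sketch of the naming scheme as described above: a four-digit year followed by 'a' or 'b'. The regular expression and function name are assumptions made for the example, not anything published by MathWorks.

```python
import re


def release_key(name):
    """Turn a release name such as 'R2010b' into a sortable (year, half) pair."""
    match = re.fullmatch(r"R?(\d{4})([ab])", name)
    if match is None:
        raise ValueError(f"not a release name: {name!r}")
    year, half = match.groups()
    return int(year), half


releases = ["R2011a", "R2010b", "R2008b", "R2010a", "R2008a"]
print(sorted(releases, key=release_key))
```

Sorted this way, R2010b lands before R2011a, which matches the order described above.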
Re:Old news by tomz16 · 2011-05-23 05:40 · Score: 1

2010b did not include GPU array indexing support (among other things), making it fairly worthless for anything moderately complex.
2011a DOES do indexing on GPU arrays. It works very well in my experience so far.
Re:Old news by kthreadd · 2011-05-23 05:56 · Score: 1

There's still nothing that stops R2011b from being in beta, right now.

Jacket by Framboise · 2011-05-23 05:07 · Score: 2

There is competition from Jacket:
http://www.accelereyes.com/products/jacket
This product is more expensive but more effective than Matlab.
I tried the free trial and found it much more effective than Matlab.
Alas the cost is too high to justify Jacket in my case, I would rather
buy more hardware instead.

Re:Jacket by Anonymous Coward · 2011-05-23 05:17 · Score: 1

Jacket costs as much as a toolbox from Matlab (which is required to use their GPU stuff). I'm a Jacket user and am more than pleased with both performance and support. Jacket also supports more functions and is faster.
Re:Jacket by Anonymous Coward · 2011-05-23 05:26 · Score: 0

I would agree - also note that Jacket is more mature and it has been around much longer than Matlab's GPU efforts.
Re:Jacket by Anonymous Coward · 2011-05-23 06:00 · Score: 0

I'm using the student version of Jacket to prototype some IR work. It definitely has much less of a learning curve than doing raw CUDA/OpenCL. I needed a GPU sort, which was supported by Jacket but not by the GPU support in Matlab. Also it's just easier to type code using Jacket, but that's a personal taste thing. I'll be getting a grant to buy the commercial version of Jacket, especially because I want to utilize the multi-GPU support. That thing looks awesome.

Re:Strict Muslim kills stepdaughter by Anonymous Coward · 2011-05-23 05:07 · Score: 0

You fail, responding to an obvious AC troll.

Pretty sweet for the HPC community.... by Anonymous Coward · 2011-05-23 05:13 · Score: 2, Insightful

who actually uses MATLAB for real HPC?

Re:Pretty sweet for the HPC community.... by Anonymous Coward · 2011-05-23 05:38 · Score: 0

My question exactly. Matlab is a resource HOG. Sure it's easy to implement stuff, but it is so far from efficient doing it using it in a HPC environment is retarded. There are better libraries suited to that.
Even Java is faster.
Re:Pretty sweet for the HPC community.... by Anonymous Coward · 2011-05-23 05:51 · Score: 1

who actually uses MATLAB for real HPC?
Lots of people. I know for a fact that NASA's Columbia supercomputer has Matlab licenses. Moreover, for certain engineering applications, Matlab is the de facto standard (like control theory and certain areas in signal processing). Sure, you could write every solver, optimizer, and toolbox in standard C, C++, OpenMPI (a lot of control is just numerical optimization), but it would mean a lot of coding from the ground up. Alternatively, you can get a bunch of specialized libraries, but then the administrator should be willing to let you use them on the cluster.
Re:Pretty sweet for the HPC community.... by guruevi · 2011-05-23 07:06 · Score: 1

People that can't code for shit. It's pretty popular in the scientific community simply because it combines the simplicity (and noobicity of the coders) of PHP, Python or Ruby with high-level mathematical constructs.
The thing is that for real coders it's actually harder because you're missing stuff like decent, inline evaluation of variables, loops and if/then/else constructs, evaluation of data types is hard to do, function overloading, regexp, greater-than-or-equal, and even the very basic of text evaluations seem to be missing. Even a simple v =+ 5 fails to evaluate. Most of your data has to be evaluated before entering a function and it just makes everything look messy and bloated.
Another thing is the messy licensing problems associated with MATLAB. MATLAB on a license server for example stops running the program in order to check-in resulting in a periodic ~150-250ms delay which is murder when you have to evaluate millisecond-based responses and beware if the network or the licensing server is down, all MATLAB systems simply stop functioning within 10 minutes. Toolboxes are individually checked out during runtime so if you require a function from eg. the statistics toolbox, it won't check out the license until it actually gets to the line of code that runs the function and THEN it fails instead of checking out the license before the program starts running. So you have to add more code (a couple of lines at the top to initialize a toolbox function) which adds to the bloatiness.
The rest of the licenses are activation based while deactivation doesn't work usually so you have to manually deactivate the license to re-install a computer.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:Pretty sweet for the HPC community.... by SomeKDEUser · 2011-05-23 08:51 · Score: 1

Seconded. I still remember some guy that ported his MATLAB finite element code to c++. The solving time for his problems went from 24 hours to 8 minutes -- 6 of which were the post-processing/display of the results... That was towards the end of his thesis, so basically he must have wasted countless months of productive work because of that.
MATLAB is never the right tool -- unless you are really incompetent, so you need the hand-holding, and really obtuse, so you can't handle the small differences with Octave. And the "toolbox" excuse is a load of crock: there most certainly is a c or c++ implementation of what you need.
Re:Pretty sweet for the HPC community.... by loufoque · 2011-05-23 10:31 · Score: 1

The common process is for one applied mathematician to write the algorithm in Matlab, then 15 people to convert it to optimized C++/CUDA.
Surely you can see the point in making Matlab faster, or for automated generation tools.
Re:Pretty sweet for the HPC community.... by Anonymous Coward · 2011-05-23 11:49 · Score: 0

MATLAB is never the right tool -- unless you are really incompetent, so you need the hand-holding, and really obtuse, so you can't handle the small differences with Octave. And the "toolbox" excuse is a load of crock: there most certainly is a c or c++ implementation of what you need.
That is like saying - Word is never the right tool, because OpenOffice works just fine. Also, there is a LaTeX implementation of what you need.
Only, it is even more extreme. I get "free" access to Matlab (my funding pays my tuition as well), and don't want to spend my time getting/writing libraries for what comes natively with Matlab. Can it be done? Sure. Do I want to? No way in hell.
Re:Pretty sweet for the HPC community.... by MurukeshM · 2011-05-23 12:30 · Score: 1

It all boils down to one thing. Which time do you value more? The time spent in coding/debugging (more headaches, but short)? Or that spent in running the program (lots of patience required, but nothing else). Answer that, and you'll know what you want to use.
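The trade-off can be put into numbers with a small break-even calculation. The Python sketch below is only illustrative: the function name is invented, and the 80 extra developer hours in the usage line is an assumed figure, while the 24 hours versus 8 minutes comes from the finite element anecdote earlier in the thread.

```python
import math


def break_even_runs(extra_dev_hours, slow_run_hours, fast_run_hours):
    """Smallest number of runs after which the faster implementation wins overall."""
    saved_per_run = slow_run_hours - fast_run_hours
    if saved_per_run <= 0:
        raise ValueError("the fast version must actually be faster")
    return math.floor(extra_dev_hours / saved_per_run) + 1


# Assumed: 80 extra hours spent on the port; run times of 24 h versus 8 min.
print(break_even_runs(80, 24, 8 / 60))
```

With those inputs the port pays for itself on the fourth run. A script that is run once or twice never repays the same effort.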

Support for FPU by Naatach · 2011-05-23 05:22 · Score: 1

At first glance, I thought the subject said "Matlab integrates FPU support...". I was like, "Damn! I can break out the 486 DX again!".

--
There may be no "I" in team, but there's also no "F" in way.

Its about time .. by wisnoskij · 2011-05-23 05:23 · Score: 1

Have not used the software other than a few times for an assignment, but if I did not already know that it is not a particularly nice piece of software, I would be surprised that it took them this long to do the obvious.

--
Troll is not a replacement for I disagree.

Re:Its about time .. by mcclanahoochie · 2011-05-23 05:35 · Score: 1

As mentioned above, Jacket has been doing this since 2008, and at the moment seems superior to Matlab's GPU implementation... http://www.accelereyes.com/press

Python libraries by witch-doktor · 2011-05-23 05:35 · Score: 1

For those interested in Python libraries, there are PyCUDA and gnumpy. I have not used either; I'm still learning how to use parallel Python.

Re:Python libraries by Anonymous Coward · 2011-05-23 06:08 · Score: 0

And don't forget CorePy (www.corepy.org) if you want to get to the bare metal from Python!
-Chris
Re:Python libraries by witch-doktor · 2011-05-23 11:06 · Score: 1

That's an interesting lib. I should play with it.

Re:Nice - Octave and SciLab too by Anonymous Coward · 2011-05-23 05:51 · Score: 0

Yup, SciLab and Octave can both do that already.

3 Years in the Making by melonakos · 2011-05-23 05:58 · Score: 4, Informative

I'm CEO of AccelerEyes and have been submitting Slashdot articles referencing updates about using GPUs with MATLAB for several years now. It's great to see it finally getting through, albeit via a reference to the "fake" GPU support which the MathWorks threw into PCT in an attempt to curtail the great success we continue to have with Jacket.

For a full explanation of why I say "fake", read, http://www.accelereyes.com/products/compare

For a brief explanation of why I say "fake" GPU support consider the question, what does supporting GPUs mean? If you can run an FFT are you content? Or do you want to use INV, SVD, EIG, RAND, and the list goes on and on. Jacket has 10X the functionality of PCT-GPU.

Why else is the PCT-GPU implementation weak? Well, it is so poorly constructed (shoehorned into their legacy Java system), that it is rarely more beneficial to use the GPU than the CPU with the PCT-GPU implementation. It takes 600 cycles to load-then-store global memory on the GPU (required in each kernel call). The main innovation that led us to build Jacket is the ability to generate as few kernels as possible to eliminate as many 600 cycle roundtrip transfers as possible. For example, Jacket's runtime system may only launch one kernel for every 20 lines of code. PCT-GPU on the other hand is limited to launching a GPU kernel for every basic function call.
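The argument about kernel counts can be illustrated with a toy model in Python. It takes the 600-cycle figure quoted above at face value, assumes exactly one global-memory round trip per kernel launch, and uses plain Python lists in place of GPU memory; it is not a description of how Jacket or PCT is actually implemented, and all names are invented for the sketch.

```python
ROUNDTRIP_CYCLES = 600  # figure quoted in the comment above, taken at face value


def roundtrip_overhead(num_ops, ops_per_kernel):
    """Cycles spent on global-memory round trips for a chain of elementwise ops.

    Assumes one load-then-store round trip per kernel launch.
    """
    kernels = -(-num_ops // ops_per_kernel)  # ceiling division
    return kernels * ROUNDTRIP_CYCLES


def run_unfused(ops, data):
    """One full pass over the data per operation (one 'kernel' each)."""
    for op in ops:
        data = [op(x) for x in data]
    return data


def run_fused(ops, data):
    """A single pass that applies the whole chain to each element."""
    def chain(x):
        for op in ops:
            x = op(x)
        return x
    return [chain(x) for x in data]


ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
data = [1.0, 2.0, 3.0]
print(run_unfused(ops, data) == run_fused(ops, data))
print(roundtrip_overhead(20, 1), roundtrip_overhead(20, 20))
```

Both versions compute the same result; the difference is how often the data makes the trip through memory. In this model, 20 operations launched one kernel at a time pay 12000 cycles of round-trip latency, while a single fused kernel pays 600.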

Jacket also has a GFOR loop which is the only parallel FOR-loop for GPUs, http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage

I'm not aware of any MATLAB programmer that has had a good experience with PCT-GPU.

Finally, because I'm so thrilled at this getting slashdotted (despite it being a link promoting PCT-GPU), I'd be happy to offer free 3 month Jacket subscriptions to anyone that emails me in the next 48 hours with the word "slashdot" in the subject, at john.melonakos@accelereyes.com

Cheers!

PS: Roblimo, if we can get some blurb love in your summary on the main slashdot.org page, it would really mean a ton to all our guys that have worked on this project for the last 4 years!

Re:3 Years in the Making by Anonymous Coward · 2011-05-23 06:48 · Score: 0

Hi guys, nature of monkey just sent you mail

Re:Nice - Octave and SciLab too by jank1887 · 2011-05-23 06:01 · Score: 1

is there a particular octave package or function set? I saw some parallel algorithm and MPI libraries, but is there something I missed relating to GPU parallelization?

Article seems slightly inaccurate by HonestButCurious · 2011-05-23 06:45 · Score: 1

The article talks about R2010b, which isn't out yet. R2010a (which *is* out), supports parallel processing pretty well (I use it constantly), but not exactly "natively" - you have to pay extra for an option called the "Parallel Computing Toolbox" which also gives you sweet stuff like multicore, HPC and so on.

Re:Article seems slightly inaccurate by Anonymous Coward · 2011-05-23 06:52 · Score: 0

I think you're confusing your 2010's and 2011's?
Re:Article seems slightly inaccurate by Anonymous Coward · 2011-05-23 09:49 · Score: 0

The article talks about R2010b, which isn't out yet.

Not sure where you're getting your information from. R2010b was released last year, and R2011a dropped over a month ago.
http://www.mathworks.com/products/new_products/latest_features.html
You are correct though that a PCT license is required for GPU support.
Re:Article seems slightly inaccurate by Anonymous Coward · 2011-05-24 06:03 · Score: 0

R2011a is the current release, it came out recently and with significant enhancements to the GPU functionality:
The following functions are enhanced to support GPUArray data:
cat, colon, conv, conv2, cumsum, cumprod, eps, filter, filter2, horzcat, meshgrid, ndgrid, plot, subsasgn, subsindex, subsref, vertcat and all the plotting related functions.
You can index into a GPUArray for assigning and reading individual elements.
GPU support is extended to include the following MATLAB code in functions called by arrayfun to run on the GPU:
&, |, ~, &&, ||, while, if, else, elseif, for, return, break, continue, eps
http://www.mathworks.com/help/toolbox/distcomp/rn/bsuaink-1.html#bsuaiol

great! by t2t10 · 2011-05-23 06:48 · Score: 4, Insightful

Now we have a vendor of an overpriced add-on battling it out with the vendor of the mother of all overpriced and badly designed pieces of scientific software. As someone who actually uses numerical scripting languages, let me tell you: I'm not impressed.

My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open source alternatives to MATLAB already. I'll just wait, thank you very much.

Re:great! by MischaNix · 2011-05-23 07:10 · Score: 1

within a year or two
That's just it. We've been in the world of parallelization for years now, but relatively few open source developers have innovated or even ported for performance. Why? Because such performance gain is a luxury. You pay more for luxuries, that's just a fact of life.

Nevertheless, your timeframe sounds decent--but that's only as there become more varied and more open tools to support parallel implementations.
Re:great! by coolsnowmen · 2011-05-23 07:50 · Score: 1

Isn't that always the case? There is a slight demand for something (easy GPU programming at the Matlab/Octave level), and a company starts up to offer that service. Over time, if there is enough general demand, people start putting code snippets into the FS community, it becomes a project, and over a 2 year period it becomes usable. But in the intervening period a company is trying to make money on this recent need that will become almost commonplace in a few years.
As far as Matlab vs Octave: there are still some basic things that are a PITA in Octave. Building graphs with different fonts and getting everything in its right place is almost impossible (when I tried it 6 months ago). I looked at examples with LaTeX syntax that embed gnuplot requests, and they still didn't work. I ended up just making a basic graph and annotating it in GIMP. Other than that, I love templating control and image algorithms in Octave.
Re:great! by t2t10 · 2011-05-23 07:57 · Score: 1

That's just it. We've been in the world of parallelization for years now, but relatively few open source developers have innovated or even ported for performance.
A lot of parallel processing that's coming out today commercially was pioneered by open source projects years ago. OMP and distributed computing are widely used.
On GPU computing, the speedups are barely worth it today unless you really hand-optimize your application for parallelization; you're not going to get a lot of speedups with Jacket on real code. The reason there aren't more open source tools using GPU computing is because the effort still isn't worth it.
These kinds of technologies are being pushed by vendors with a commercial motive. They are selling snake oil. And if you want to see how snaky they are, take a look at their benchmarks:
http://www.accelereyes.com/products/benchmarks
I mean, 8 hours for a Canny edge detection on an 8 Mpixel image? Are they doing pixel-by-pixel processing in Matlab?
Re:great! by t2t10 · 2011-05-23 08:04 · Score: 1

There is no "real need" for GPU computing yet because for most people, it's not cost effective: the speedups are modest at best, and you only get them if you know what you're doing (in which case you wouldn't be using these tools). Get yourself a multicore machine and use OMP and your code is likely going to run faster with less effort.
Open source developers will tackle GPU computing in scripting languages when it makes sense to do so. That's not because they need commercial leadership or leaks of "code snippets", it's because open source developers don't have a commercial ax to grind.
As for open source numerical solutions, I wasn't talking about Octave; Octave is as much of a joke as Matlab. There are tons of better tools, including R and Python.
Re:great! by melonakos · 2011-05-23 08:38 · Score: 1

To address the topic of open source:

People have been saying that open source would swamp Jacket since we launched in 2007. The reality is that it is too stinking hard to build good stuff open source (i.e. where the developers aren't paid), when there isn't an enormous user community to fuel the effort in intangible benefits back to the contributors. Otherwise, we'd open source Jacket and try to live off the service contracts like every other open source project.

So we end up pricing the software in line with what people are used to paying for add-ons to MATLAB. And Jacket is great so we end up doing really well with this model.

While GPU computing in MATLAB is too small a niche, M-programming in general is ripe for the open sourcing. Octave has never gained any steam and has been around so long that it is stale. Scilab seems good but is stuck in Europe. We would be thrilled to participate with the community in building something that delivers more promise overall. What is certain is that MathWorks has a greater stranglehold on science/engineering than Microsoft does on operating systems.
Re:great! by melonakos · 2011-05-23 08:44 · Score: 1

Hand optimization is really tough. Try for instance to beat Jacket's CONV2 (yes, I'm talking about straight-up convolutions) by hand. If you can do that, I will expend all my energies to get you to drop whatever else you are doing and join us at AccelerEyes :)

Jacket is meant to be a luxury as was mentioned elsewhere... providing a faster, better approach to what you could try to reinvent by hand if you had infinite energy.

The Canny Edge benchmark is a full blown application (of which Canny Edge detection is the major component). The image sizes that are processed are listed in the graph, but there are tons of images being processed in the course of running the full application. We should make that clearer on our website... thanks for pointing that out.
Re:great! by MattskEE · 2011-05-23 09:12 · Score: 1

My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open source alternatives to MATLAB already. I'll just wait, thank you very much.
I don't dispute that there are alternatives to Matlab, but "better" is still premature in my opinion. Over the past year I had an interest in removing my work's dependence on proprietary software, so I have researched the Matlab alternatives, and I have even been using Python for some of my work (instrument automation and simple data processing and plotting), but I would say that open source has a way to go before reaching Matlab's level. For a professional user the license costs are not terribly significant, for academic, personal, or consultants, the open alternatives might make more sense.
One of the usual problems with open source which is true here is fragmentation. Matlab has a wide range of nice add-ons available for purchase which integrate with Matlab and Simulink. Some of the features of these may be replicated in other products, but rarely are they replicated as well, and they certainly are not integrated into a single architecture. No, you have certain functions available in Python libraries, one of the scientific Python distributions, or Octave, or Scilab, and you must spend much more time finding each one, finding the documentation, and integrating it into your workflow.
The second usual problem of open source software is polish. Matlab is more polished and in my opinion has a much better UI than the open alternatives I've used. Yes it has some problems, but on the whole I think their interface is better, and the documentation is better. Open source programs like many Python libraries may be comprehensively documented, but the documentation is rarely as well organized. Of course the fragmentation comes up here again: some libraries may have excellent documentation, others very poor.
This does raise one of the benefits of open source: it's open. There was a Python library (mwavepy) which I wasn't understanding too well from the documentation alone, so I was able to dive into the source code. I even thought about contributing to the project, though I haven't gotten around to it.
So in summary, I like Matlab, and I like open source. But I think Matlab is simpler and easier to get the job done, and that is what professional users care about more than a couple $k in license costs.
Re:great! by Anonymous Coward · 2011-05-23 10:30 · Score: 0

I couldn't disagree more with your first paragraph.
Talking about speedups is always a bit slippery because of the sensitivity of the CPU baseline, but very significant speedups (at least an order of magnitude) are definitely possible.
Re:great! by t2t10 · 2011-05-23 12:44 · Score: 1

I would say that open source has a way to go before reaching Matlab's level
The first thing you should do is stop thinking of it as "levels". Matlab has a few packages that you can't easily get for Python, and it has Simulink. There are many other areas where Matlab isn't even remotely close to Python's level. The two are, as the technical term goes, "incomparable".

One of the usual problems with open source which is true here is fragmentation
Well, closed source is even more fragmented! There's Matlab, Mathematica, S, SPSS, PV-Wave, J, and tons of other closed source software, all incompatible! It's a wonder anybody gets anything done with closed source software at all with all that fragmentation!

The second usual problem of open source software is polish. Matlab is more polished and in my opinion has a much better UI than the open alternatives I've used.
You mean its buttons-and-windows disaster that's stuck some time in the 1990's? Come on, what about a decent modern IDE?

Open source programs like many Python libraries may be comprehensively documented, but the documentation is rarely as well organized. Of course the fragmentation comes up here again: some libraries may have excellent documentation, others very poor.
There are tons of shitty, poorly documented Matlab libraries around as well.

So in summary, I like Matlab, and I like open source. But I think Matlab is simpler and easier to get the job done, and that is what professional users care about more than a couple $k in license costs.
Being a "professional" just means you do it for money, it doesn't mean you're competent or good at it. Matlab and its vendor support is a safety blanket for engineers who can't program and developers who don't know anything about computational methods. They don't actually need it, but they are too intellectually lazy to change or learn a different way of doing things; Matlab is their crack.

For a professional user the license costs are not terribly significant, for academic, personal, or consultants, the open alternatives might make more sense.
Oh, academics and consultants are not "professionals"? Wow, so what exalted group of professionals do you fancy yourself to be a part of? Can't be corporate users, because if you were one of those, you'd know that while a few thousand dollars for software licenses for each user isn't going to affect the bottom line, the paperwork would kill you.
Re:great! by t2t10 · 2011-05-23 13:23 · Score: 1

The reality is that it is too stinking hard to build good stuff open source (i.e. where the developers aren't paid)
Most open source developers I know are paid and the stuff they produce has wiped away pretty much anything commercial and proprietary in any area where they have developed it.
The reality is that GPU computing barely makes sense today, and it certainly didn't make sense in 2007. And it may just be another fad, taken over again by general purpose CPUs, just like the last few times.

While GPU computing in MATLAB is too small a niche, M-programming in general is ripe for the open sourcing.
Your idea that things start out as commercial code and then move into open source is backwards. Many Matlab toolboxes started out as open source, and then were gold-plated for the typical Matlab customer.
Re:great! by t2t10 · 2011-05-23 13:34 · Score: 1

Talking about speedups is always a bit slippery because of the sensitivity of the CPU baseline, but very significant speedups (at least an order of magnitude) are definitely possible.
Yes, you can speed up individual algorithms sometimes by an order of magnitude. But that often doesn't help you much with overall program performance, because once you eliminate one bottleneck, another one takes its place. Buying more cores gives you less speedup for each algorithm, but makes it much more likely that you speed up your entire program, and with less programming effort too.
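The bottleneck argument above is essentially Amdahl's law. A minimal sketch follows; the 40% and 90% runtime fractions and the 10x and 4x factors are made-up illustrative values, not measurements from any real program.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    is accelerated by `factor` and the remainder is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical profile: a single kernel accounts for 40% of the runtime.
gpu_kernel_only = overall_speedup(0.40, 10.0)  # 10x on that one kernel
more_cores_all = overall_speedup(0.90, 4.0)    # 4x on 90% of the program

print(f"10x on 40% of runtime: {gpu_kernel_only:.2f}x overall")
print(f"4x on 90% of runtime:  {more_cores_all:.2f}x overall")
```

Under these assumed numbers the smaller but broader speedup wins, which is the point being argued; with a different profile (say, one kernel taking 95% of the runtime) the conclusion flips.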
Re:great! by Anonymous Coward · 2011-05-23 13:57 · Score: 0

I think all he/she meant was that professionals tend to have a larger budget for commercial packages like Matlab. I am an academic and did not take offense at that comment, and was somewhat amused by your 'lazy' and 'crack' comments. For my research I use codes written in C, Fortran and Matlab. Most of my own development is done in Matlab and then I port if I have a need for it. I choose to do this because in years of programming I have found that I get faster results that way. Now if you will excuse me, I'm going to be lazy for another 3 hours or so before bed at midnight and smoke my Matlab pipe.
Re:great! by t2t10 · 2011-05-23 14:30 · Score: 1

Hand optimization is really tough. Try for instance to beat Jacket's CONV2 (yes, I'm talking about straightup convolutions) by hand.
CONV2 is one of the most trivial cases for GPU programming; if you didn't screw up badly, I can't beat your code, but you couldn't beat mine either.

Jacket is meant to be a luxury as was mentioned elsewhere... providing a faster, better approach to what you could try to reinvent by hand if you had infinite energy.
Most people shouldn't be using GPU programming at all because it isn't going to speed up their programs significantly overall.
Most success stories with GPU programming involve incompetence. For example, someone using CONV2 in their code is going to see a big speedup moving to a GPU, but they shouldn't be using CONV2 to begin with!
Re:great! by melonakos · 2011-05-23 14:54 · Score: 1

I don't agree with either statement:

1) Expert convolutions on the GPU (that work well for both separable/non-separable cases, arbitrary input matrix sizes, and arbitrary kernel sizes) are extremely difficult. I don't think you can beat our implementation. If you can, I will try to entice you away from other pursuits in life.

2) CONV2 (i.e. convolutions) are very useful in many applications and often make more sense than pursuing some sort of other arithmetic expression. I do agree with your statement though that algorithm/implementation choice is critical and is a decision that should come before optimization efforts. I just think convolutions are an essential tool to which many problems are best boiled down.
Re:great! by t2t10 · 2011-05-23 21:25 · Score: 1

CONV2 (i.e. convolutions) are very useful in many applications
Your error is with the "i.e." part. Convolutions are very useful, but CONV2 is almost never the right function to call. Most convolutions are separable. Those that aren't can usually be made separable. If you're really stuck with a non-separable large 2D convolution, you can use 2D FFT in some cases. And if you have a non-separable small 2D convolution, there's usually some other known trick you can use to speed it up. Anybody who has any business working with image processing should know these things; it doesn't take an "expert".

Expert convolutions on the GPU (that work well for both separable/non-separable cases, arbitrary input matrix sizes, and arbitrary kernel sizes) are extremely difficult
CONV2 in Matlab is defined to use "straightforward convolution", with no optimization.
http://www.mathworks.com/help/techdoc/ref/conv2.html
If you are benchmarking CONV2 against a GPU implementation that checks for separability and does other optimizations, your benchmarks are a fraud.
You might say at least your customers are getting a good 2D convolution function for their money and they don't need to know the details. But your customers already get optimized 2D convolutions in Matlab (e.g. FILTER2), so they don't need to pay for your code to realize that speedup, they just need to call a different function.
(And please stop referring to 2D signals as "matrices"; 2D signals and matrices are totally different mathematical objects; the name for the common supertype is "array".)
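To make the separability point concrete, here is a small pure-Python sketch (not MATLAB's or Jacket's implementation; the function names and the test data are illustrative assumptions). It checks that convolving with an outer-product kernel gives the same result as two 1D passes, which cost kh + kw multiply-adds per pixel instead of kh * kw.

```python
def conv2_full(image, kernel):
    """Direct 2D convolution with 'full' output size."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for i in range(ih):
        for j in range(iw):
            for u in range(kh):
                for v in range(kw):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out

def conv2_separable(image, col, row):
    """Same result for kernel = outer(col, row), using two 1D passes."""
    tmp = conv2_full(image, [[c] for c in col])  # column pass (kh x 1 kernel)
    return conv2_full(tmp, [row])                # row pass (1 x kw kernel)

col = [1, 2, 1]
row = [1, 0, -1]
kernel = [[c * r for r in row] for c in col]     # Sobel-like outer product
image = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]

direct = conv2_full(image, kernel)
two_pass = conv2_separable(image, col, row)
print(direct == two_pass)
```

Integer inputs are used so the comparison is exact; with floating-point data the two results would agree only up to rounding.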
Re:great! by melonakos · 2011-05-24 06:03 · Score: 1

CONV and FILTER2 both call CONV2 in MATLAB
Re:great! by Anonymous Coward · 2011-05-24 08:24 · Score: 0

You talk like an old man who is stuck in his ways and refuses to change. There are tons of things where having a few hundred slow threads is better than having half a dozen to a dozen really fast threads. If you had the time, energy, or willingness to take a look at the product before making comments like that, maybe people would have more respect for them.
Agreed, there are instances where a good algorithm is better than a parallel version of a bad algorithm. And as for the statement "Most people shouldn't be using GPU programming at all because it isn't going to speed up their programs significantly overall.", it signifies your incompetence in understanding parallel algorithms.
GPUs should not be used as boosters, i.e. just for boosting the slowest part of the code. That is an insane way of doing things for many applications, because it may end up increasing the memory transfer overhead. But when you have tools that enable you to run the entire load of functions on the GPU with reasonable to really good performance, the overheads of transferring memory are minimal.
And this is what Jacket is trying to be: a huge matrix library on the GPU.
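The transfer-overhead argument can be sketched with a toy cost model. Everything here is an assumption for illustration: the function name, the flat per-transfer cost, and the microsecond figures are hypothetical and do not describe Jacket or any particular GPU.

```python
def gpu_pipeline_time_us(n_ops, transfer_us, op_us, keep_on_device):
    """Total time for n_ops GPU operations under a toy cost model.

    keep_on_device=False: every operation uploads its input and downloads
    its output ("booster" style).
    keep_on_device=True: one upload at the start, one download at the end.
    """
    transfers = 2 if keep_on_device else 2 * n_ops
    return transfers * transfer_us + n_ops * op_us

# Hypothetical figures: 20 operations, 2000 us per transfer, 500 us per op.
booster = gpu_pipeline_time_us(20, 2000, 500, keep_on_device=False)
resident = gpu_pipeline_time_us(20, 2000, 500, keep_on_device=True)
print(booster, resident)
```

In this model the compute time is identical in both cases; only the number of host/device round trips changes.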
Re:great! by t2t10 · 2011-05-24 20:10 · Score: 1

So? For you to present benchmarks of a CONV2 that detects separability against the built-in CONV2 that explicitly does not use separability is dishonest, because much of the speedup you measure has nothing to do with GPU computing.

Mathematica has this feature, too by tpzahm · 2011-05-23 06:57 · Score: 1

For those with a bent toward Mathematica, GPU computing is baked into Version 8.
There's more information at http://reference.wolfram.com/mathematica/guide/GPUComputing.html

In the spirit of full disclosure, I'm solely a long-time user, not a Wolfram employee.

MAGMA by l00sr · 2011-05-23 07:05 · Score: 3, Interesting

For those interested in an open-source alternative, there's MAGMA, which provides a bunch of linear algebra routines implemented in CUDA. I haven't tried it myself yet, but it looks promising.

Open Computing Language by Anonymous Coward · 2011-05-23 07:20 · Score: 0

Why restrict the accelerated computation to Nvidia GPUs when one could use OpenCL to also exploit the power of AMD GPUs, multi-core CPUs and much more?

cdb read and collate next? by sgt+scrub · 2011-05-23 07:54 · Score: 1

Interesting. I wonder if the GPU could be used to perform functions on large sets in a constant database.

--
Having to work for a living is the root of all evil.

Re:cdb read and collate next? by melonakos · 2011-05-23 08:17 · Score: 1

If you can post some quick code you have in mind, I'll let you know how it might perform using GPUs in MATLAB.
Re:cdb read and collate next? by sgt+scrub · 2011-05-26 12:48 · Score: 1

I'm not sure if by "quick code" you're joking but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it but it is small enough you can see what is going on. A quick description is: it reads packets off the interface. It orders information so it can be inserted into a mapsource));
dstPort = to_string(ntohs(tcp->dest));
pktWin = ntohs(tcp->window);
int flagArray[] = {ntohs(tcp->ack),ntohs(tcp->fin),ntohs(tcp->psh),ntohs(tcp->res1),ntohs(tcp->res2),ntohs(tcp->rst),ntohs(tcp->syn),ntohs(tcp->urg)};
pktFlag = get_flag(flagArray);
cout -1)
{
struct sockaddr_storage srcAdd, dstAdd;
struct sockaddr *daddr_ptr, *saddr_ptr;
saddr_ptr = trace_get_source_address(packet, (struct sockaddr *)&srcAdd);
daddr_ptr = trace_get_destination_address(packet, (struct sockaddr *)&dstAdd);
if (saddr_ptr == NULL || daddr_ptr == NULL)
printf("NULL ");
else
{
// only format the addresses when both pointers are valid
sIP = print_ip(saddr_ptr);
dIP = print_ip(daddr_ptr);
}
switch(dir)
{
case 0: //inbound
cout << sIP << ":" << srcPort << "|" << endl;
break;
case 3: //dunno
cout << "DUNNO" << endl;
break;
}

--
Having to work for a living is the root of all evil.
Re:cdb read and collate next? by sgt+scrub · 2011-05-26 13:15 · Score: 1

Oops, something went wrong. I don't think it is going to jive with /. The post part of the reply should have been:
I'm not sure if by "quick code" you're joking but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it but it is small enough you can see what is going on. A quick description is: it reads packets off the interface. It orders information so it can be inserted into a map of lists called connections. I update the map with the packet information. If the packet is new it gets added. If the packet is a continuation of a conversation the number of packets and amount of data transferred get updated. Regarding the CDB part, every new entry and update completely re-writes the db. Having to update the database on an interface transferring >1G starts to show limitations. Below is pseudocode I hope will help.
start a timer
check the packet flags to determine whether it is a new connection or an update
if the syn flag is true, a new connection is starting up (I also check other flags, rst for sure, but this makes it shorter)
then add a list entry for (protocol, source ip, source port, destination ip, destination port, direction indicator, pps, bps) to the map
if the syn, fin, and rst flags are all false, find the matching entry and increment pps and/or bps
if the fin flag or the rst flag is true, add the last info and remove the pointer to this entry from the map of lists that is checked for updates
if the timer has reached its limit, move the file from memory/write it to disk/create a new CDB
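The pseudocode above could be sketched roughly as follows. This is an illustration only: the Packet record, its field names, and the track function are hypothetical, raw packet and byte counters stand in for pps/bps (rates would be derived at flush time), and the timer-driven CDB write is left out.

```python
from collections import namedtuple

# Hypothetical packet record; the field names are illustrative only.
Packet = namedtuple(
    "Packet",
    "proto src_ip src_port dst_ip dst_port direction size syn fin rst",
)

def track(connections, finished, pkt):
    """Apply one packet to the in-memory connection table."""
    key = (pkt.proto, pkt.src_ip, pkt.src_port,
           pkt.dst_ip, pkt.dst_port, pkt.direction)
    entry = connections.get(key)
    if entry is None:
        if pkt.syn:  # new connection starting up
            connections[key] = {"packets": 1, "bytes": pkt.size}
        return       # non-SYN packet for an unknown connection: ignored here
    entry["packets"] += 1
    entry["bytes"] += pkt.size
    if pkt.fin or pkt.rst:  # connection ended: record last info, drop entry
        finished.append((key, connections.pop(key)))

connections, finished = {}, []
flow = ("tcp", "10.0.0.1", 40000, "10.0.0.2", 80, 0)
track(connections, finished, Packet(*flow, size=60, syn=True, fin=False, rst=False))
track(connections, finished, Packet(*flow, size=1500, syn=False, fin=False, rst=False))
track(connections, finished, Packet(*flow, size=60, syn=False, fin=True, rst=False))
print(finished)
```

Keeping the table in memory and writing it out only when the timer fires is what avoids rewriting the database on every packet.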

--
Having to work for a living is the root of all evil.

beware: bad benchmarks by t2t10 · 2011-05-23 08:26 · Score: 1

In your benchmarks, you list "1.26 hours" for Canny edge detection on a 4 Mpixel image in Matlab without GPU computing, and you miraculously speed that up to 8 seconds using your GPU tools:

http://www.accelereyes.com/products/benchmarks

On my three year old desktop, using just 1 CPU from a Core 2 Duo, I can do Canny edge detection on a 4 Mpixel PGM image in about 1.7 seconds with straightforward C code (no pointer tricks), including I/O, parsing the PGM, and god knows what else. It's about the same in Python.

So, my conclusions from this are that your company appears to have trouble writing efficient Matlab code (not a good sign for a company making a Matlab extension library), and that instead of spending a lot of time and money on Tesla boards and your GPU extensions to Matlab, people are better off writing their code in Python and using a little bit of C if they have to.

Re:beware: bad benchmarks by melonakos · 2011-05-23 09:08 · Score: 1

Please see my explanation to your other comment on this above. Thanks for pointing out that I need to get our marketing guys to post more information to avoid this confusion. Also, we ship a dozen examples with Jacket that you can run to get code and back-to-back comparisons. Hope that helps.
Re:beware: bad benchmarks by melonakos · 2011-05-23 09:11 · Score: 1

Ah, and I should add that for the Python community there is libJacket which will go to v1.0 on Jun 1st. If you want to get early beta access to our Python stuff, email me (email address in my big post above).

Re-code by dominious · 2011-05-23 09:13 · Score: 1

This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA

But you will have to re-code soon when a new version of Matlab is released and functions have changed over and over again! Yes, talking from personal experience...

Lots of GPU-accelerated numerical packages by gupg · 2011-05-23 17:08 · Score: 1

There are tons of other CUDA accelerated numerical packages besides Matlab -- Mathematica, LabView, plugins / wrappers / libraries for Python, R, IDL. Some of these are linked from NVIDIA's website
http://www.nvidia.com/object/numerical-packages.html

Others from
http://www.nvidia.com/object/data_mining_analytics_database.html

Re:r2011a versus jacket convolutions by Anonymous Coward · 2011-05-24 08:10 · Score: 0

I've been using Jacket for months now to do image and signal processing. It's way ahead of both MATLAB's PCT copy and gpumat.

A lot of misinformation among the comments, so rolled up my sleeves to get some measurements. Wanted to re-evaluate which to use in my work since R2011a just introduced GPU convolution. A quick benchmark of MATLAB R2011a with/without IPP compared to Jacket 1.7.1 for the various convolution functions getting mentioned.
- IMFILTER uses Intel Performance Primitives
- CONV2 non-separable
- CONV2 separable kernels
- FILTER2 tests separability and calls CONV2

Numbers and script below. I used Jacket's TIMEIT function but made a copy and removed GEVAL/GSYNC so you can time MATLAB's stuff too. Have to run Jacket and MATLAB separately, toggle if-statement. Hardware: cpu i7 920, gpu C2070. This is just testing one image (1024x1024) and kernel size (9x9) ... obviously there's a thousand different convolution cases I could benchmark but this here's enough to get a feel.

Some tentative thoughts:

- For MATLAB CPU, enabling IPP had a 10x difference for IMFILTER, but CONV2 was unaffected. I found their CONV2 performance always better than their IMFILTER with IPP enabled. So CONV2 appears to embody their best, most optimized convolution code.

- Neither MATLAB nor Jacket is detecting separability in CONV2 -- wanted to sanity check the benchmark as you suggested.

- For Jacket, separable stuff is 100x faster than IPP as we would hope. But it's even more interesting that it's 6x faster than MATLAB's GPU implementation -- what is MATLAB doing wrong?!?

- For Jacket, IMFILTER was slightly slower than CONV2 -- something they should fix -- a bug? Also IMFILTER interface was a little more cumbersome than CONV2 -- please fix :)

Bottom line: Jacket's GPU slaughters both MATLAB's CPU and GPU implementations. I didn't find any case where MATLAB's stuff came close.
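For reference, detecting separability amounts to checking that the kernel has rank 1. A minimal pure-Python sketch follows; the separate function and its tolerance are assumptions for illustration, not what MATLAB or Jacket actually do. The 9x9 box kernel mirrors the one built in the script below.

```python
def separate(kernel, tol=1e-12):
    """Return (col, row) with kernel[i][j] == col[i] * row[j] (up to tol),
    or None if the kernel is not rank 1, i.e. not separable."""
    cells = [(i, j) for i in range(len(kernel)) for j in range(len(kernel[0]))]
    # Pivot on the largest-magnitude entry for numerical stability.
    p, q = max(cells, key=lambda ij: abs(kernel[ij[0]][ij[1]]))
    pivot = kernel[p][q]
    if pivot == 0:
        return None  # all-zero kernel: nothing worth separating
    col = [r[q] for r in kernel]
    row = [x / pivot for x in kernel[p]]
    for i, j in cells:
        if abs(kernel[i][j] - col[i] * row[j]) > tol * abs(pivot):
            return None
    return col, row

k = 9
box = [[(1.0 / k) * (1.0 / k)] * k for _ in range(k)]  # outer product of ones/k
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

print(separate(box) is not None)   # box filter is separable
print(separate(laplacian))         # Laplacian is not
print(k * k, 2 * k)                # multiply-adds per pixel: direct vs two-pass
```

For the 9x9 case the arithmetic alone drops from 81 to 18 multiply-adds per output pixel, so separable and non-separable timings are not directly comparable.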

----/begin timings/----
[matlab cpu]
matlab imfilter: 0.015595 sec
matlab conv2: 0.014741 sec
matlab conv2-sep: 0.056838 sec
matlab filter2: 0.057271 sec

[jacket gpu]
jacket imfilter: 0.004392 sec
jacket conv2: 0.004197 sec
jacket conv2-sep: 0.000584 sec
jacket filter2: 0.000647 sec

[matlab gpu]
pct imfilter:
pct conv2: 0.006034 sec
pct conv2-sep: 0.003306 sec
pct filter2: 0.003912 sec

----/begin script/----
n = 1024;
k = 9;
a = rand(n,n,'single');
kern_sep = ones(k,1) / k;
kern = kern_sep * kern_sep';

% functions to benchmark
imfilter_fn = @(a) @() imfilter(a, kern, 'same');
filter2_fn = @(a) @() filter2(kern, a, 'same');
conv2_fn = @(a) @() conv2(a, kern, 'same');
conv2_sep_fn = @(a) @() conv2(kern_sep, kern_sep, a, 'same');

fprintf('[disable IPP]\n'); iptsetpref('UseIPPL', false)
fprintf('matlab imfilter: %.6f sec\n', timeit_matlab(imfilter_fn(a)))
fprintf('matlab conv2: %.6f sec\n', timeit_matlab(conv2_fn(a)))
fprintf('matlab conv2-sep: %.6f sec\n', timeit_matlab(conv2_sep_fn(a)))
fprintf('matlab filter2: %.6f sec\n', timeit_matlab(filter2_fn(a)))

fprintf('\n[enable IPP]\n'); iptsetpref('UseIPPL', true)
fprintf('matlab imfilter: %.6f sec\n', timeit_matlab(imfilter_fn(a)))
fprintf('matlab conv2: %.6f sec\n', timeit_matlab(conv2_fn(a)))
fprintf('matlab conv2-sep: %.6f sec\n', timeit_matlab(conv2_sep_fn(a)))
fprintf('matlab filter2: %.6f sec\n', timeit_matlab(filter2_fn(a)))

if false % can't run same time .. toggle in separate matlab instances
fprintf('\n[jacket gpu]\n')
a = gsingle(a);
kern = gsingle(kern);
imfilter_fn = @(a) @() imfilter(a, kern, 'same');
fprintf('jacket imfilter: %.6f sec\n', ti

Re:r2011a versus jacket convolutions by t2t10 · 2011-05-24 20:37 · Score: 1

It's interesting that you find that Jacket's CONV2 doesn't detect separability, because the guy who runs the company claimed that it did; he was talking about all the "advanced algorithms" that CONV2 contains.

Of course, GPU computing speeds up convolutions; that's what GPUs were designed to do. The questions we have been discussing are the following.

First, is GPU cost-effective for most applications right now and are people going to see the speedups they hope for? In my experience, the answer is "no", because most applications need to do a lot more than convolutions and speedup on other operations is much less. GPU computing is tricky enough that there is a lot more to worry about than whether one inner loop runs faster on the GPU.

Second, should you be using MATLAB for this? The answer there is "no" as well: MATLAB does some things well, but overall it's a badly designed language and environment. There are much better choices available, and many of them open source and with good support for GPU computing.

Third, we were talking about who was taking the lead and who was following, with the Jacket CEO trying to portray his company as leading the way and unpaid open source developers eventually copying their stuff. That is totally wrong. This style of computing was developed in academia long ago. There have been tons of implementations, both commercial and open source, over the years. Jacket is filling one particular market niche for one particular platform.

Having said all that, it sounds to me that Jacket is probably the best GPU computing solution for MATLAB and probably better than anything MATLAB provides. I just don't think that means all that much: most people likely won't see much of a real-world speedup from it, and most people would be far better off kicking the MATLAB habit altogether and moving on to better platforms, open source or otherwise.

ANSYS by i621148 · 2011-05-25 01:43 · Score: 1

ANSYS is also going to be able to use GPUs for parallel processing. The crappy part is they charge you 1500 bucks for each HPC license for each processor.
