Google's Latest Machine Vision Breakthrough
mikejuk writes "Google Research recently released details of a Machine Vision technique which might bring high power visual recognition to simple desktops and even mobile computers. It claims to be able to recognize 100,000 different types of object within a photo in a few minutes — and there isn't a deep neural network mentioned. It is another example of the direct 'engineering' approach to implementing AI catching up with the biologically inspired techniques. This particular advance is based on converting the usual mask-based filters to a simpler ordinal computation and using hashing to avoid having to do the computation most of the time. The result of the change to the basic algorithm is a speed-up of around 20,000 times, which is astounding. The method was tested on 100,000 object detectors using over a million filters on multiple resolution scalings of the target image, which were all computed in less than 20 seconds using nothing but a single, multi-core machine with 20GB of RAM."
Can it sort and identify duplicates automagically in my porn collection?
"... using nothing but a single, multi-core machine with 20GB of RAM" Phew... here I was thinking it'd need some unrealistically high specs from my PC!!
my cat can spot a Dentabite bag from across the room in 20 milliseconds, does that mean my cat has 20TB of RAM?
So CAPTCHAs will become even easier to crack? Great, the sooner we can get rid of them, the better. As it is, they are becoming impossible for humans to read, thanks to idiots who don't know how to design them.
Wait, your phone can decode video?!? In real time, playing the movies at normal speed? How many kilograms does it weigh, and how long is the power lead? How big is the mortgage on it? (/socraticmethod)
The computer innovation process broadly goes like this: a first algorithm sort of works but is incredibly inefficient - tweaks on this - a rethinking of the whole approach that leads to massive speed-ups - further refinement - implementation of the algorithm in hardware, where it becomes just another specialized processor - everybody profits!
This article is about the third, or possibly fourth, phase of the process. If it works out, phase 5 is straightforward. By itself, phase 5 typically leads to a two-orders-of-magnitude increase in performance, a three-orders-of-magnitude decrease in power consumption, and a two-to-four-orders-of-magnitude decrease in cost.
Phases 6 and 7 happen if and when enough people find the provided service useful. (If technologies are no good, that's when only rich people have them. Successful technologies, everyone gets access to eventually.)
Argh! There is no phase seven. Buffer overflow error.
No, but it can spy on you day and night.
Yes, it's a breakthrough. It won the best paper award at this year's Conference on Computer Vision and Pattern Recognition, a tier 1 computer vision conference.
Hashing invariant properties in images isn't new, but banded winner-take-all hashing of histogram-of-oriented-gradients part filters, then using matches across those bands to identify a test feature's nearest neighbors while simultaneously computing an upper bound on, or the exact value of, the dot products of those test features with their nearest learned features, for up to 100,000 objects with small amounts of memory, is new.
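For anyone wondering what "banded winner-take-all hashing" looks like in practice, here is a rough sketch in Python. It is not the paper's implementation: the function names, the random vectors standing in for HOG part filters, and the parameter values (`window`, `band_size`, `min_votes`) are all made up for illustration. The idea it shows is that each code records only which entry in a small permuted window is largest (an ordinal comparison), and that codes are grouped into bands used as hash-table keys, so most stored filters are never compared against the query at all.

```python
import numpy as np
from collections import defaultdict

def make_perms(dim, n_codes, seed=0):
    """One random permutation of the feature indices per hash code (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(dim) for _ in range(n_codes)])

def wta_hash(x, perms, window):
    """Winner-take-all hash: for each permutation, look at the first
    `window` permuted entries and record the position of the largest."""
    # x[perms] has shape (n_codes, dim); only the ordering of values matters.
    return np.argmax(x[perms][:, :window], axis=1)

def band_keys(codes, band_size):
    """Group consecutive codes into bands; each (band number, codes) pair is a table key."""
    return [(b, tuple(codes[i:i + band_size].tolist()))
            for b, i in enumerate(range(0, len(codes), band_size))]

def build_index(filters, perms, window, band_size):
    """Store every filter id under each of its band keys."""
    index = defaultdict(list)
    for fid, f in enumerate(filters):
        for key in band_keys(wta_hash(f, perms, window), band_size):
            index[key].append(fid)
    return index

def candidates(x, index, perms, window, band_size, min_votes):
    """Return ids of filters that share at least `min_votes` bands with x."""
    votes = defaultdict(int)
    for key in band_keys(wta_hash(x, perms, window), band_size):
        for fid in index.get(key, ()):
            votes[fid] += 1
    return sorted(fid for fid, v in votes.items() if v >= min_votes)
```

Only the filters returned by `candidates` would then need a real dot product, which is where the claimed savings come from. Because the hash depends only on rank order, rescaling or offsetting a feature vector leaves its codes unchanged.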
Phase 7 is profits. You obviously assumed phase 6 was "???".
Some years ago, I had an idea for a tool that would, in a nutshell, identify a plant simply from a photo and some metadata (time of year, geolocation, etc.). I know how it would work (and it would work), but I came to the conclusion that someone (i.e., Google) would use the methods to develop a tool that would do the same thing but for human faces.
It was at that point I decided to leave that box closed.
There are several not-too-creepy apps that can identify plant species from a smartphone photo of a single leaf.
http://leafsnap.com/about/
They seem to request metadata directly via your phone's location and time-of-request (their server, not your phone, does the pattern-matching). Which is convenient, although it may place you at a time and place you may rather not be placed, for instance if burying pirate gold under a particular tree.
"...might bring high power visual recognition to simple desktops and even mobile computers... computed in less than 20 seconds using nothing but a single, multi-core machine with 20GB of RAM."
Right... and by mobile computers you mean computers that I can lug from one desk to another.
Like the MacBook Pro Retina with 16 GB? The point of their approach seems to be lots and lots of RAM to do table lookups. The memory subsystem in a normal laptop is plenty fast for that. Bandwidth would be more of a problem than total space in a cellphone. If we had a compelling case for loads of RAM in a smartphone, it would be possible to design one without going wildly beyond current power or cost envelopes. A few more years of Moore and things will be fine.
20GB per 100000 objects is 209kB per object. Don't know what resolution each image was, but I think 200kB is quite small.
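A quick check of that per-object figure, as a small Python sketch. The function name is made up, and whether "GB" is read as binary (2^30 bytes) or decimal (10^9 bytes) is an assumption either way, since the summary doesn't say.

```python
def per_object_kilobytes(total_gb, n_objects, binary=True):
    """Memory per object, in KiB (binary=True) or kB (binary=False)."""
    giga = 2**30 if binary else 10**9
    kilo = 2**10 if binary else 10**3
    return total_gb * giga / n_objects / kilo

print(per_object_kilobytes(20, 100_000, binary=True))   # about 209.7 KiB
print(per_object_kilobytes(20, 100_000, binary=False))  # 200.0 kB
```

So the 209kB figure comes from the binary reading, and the round 200kB from the decimal one.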
It would be nice if it could identify bird species (or other animals), preferably down to specific individual animals, as is already done with whales and penguins.
I'd gladly pay money for such a program instead of getting only a free version where I can check whether Aunt Mary with a drink in hand is in any photo in my collection.
We have been waiting for years, ever since Shazam came out, for a program that can identify bird songs. No luck yet, but hey, many towns already have a program that tells them: somebody shot somebody with a .45, 0.23 miles in that direction, so there is still hope.
Current mobile seems to cap out at 2GB of RAM. There is a reason for this: power consumption. RAM requires a continuous trickle of power to maintain state, so an increase in RAM leads directly to an increase in power consumption. Mobile improvements are going to be focused on power consumption rather than raw power. Moore's law will be followed, but it will not result in something with 2x more RAM; it will result in something with half the power drain. OK, I will grant you that it will probably be a mix: some increase in RAM, some increase in computation, but a significant increase in battery life.
To go from 2GB to 30GB following Moore's law would take 8 years. I contend that it will take longer than that, because we won't see exact doubling of specs when part of each generation's gains goes to power savings instead. Either way, 10 years is far enough out that I think the summary's claim that this will come to mobile is far-fetched for now.
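The 8-year estimate can be reproduced with a couple of lines of Python, assuming capacity doubles every two years. The doubling period is an assumption here (the comment doesn't state one), and the function name is made up for illustration.

```python
import math

def years_to_grow(start_gb, target_gb, doubling_years=2.0):
    """Years needed to grow from start_gb to target_gb if capacity
    doubles every `doubling_years` (an assumed rate, not a measured one)."""
    return math.log2(target_gb / start_gb) * doubling_years

print(round(years_to_grow(2, 30), 1))  # a little under 8 years
print(round(years_to_grow(2, 20), 1))  # for the 20GB quoted in the summary
```

With a faster 18-month doubling the same growth would take proportionally less time, so the answer is quite sensitive to the assumed rate.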
Surely you realize the video decoding on phones is done with dedicated hardware.
You could do it on the CPU, though; the latest models (Galaxy S4 and the like) should be powerful enough.