Google's Latest Machine Vision Breakthrough
mikejuk writes "Google Research recently released details of a Machine Vision technique which might bring high power visual recognition to simple desktops and even mobile computers. It claims to be able to recognize 100,000 different types of object within a photo in a few minutes — and there isn't a deep neural network mentioned. It is another example of the direct 'engineering' approach to implementing AI catching up with the biologically inspired techniques. This particular advance is based on converting the usual mask-based filters to a simpler ordinal computation and using hashing to avoid having to do the computation most of the time. The result of the change to the basic algorithm is a speed-up of around 20,000 times, which is astounding. The method was tested on 100,000 object detectors using over a million filters on multiple resolution scalings of the target image, which were all computed in less than 20 seconds using nothing but a single, multi-core machine with 20GB of RAM."
Can it sort and identify duplicates automagically in my porn collection?
"... using nothing but a single, multi-core machine with 20GB of RAM" Phew, here I was thinking it'd need some unrealistically high specs from my PC!!
My cat can spot a Dentabite bag from across the room in 20 milliseconds. Does that mean my cat has 20TB of RAM?
Wait, your phone can decode video?!? In real time, playing the movies at normal speed? How many kilograms does it weigh, and how long is the power lead? How big is the mortgage on it? (/socraticmethod)
The computer innovation process broadly goes like this: a first algorithm that sort-of works but is incredibly inefficient - tweaks on this - a rethinking of the whole approach that leads to massive speed-ups - further refinement - implementation of the algorithm in hardware, where it becomes just another specialized processor - everybody profits!
This article is about the third, or possibly fourth, phase of the process. If it works out, phase 5 is straightforward. By itself, phase 5 typically leads to a two orders of magnitude increase in performance, a three orders of magnitude decrease in power consumption, and a two to four orders of magnitude decrease in cost.
Phases 6 and 7 happen if and when enough people find the provided service useful. (If technologies are no good, that's when only rich people have them. Successful technologies, everyone gets access to eventually.)
Yes, it's a breakthrough. It won the best paper award at this year's Conference on Computer Vision and Pattern Recognition, a tier 1 computer vision conference.
Hashing invariant properties in images isn't new, but banded winner-take-all hashing of histogram-of-oriented-gradients part filters, then using matches across those bands to identify a test feature's nearest neighbors while simultaneously computing an upper bound on, or the exact value of, the dot products of those test features with their nearest learned features, for up to 100,000 objects with small amounts of memory, is new.
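For anyone wondering what "banded winner-take-all hashing" looks like in practice, here is a minimal sketch of the general idea, not the paper's implementation. It assumes the usual description of winner-take-all hashing: for each hash, look at a fixed random subset of k vector entries and record which position holds the largest value, so only the ordering of values matters. The function names, the band size, and the toy vector are all made up for illustration.

```python
import random

def make_index_sets(dim, num_hashes, k, seed=0):
    """For each hash, pick the k indices a random permutation would put first."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), k) for _ in range(num_hashes)]

def wta_hash(vector, index_sets):
    """For each index set, record the position (0..k-1) of the largest value."""
    codes = []
    for idx in index_sets:
        window = [vector[i] for i in idx]
        # Position of the maximum; ties resolve to the earliest position.
        codes.append(max(range(len(window)), key=window.__getitem__))
    return codes

def to_bands(codes, band_size):
    """Group consecutive codes into tuples that can serve as hash-table keys."""
    return [tuple(codes[i:i + band_size]) for i in range(0, len(codes), band_size)]

# Toy feature vector (hypothetical values standing in for a HOG descriptor).
feature = [3, 12, -5, 20, 1, 9, 7, -2]
index_sets = make_index_sets(dim=len(feature), num_hashes=6, k=4)
codes = wta_hash(feature, index_sets)
band_keys = to_bands(codes, band_size=2)
```

Because only the rank order inside each window matters, any order-preserving change to the vector (for example, scaling it and adding a constant) produces the same codes. In a lookup scheme along these lines, each band key would index a table of stored filters, and only filters that collide in enough bands would get a full dot product.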
Phase 7 is profits. You obviously assumed phase 6 was "???".
Some years ago, I had an idea for a tool that would, in a nutshell, identify a plant simply from a photo and some metadata (time of year, geolocation, etc.). I know how it would work (and it would work), but I came to the conclusion that someone (i.e. Google) would use the methods to develop a tool that would do the same thing but for human faces.
It was at that point I decided to leave that box closed.
Someone flopped a steamer in the gene pool.
There are several not-too-creepy apps that can identify plant species from a smartphone photo of a single leaf.
http://leafsnap.com/about/
They seem to request metadata directly via your phone's location and time-of-request (their server, not your phone, does the pattern-matching). Which is convenient, although it may place you at a time and place you may rather not be placed, for instance if burying pirate gold under a particular tree.
So CAPTCHAs will become even easier to crack? Great, the sooner we can get rid of them, the better. As it is, they are getting impossible for humans to read, thanks to idiots who don't know how to design them.
But there's no need to get rid of them if we'll all have a handy browser plugin that can decode them for us at the press of a button!
Current mobile seems to cap out at 2GB of RAM. There is a reason for this: power consumption. RAM requires a continuous trickle of power to maintain state, so an increase in RAM leads to a direct increase in power consumption. Mobile improvements are going to be focused on power consumption rather than raw power. Moore's law will be followed, but it will not result in something that has 2x more RAM; it will result in something that has half the power drain. Ok, I will grant you that it will probably be a mix - some increase in RAM, some increase in computation, but a significant increase in battery life.
To go from 2GB to 30GB following Moore's law would take 8 years. I contend that it will take longer than that, because we won't see exact doubling of specs due to improvements in power. Either way, 10 years is far enough out that I think the summary claiming that this will come to mobile is far-fetched for now.
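The 8-year figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes RAM doubles every two years, which is one common reading of Moore's law and not something the comment states; the function name and parameters are made up for illustration. It also runs the same sum for the 20GB machine mentioned in the summary.

```python
import math

def years_to_reach(start_gb, target_gb, years_per_doubling=2.0):
    """Years needed to grow from start_gb to target_gb at a fixed doubling period."""
    doublings = math.log2(target_gb / start_gb)
    return doublings * years_per_doubling

# Assumed doubling period: 2 years (an assumption, not a figure from the thread).
to_30gb = years_to_reach(2, 30)  # about 3.9 doublings, a little under 8 years
to_20gb = years_to_reach(2, 20)  # about 3.3 doublings, roughly 6.6 years
print(f"2GB -> 30GB: {to_30gb:.1f} years")
print(f"2GB -> 20GB: {to_20gb:.1f} years")
```

Under that assumption the 30GB target works out to just under 8 years, matching the comment, while the summary's 20GB figure would take somewhat less. A slower doubling period stretches both numbers proportionally.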