Google's Latest Machine Vision Breakthrough
mikejuk writes "Google Research recently released details of a Machine Vision technique which might bring high power visual recognition to simple desktops and even mobile computers. It claims to be able to recognize 100,000 different types of object within a photo in a few minutes — and there isn't a deep neural network mentioned. It is another example of the direct 'engineering' approach to implementing AI catching up with the biologically inspired techniques. This particular advance is based on converting the usual mask-based filters to a simpler ordinal computation and using hashing to avoid having to do the computation most of the time. The result of the change to the basic algorithm is a speed-up of around 20,000 times, which is astounding. The method was tested on 100,000 object detectors using over a million filters on multiple resolution scalings of the target image, which were all computed in less than 20 seconds using nothing but a single, multi-core machine with 20GB of RAM."
Can it sort and identify duplicates automagically in my porn collection?
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
"... using nothing but a single, multi-core machine with 20GB of RAM" Phew.. here I was thinking it'd need some unrealistically high specs from my PC!!
"...might bring high power visual recognition to simple desktops and even mobile computers... computed in less than 20 seconds using nothing but a single, multi-core machine with 20GB of RAM."
Right... and by mobile computers you mean computers that I can lug from one desk to another.
My cat can spot a Dentabite bag from across the room in 20 milliseconds. Does that mean my cat has 20TB of RAM?
... or Sarah Connor for that matter?
So CAPTCHAs will become even easier to crack? Great, the sooner we can get rid of them, the better. As it is, they are getting impossible for humans to read, thanks to idiots who don't know how to design them.
Is this really a breakthrough? Hashing of invariant properties in images isn't new.
This algorithm definitely goes into the next release of drone firmware.
20GB per 100000 objects is 209kB per object. Don't know what resolution each image was, but I think 200kB is quite small.
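For what it's worth, the 209 figure only comes out if 20GB is read as binary gigabytes. A quick sketch of the arithmetic, assuming (the summary doesn't say this) that the RAM is split evenly across the 100,000 detectors:

```python
# Back-of-the-envelope memory per detector.
# Assumption: 20 GB of RAM shared evenly by 100,000 detectors.
TOTAL_RAM_GB = 20
NUM_DETECTORS = 100_000


def bytes_per_detector(total_gb, n, binary=True):
    """Bytes available per detector, using 2**30 or 10**9 bytes per GB."""
    unit = 2**30 if binary else 10**9
    return total_gb * unit / n


binary_bytes = bytes_per_detector(TOTAL_RAM_GB, NUM_DETECTORS, binary=True)
decimal_bytes = bytes_per_detector(TOTAL_RAM_GB, NUM_DETECTORS, binary=False)

print(f"binary units:  {binary_bytes / 1024:.1f} KiB per detector")
print(f"decimal units: {decimal_bytes / 1000:.1f} kB per detector")
```

Either way it lands at roughly 200 kB per detector.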
There was a time, and not so very long ago, when I was always very keen to hear what interesting thing Google had just invented, and excited to see what they'd do with it. Now my initial reaction to everything they do is "great, how are they going to use this to mess with me"?
The cake, however, ...
BMW has a forward-facing camera under the rear view mirror that scans highway signs for posted speed limits and no-passing zones and displays them on the dash. I am not sure whether it comes with the basic car or whether you have to buy some advanced tech package for it.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
It would be nice if it could identify bird species (or other animals), preferably down to specific individual animals, like they already do with whales and penguins.
I'd gladly pay money for such a program instead of getting only a free version, where I can check if aunt Mary with a drink in hand is in any photo in my collection.
We have already been waiting for years to get a program that can identify bird songs after Shazam came out, no luck yet, but hey, after all many towns already have a program that tells them: Somebody shot somebody with a .45, 0.23 miles in that direction, so there is still hope.
I was actually at that CVPR.
Yes, it did get the best paper award.
The only question it got during the oral presentation was
"So what exactly is your contribution with this paper?"
Seriously folks. It has been done before. They just did an efficient hashing and that's all.
It is always dubious when a gold sponsor of the conference gets the best paper award for a trivial piece of research.
You have been tagged at the ATM
You have been tagged at the laundromat
You have been tagged at the Quickie Mart
You have been tagged at work
You have been tagged at the gym
You have been tagged.
It's fast, but the training set is random garbage from YT thumbnails and they have NO PROCESS to assess accuracy. All they can do is measure precision, and it's ~16% on average. What this means is their algorithm could very well just say FACE every single time and by sheer coincidence every sixth image in the dataset contains some face - tada, you just reached 16% precision.
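To make that argument concrete, here is a toy sketch. The dataset is made up (every sixth image contains a face) and says nothing about how the paper actually evaluated its detectors; it only shows that a constant "face" guess lands at one-in-six precision on such data.

```python
from fractions import Fraction


def precision(predictions, truths, label):
    """Fraction of items predicted as `label` that really are `label`."""
    predicted = [t for p, t in zip(predictions, truths) if p == label]
    if not predicted:
        return None  # nothing was predicted as `label`
    correct = sum(1 for t in predicted if t == label)
    return Fraction(correct, len(predicted))


# Hypothetical toy dataset: every sixth image contains a face.
truths = ["face" if i % 6 == 0 else "other" for i in range(600)]

# A "detector" that ignores the image and always answers "face".
always_face = ["face"] * len(truths)

p = precision(always_face, truths, "face")
print(p, float(p))
```

Whether the quoted 16% is comparable to this kind of per-image precision is a separate question; this only illustrates the baseline being described.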
Who logs in to gdm? Not I, said the duck.
The page link only goes to the technical supplement (very brief).
Here is the full link from one of the author pages.
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40814.pdf
Also, kudos to Google for publishing it to the public; Aaron Swartz would be proud.
Being a software engineer myself, I understand the sense of excitement and accomplishment after completing internal testing. But as with many projects, as soon as this leaves the controlled "lab testing" environment it's a whole different ball game. Until then it's still a white paper product and I'd suggest remaining cautiously optimistic...
Bear in mind, this particular method is just a way to quickly do a large number of convolutions and get statistically fairly accurate results for the most activated convolution kernels.
This isn't incompatible with deep neural network models. This method can be combined with them and provide the same speedup there.
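A rough sketch of how hashing can skip most of the convolutions, assuming an ordinal hash in the winner-take-all style. The function names, band size, and toy Gaussian "filters" are all made up for illustration and are not taken from the paper; the point is only that hash-table votes pick a shortlist, and exact dot products are computed for the shortlist alone.

```python
import random
from collections import Counter, defaultdict


def make_permutations(dim, n_codes, k, seed=0):
    """For each code, pick k random coordinates (first k of a random permutation)."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), k) for _ in range(n_codes)]


def wta_hash(vec, perms):
    """Ordinal code: for each coordinate subset, record WHICH position holds the max."""
    return tuple(max(range(len(p)), key=lambda j: vec[p[j]]) for p in perms)


def bands(code, band_size):
    """Split a code into fixed-size bands; each (offset, band) pair is one table key."""
    return [(b, code[b:b + band_size]) for b in range(0, len(code), band_size)]


def build_index(filters, perms, band_size):
    """Map every band of every filter's code to the ids of filters that produced it."""
    index = defaultdict(list)
    for fid, f in enumerate(filters):
        for key in bands(wta_hash(f, perms), band_size):
            index[key].append(fid)
    return index


def candidate_filters(window, index, perms, band_size, top=3):
    """Vote for filters sharing bands with the window; no dot products computed here."""
    votes = Counter()
    for key in bands(wta_hash(window, perms), band_size):
        votes.update(index.get(key, ()))
    return [fid for fid, _ in votes.most_common(top)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy data: 200 random "filters" and a window that is a noisy copy of filter 42.
rng = random.Random(1)
DIM = 64
filters = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
perms = make_permutations(DIM, n_codes=64, k=4)
index = build_index(filters, perms, band_size=2)

window = [x + rng.gauss(0, 0.1) for x in filters[42]]
shortlist = candidate_filters(window, index, perms, band_size=2)

# Exact responses are computed only for the shortlist, not for all 200 filters.
best = max(shortlist, key=lambda fid: dot(window, filters[fid]))
print(shortlist, best)
```

Because the code depends only on the rank order of the values, it could in principle sit in front of any layer that computes many dot products, which is the compatibility point made above.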
My spoon is too big.
I'm sure it would take me more than a few minutes to identify that many objects.
However, how fast can it find Waldo?
It probably has been well tested "in the real world" - check out Google Goggles sometime (which is available for Android and iOS).
In fact, this probably came out of the stuff that Goggles does - where you snap a photo and Goggles figures out what's in it. If you snap a QR code, it'll decode it; if you snap a barcode, it'll pop up a Google search for that product. For other items it'll attempt either OCR or object recognition. Basically it gives a list of things (snap a sign and it'll probably try to OCR it, offer you a translation, tell you what kinds of cars it sees, etc).
- First they ignore you, then they laugh at you, then ???, then profit.
To make the Kinect work (version 1.0), Microsoft gathered thousands upon thousands, possibly millions, of data points, processed the images, checked the results, etc., and after zillions of computations ended up with digested data and some algorithms that use it, giving an accurate result in real time.
From reading the abstract I'm under the impression that Google basically did the same thing; it's trading computation for memory use. The "hashes" of what the camera sees somehow match the digested data they amassed, and thus the object gets classified. They do mention the training data.
represents a speed-up of approximately 20,000 times - four orders of magnitude - when compared with performing the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
I can't comment further on this; I don't know if that new Google thing is fundamentally the same concept used in the Kinect or if there are relevant differences other than scale.
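For anyone unsure what the mAP figures in the quoted abstract measure, here is a minimal sketch of mean average precision on made-up rankings. It assumes every correct item for a class appears somewhere in that class's ranked list, which is a common simplification and not necessarily the paper's exact protocol.

```python
def average_precision(ranked_hits):
    """AP for one class; ranked_hits are booleans sorted by descending score."""
    hits = 0
    total = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(per_class_hits):
    """Mean of the per-class AP values."""
    aps = [average_precision(h) for h in per_class_hits.values()]
    return sum(aps) / len(aps)


# Hypothetical rankings for two classes (True = correct detection).
rankings = {
    "cat": [True, False, True],   # AP = (1/1 + 2/3) / 2
    "car": [False, False, True],  # AP = (1/3) / 1
}
m_ap = mean_average_precision(rankings)
print(round(m_ap, 4))
```

A class whose correct detections sit near the top of its ranking scores close to 1, so an average of 0.16 over 100,000 classes means most rankings are dominated by wrong detections.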