An Advance In Image Recognition Software
Roland Piquepaille alerts us to work by US and Israeli researchers who have developed software that can identify the subject of an image characterized using only 256 to 1024 bits of data. The researchers said this "could lead to great advances in the automated identification of online images and, ultimately, provide a basis for computers to see like humans do." As an example, they've picked up about 13 million images from the Web and stored them in a searchable database of just 600 MB, making it possible to search for similar pictures through millions of images in less than a second on a typical PC. The lead researcher, MIT's Antonio Torralba, will be presenting the research next month at a conference on Computer Vision and Pattern Recognition.
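To make the "search millions of images in under a second" claim concrete, here is a minimal sketch of a brute-force nearest-neighbour search over packed binary codes using Hamming distance. It assumes 256-bit codes stored as 32 bytes each and uses random codes as stand-ins; the function name `hamming_search` and the data are illustrative only, and this is not the researchers' implementation. At 256 bits per image, 13 million codes come to roughly 416 MB, the same order of magnitude as the 600 MB quoted above.

```python
import numpy as np

# Lookup table: number of set bits in every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_search(codes, query, k=5):
    """Return (indices, distances) of the k codes closest to query.

    codes: (n, n_bytes) uint8 array, one packed binary code per row.
    query: (n_bytes,) uint8 array.
    """
    xor = np.bitwise_xor(codes, query)             # differing bits, per byte
    dists = POPCOUNT[xor].sum(axis=1, dtype=np.int64)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

# Hypothetical data: 100,000 random 256-bit codes standing in for image codes.
rng = np.random.default_rng(0)
n_bits = 256
codes = rng.integers(0, 256, size=(100_000, n_bits // 8), dtype=np.uint8)

# Query: a copy of code 42 with two bits flipped.
query = codes[42].copy()
query[0] ^= 0b00000101
idx, d = hamming_search(codes, query, k=3)
```

The whole comparison is an XOR plus a table lookup per byte, which is why a linear scan over very short codes can stay fast on an ordinary PC.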
I hate reading press releases instead of papers with real explanations of what's going on.
I just finished reading "Small Codes and Large Image Databases for Recognition", written by the guy. All he did was implement Geoff Hinton's idea of databasing images with binarized coefficients produced by Restricted Boltzmann Machines.
Hinton himself gave a talk on it for Google here:
http://www.youtube.com/watch?v=AyzOUbkUf3M
Actually I'm wondering, is he plagiarizing Hinton?
-- Making computers see, hear, and think... http://www.componica.com/
Incorrect. First of all, in a CAPTCHA, you're trying to very rigorously inspect a single image. This advance seems to be more about taking quick glances at lots of images. Furthermore, in the article, they talk about recognizing flowers and cars. The fact is, computers already have no problem recognizing letters and numbers in images. We got that down a long time ago. The difficult things about reading a CAPTCHA image are removing distortion and splitting the whole image into the component characters. If you read the article, you'd see that this research has nothing to do with that.
The actual paper is at http://people.csail.mit.edu/torralba/publications/nipsRecognitionBySceneAlignment.pdf
From what I can tell, it's basically, "blur the image down to only a few hundred pixels and then you have less data to comb through!"
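A rough sketch of the "blur it down to a few hundred pixels" step, as described in the comment above rather than as the paper actually does it. It assumes a 2-D grayscale NumPy array and shrinks it by averaging blocks; the function name `tiny_image` and the default of 16x16 (256 pixels) are illustrative choices, not values taken from the paper.

```python
import numpy as np

def tiny_image(img, size=16):
    """Shrink a 2-D grayscale array to size x size by block averaging.

    Rows and columns that do not fill a whole block are cropped off.
    """
    h, w = img.shape
    bh, bw = h // size, w // size                  # block height and width
    img = img[:bh * size, :bw * size]
    # Split into (size, bh, size, bw) blocks and average within each block.
    return img.reshape(size, bh, size, bw).mean(axis=(1, 3))

# Hypothetical 64x64 input; each output pixel is the mean of a 4x4 block.
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
small = tiny_image(img)
```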
What you're asking for is ill-defined, but much sought after.
A reasonable descriptor which produces distances that seem somewhat correlated with human perception would indeed be Antonio Torralba and Aude Oliva's gist descriptor.
http://people.csail.mit.edu/torralba/code/spatialenvelope/
It's become quite popular in computer vision and computer graphics for scene matching.
Read the papers then
http://people.csail.mit.edu/torralba/tinyimages/
There are all kinds of ways, but two simple ones come to mind. If you resample the image into polar coordinates, a rotation about the center becomes a shift along the angle axis, so the power spectrum taken there is conveniently orientation independent. It's the same trick that handles a shift: the power spectrum of an image in Cartesian coordinates is shift independent.
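A quick way to check the shift-independence claim numerically, as a sketch only. It uses a random array as a stand-in image and a circular shift (`np.roll`), for which the equality is exact; for a real image shifted with new content entering at the borders it holds only approximately. The polar resampling step for rotation is not shown here.

```python
import numpy as np

def power_spectrum(img):
    """Squared magnitude of the 2-D discrete Fourier transform."""
    return np.abs(np.fft.fft2(img)) ** 2

rng = np.random.default_rng(0)
img = rng.random((64, 64))                         # stand-in for a real image
shifted = np.roll(img, shift=(5, -9), axis=(0, 1)) # circular shift

# A shift only changes the phase of each Fourier coefficient, not its magnitude.
same = np.allclose(power_spectrum(img), power_spectrum(shifted))
```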
Another way is to somehow identify the orientation. A simple way to do that is to find the axis along which there's maximum variation and rotate until those axes match in both images.
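One possible reading of "the axis along which there's maximum variation" is the principal axis of the intensity-weighted pixel coordinates. The sketch below makes that assumption; `principal_angle` is a hypothetical helper, and the result is only defined up to 180 degrees, so it cannot tell an image from its upside-down copy.

```python
import numpy as np

def principal_angle(img):
    """Angle in radians, in [0, pi), of the axis of maximum intensity spread."""
    ys, xs = np.indices(img.shape)
    w = img / img.sum()                            # intensities as weights
    cx, cy = (w * xs).sum(), (w * ys).sum()        # weighted centroid
    dx, dy = xs - cx, ys - cy
    cov = np.array([[(w * dx * dx).sum(), (w * dx * dy).sum()],
                    [(w * dx * dy).sum(), (w * dy * dy).sum()]])
    vals, vecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    vx, vy = vecs[:, -1]                           # direction of largest variance
    return np.arctan2(vy, vx) % np.pi

# Hypothetical test images: a horizontal bar and a vertical bar.
horizontal = np.zeros((41, 41))
horizontal[20, 5:36] = 1.0
vertical = horizontal.T.copy()

a_h = principal_angle(horizontal)
a_v = principal_angle(vertical)
```

The rotation that aligns two images is then the difference between their two angles, modulo 180 degrees.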
Pixel by pixel co-registration basically does look at a similarity measure for a lot of variations on the affine transform. You generally don't have to look at them all though: you use an iterative algorithm with a clever optimization strategy so your transform gets better and better instead of searching through the parameter space randomly.
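A deliberately simplified sketch of the iterative idea: instead of a full affine transform it searches integer translations only, uses sum of squared differences as the similarity measure, and improves the estimate by greedy hill climbing rather than trying every shift. The names `ssd` and `register_shift` are made up for illustration; real registration code would use a richer transform and a better optimizer.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(((a - b) ** 2).sum())

def register_shift(fixed, moving, max_iter=100):
    """Greedy hill climbing over integer (dy, dx) shifts that minimise SSD."""
    shift = (0, 0)
    best = ssd(np.roll(moving, shift, axis=(0, 1)), fixed)
    for _ in range(max_iter):
        # Try the four one-pixel neighbours of the current shift.
        candidates = [(shift[0] + dy, shift[1] + dx)
                      for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        scores = [ssd(np.roll(moving, c, axis=(0, 1)), fixed)
                  for c in candidates]
        i = int(np.argmin(scores))
        if scores[i] >= best:                      # no neighbour improves: stop
            break
        shift, best = candidates[i], scores[i]
    return shift, best

# Hypothetical test case: a smooth Gaussian blob and a circularly shifted copy.
y, x = np.mgrid[0:64, 0:64]
fixed = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 8.0 ** 2))
moving = np.roll(fixed, (3, -4), axis=(0, 1))
shift, err = register_shift(fixed, moving)
```

Hill climbing like this works here because the blob gives a smooth error surface with a single minimum; on cluttered images it can get stuck, which is where the cleverer optimization strategies come in.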