New Algorithms Improve Image Search
bc90021 writes "Electrical engineers from UC San Diego are making progress on an image search engine that analyzes the images themselves. At the core of this Supervised Multiclass Labeling system is a set of simple yet powerful algorithms developed at UCSD. Once you train the system (the 'supervised' part), you can set it loose on a database of unlabeled images. The system calculates the probability that various objects it has been trained to recognize are present, and labels the images accordingly. After labeling, images can be retrieved via keyword searches. Accuracy of the UCSD system has outpaced that of other content-based image labeling and retrieval systems in the literature. One of the co-authors works at Google, where the researchers have access to image collections at the largest of scales."
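To make the label-then-search flow in the summary concrete, here is a minimal sketch. It is not the UCSD algorithm: the per-image probabilities in SCORES are invented stand-ins for whatever a trained model would output, and the 0.5 threshold and the function names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical per-image label probabilities, standing in for the output of
# a trained model. The numbers are invented for illustration only.
SCORES = {
    "img_001.jpg": {"tiger": 0.81, "grass": 0.64, "water": 0.05},
    "img_002.jpg": {"tiger": 0.03, "grass": 0.22, "water": 0.90},
    "img_003.jpg": {"tiger": 0.47, "grass": 0.71, "water": 0.12},
}


def label_images(scores, threshold=0.5):
    """Attach every label whose probability meets the (assumed) threshold."""
    return {
        image: sorted(label for label, p in probs.items() if p >= threshold)
        for image, probs in scores.items()
    }


def build_index(scores, threshold=0.5):
    """Map each keyword to the images labeled with it, best match first."""
    index = defaultdict(list)
    for image, probs in scores.items():
        for label, p in probs.items():
            if p >= threshold:
                index[label].append((p, image))
    return {
        label: [image for _, image in sorted(hits, reverse=True)]
        for label, hits in index.items()
    }


def search(index, keyword):
    """Return images labeled with the keyword, or an empty list."""
    return index.get(keyword, [])


labels = label_images(SCORES)
index = build_index(SCORES)
print(labels["img_001.jpg"])
print(search(index, "grass"))
```

Once the labels exist, retrieval is an ordinary keyword lookup; all of the hard work is in producing trustworthy scores in the first place.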
I remember when we had to go to a gas station and *buy* porn. Now you have computers out there finding porn for you. You kids today have it too easy!
Snarkiness aside, this is pretty cool stuff. I hope to see usable OSS code in a few years. Imagine how cool it would be to query "show me all pics with my daughter and her rabbits" and have it weed through the thousands of digital family photos.
Method of processing duck feet
change the way I search for Natalie Portman p0rn?
Microsoft: "You've got questions. We've got dancing paperclips."
The probability is either zero or one, because whether or not the feature being sought is present is a state of nature. It would be more helpful to call this number the confidence that the feature is present.
... was similarly trained to recognise tanks in landscapes. It was doing really well - getting a great score on the fresh images it was presented with.
Then they introduced it to a new batch of images and it fell apart.
Turns out that the initial set of images had all the tanks shot on a sunny day and all the tankless images shot on a cloudy day (or vice versa). It had learned to tell a sunny day from a cloudy day.
Ha ha.
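A toy version of that failure, as a rough sketch: the brightness values and the one-feature threshold "classifier" below are invented for illustration and have nothing to do with the original tank system. The point is only that a model can score perfectly on held-out images shot under the same conditions and still drop to coin-flip accuracy once lighting stops tracking the label.

```python
# Each sample is (mean_brightness, has_tank). All values are invented.
# In the training and held-out sets, every tank photo is bright (sunny)
# and every tankless photo is dark (cloudy).
train = [(0.9, True), (0.85, True), (0.8, True),
         (0.2, False), (0.3, False), (0.25, False)]
held_out = [(0.88, True), (0.82, True), (0.22, False), (0.28, False)]

# In the new batch, weather no longer lines up with the presence of a tank.
new_batch = [(0.9, False), (0.2, True), (0.85, True), (0.3, False)]


def fit_threshold(samples):
    """Pick the midpoint between the mean brightness of the two classes."""
    pos = [b for b, y in samples if y]
    neg = [b for b, y in samples if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(samples, threshold):
    """Fraction of samples where 'brighter than threshold' matches the label."""
    correct = sum((b > threshold) == y for b, y in samples)
    return correct / len(samples)


threshold = fit_threshold(train)
print(accuracy(held_out, threshold))   # same shooting conditions as training
print(accuracy(new_batch, threshold))  # lighting decorrelated from the label
```

The held-out set looks like a fair test, but it shares the confound with the training data, so it cannot reveal that the model learned the weather.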
I wish the article would mention more about why it is better than similar techniques that have been proposed in the past. (For example, http://luthuli.cs.uiuc.edu/~daf/papers/WAP-fin.pdf seems similar.) For instance, where do they get their labels for the training data? A lot of people have tried using contextual words drawn from surrounding web text, with limited success due to noise. It's also questionable how well their techniques can do if they need to pre-build a separate classifier for each keyword. Finally, there are words that it seems impossible they could ever distinguish. For example, 'man' vs. 'woman' would be incredibly complicated for anything but a human. Where are the details? Oh yeah, it's a news story! Here's a link to the paper: http://www.svcl.ucsd.edu/publications/journal/2007/pami/pami07-semantics.pdf
The problem is we all know what's gonna be the first result when searching "Caves on uranus"!!!
Run this story again when the system can tell the difference between D, DD, and DDD. Bonus points if it can handle "higher" criteria.
Since a huge percentage (perhaps most) of image searches are for porn, it is probably worthwhile for a search server to quickly classify likely porn as a way to reduce server load.
Engineering is the art of compromise.
One complaint about this work is that it requires tagging an initial set of images to train the system. Vasconcelos' work uses the academic standard "Corel" dataset of labeled images but also uses tagged images from Flickr to train the system. Using human computation games like the ESP game for images and ListenGame (www.listengame.org) for audio, collecting data is not as tough as it once was...
It's a little more plausible now that broadband is readily available but this has been portrayed on TV for years. Can you imagine some podunk field office connecting to an FBI database through a dialup and downloading high resolution images until they found just the right one? Then again, that would make for some good entertainment. Detective walks in..."I've got good news and bad news. The good news is we found the killer. The bad news is, he died of old age."