New Algorithms Improve Image Search
bc90021 writes "Electrical engineers from UC San Diego are making progress on an image search engine that analyzes the images themselves. At the core of this Supervised Multiclass Labeling system is a set of simple yet powerful algorithms developed at UCSD. Once you train the system (the 'supervised' part), you can set it loose on a database of unlabeled images. The system calculates the probability that various objects it has been trained to recognize are present, and labels the images accordingly. After labeling, images can be retrieved via keyword searches. Accuracy of the UCSD system has outpaced that of other content-based image labeling and retrieval systems in the literature. One of the co-authors works at Google, where the researchers have access to image collections at the largest of scales."
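For anyone wondering what "label by probability, then retrieve by keyword" might look like in practice, here is a minimal sketch. It is not the UCSD algorithm: the per-class probabilities are made-up numbers standing in for a trained classifier's output, and the names `label_images`, `build_index`, `search`, `top_k` and `min_prob` are hypothetical choices for illustration.

```python
from collections import defaultdict

def label_images(class_probs, top_k=2, min_prob=0.2):
    """Keep up to top_k labels per image whose probability is at least min_prob."""
    labels = {}
    for image_id, probs in class_probs.items():
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        labels[image_id] = [name for name, p in ranked[:top_k] if p >= min_prob]
    return labels

def build_index(labels):
    """Inverted index: keyword -> set of image ids carrying that label."""
    index = defaultdict(set)
    for image_id, names in labels.items():
        for name in names:
            index[name].add(image_id)
    return index

def search(index, query):
    """Return ids of images labeled with every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical classifier output (made-up numbers, not from the UCSD system).
CLASS_PROBS = {
    "img1.jpg": {"rabbit": 0.70, "grass": 0.25, "tank": 0.05},
    "img2.jpg": {"tank": 0.80, "sky": 0.15, "grass": 0.05},
    "img3.jpg": {"grass": 0.50, "rabbit": 0.45, "sky": 0.05},
}

LABELS = label_images(CLASS_PROBS)
INDEX = build_index(LABELS)
print(sorted(search(INDEX, "rabbit")))
```

The expensive part (scoring each image) happens once at labeling time; after that, a keyword query is just a set lookup, which is presumably why the labels are stored rather than recomputed per search.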
I remember when we had to go to a gas station and *buy* porn. Now you have computers out there finding porn for you. You kids today have it too easy!
Snarkiness aside, this is pretty cool stuff. I hope to see usable OSS code in a few years. Imagine how cool it would be to query "show me all pics with my daughter and her rabbits" and have it weed through the thousands of digital family photos.
Method of processing duck feet
change the way I search for Natalie Portman p0rn?
Microsoft: "You've got questions. We've got dancing paperclips."
The probability is either zero or one, because whether or not the feature being sought is present is a state of nature. It would be more helpful to call this number the confidence that the feature is present.
... was similarly trained to recognise tanks in landscapes. It was doing really well - getting a great score on the fresh images it was presented with.
Then they introduced it to a new batch of images and it fell apart.
Turns out that the initial set of images had all the tanks shot on a sunny day and all the tankless images shot on a cloudy day (or vice versa). It had learned to tell a sunny day from a cloudy day.
Ha ha.
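The tank story is easy to reproduce with toy data. Below is a rough sketch, assuming each image has been boiled down to two made-up features, `brightness` and a noisy `shape_score`, and that the "classifier" is a one-feature threshold rule (a decision stump). All the numbers and names are invented for illustration; this is not the system from the anecdote.

```python
def stump_accuracy(samples, feature, threshold):
    """Fraction of samples where (value > threshold) matches the label."""
    hits = sum((x[feature] > threshold) == bool(y) for x, y in samples)
    return hits / len(samples)

def train_stump(samples, n_features):
    """Pick the single feature and threshold with the best training accuracy."""
    best = (0, 0.0, -1.0)  # (feature index, threshold, accuracy)
    for feature in range(n_features):
        values = sorted({x[feature] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            acc = stump_accuracy(samples, feature, threshold)
            if acc > best[2]:
                best = (feature, threshold, acc)
    return best

# Each sample is ((brightness, shape_score), label); label 1 means "tank".
# Training batch: every tank photo is sunny, every tankless photo is cloudy.
TRAIN = [
    ((0.90, 0.8), 1), ((0.80, 0.7), 1), ((0.85, 0.4), 1), ((0.90, 0.9), 1),
    ((0.20, 0.3), 0), ((0.30, 0.2), 0), ((0.25, 0.6), 0), ((0.20, 0.1), 0),
]
# New batch: lighting no longer lines up with the presence of a tank.
NEW_BATCH = [
    ((0.90, 0.80), 1), ((0.20, 0.90), 1), ((0.25, 0.70), 1), ((0.85, 0.75), 1),
    ((0.90, 0.20), 0), ((0.20, 0.10), 0), ((0.80, 0.30), 0), ((0.30, 0.25), 0),
]

FEATURE, THRESHOLD, TRAIN_ACC = train_stump(TRAIN, n_features=2)
NEW_ACC = stump_accuracy(NEW_BATCH, FEATURE, THRESHOLD)
print(FEATURE, TRAIN_ACC, NEW_ACC)  # prints: 0 1.0 0.5
```

The stump settles on feature 0 (brightness) because it separates the training batch perfectly, while the feature that actually relates to tanks is a little noisy. On the new batch the brightness rule is no better than a coin flip.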
The problem is we all know what's gonna be the first result when searching "Caves on uranus"!!!
Run this story again when the system can tell the difference between D, DD, and DDD. Bonus points if it can handle "higher" criteria.
Since a huge percentage (perhaps most) of image searches are for porn, it is probably worthwhile for a search server to quickly classify likely porn as a way to reduce its load.
Engineering is the art of compromise.
I find it interesting which of the object-recognition and scene-categorization algorithms make it to Slashdot.
Why does this one make it?
This is a very hot research topic at the moment.
to name a couple of groups:
http://www.robots.ox.ac.uk/~vgg/
http://lear.inrialpes.fr/
http://www.vision.caltech.edu/
http://www.science.uva.nl/research/isla/
http://www.cdvp.dcu.ie/
http://www.informedia.cs.cmu.edu/
http://www.research.ibm.com/slam/
http://www.ee.columbia.edu/ln/dvmm/newResearch.ht
Oh, and people should not fixate on the claimed results.
Research papers *always* have to present good results, or else you do not get published.
Furthermore, these images are of very high quality, made by professional photographers.
Many algorithms perform very well on these ('corel'-like) sets, while utterly failing when applied to real-world data:
http://www-nlpir.nist.gov/projects/trecvid/