Searching with Images instead of Words
johnsee writes "A computer vision researcher by the name of Hartmut Neven is developing ingenious new technology that allows the searching of a database by submitting an image, for example, off a mobile phone camera. Imagine taking a photo of a street corner to find out where you are, or the photo of a city building to see its history"
Tell me, which is easier? Upload this image and try to find out where you are via this Visual Google, or enter the street name (street sign in the photo says "Queen Street") in Text Google?
The article also mentioned this thing should start small, like a movie guide, so is it easier to upload a 2K "I, Robot" billboard photo, or just enter "I, Robot" in Google on your cell phone?
As long as human input is still required (i.e. you need to submit something), I don't think this is going to be popular. However, if you have a pair of Oakleys that automatically takes photos of what you see and feeds you the location details, that'll be something.
Rock that crushes, Paper & Scissors that don't matter.
the pr0n industry is going to love this.
sulli
RTFJ.
How much harder is it to just use a regular text search for the restaurant, movie, building, etc. that you want info on? It's like voice dialing on a cell phone: good idea, but it's about ten times faster and more effective to either dial or scroll to the name you want to call manually.
Or taking a picture of someone and finding out their history.
click
"Whoa, dude! She's been on 4 amateur Pr0n sites!"
There are 01 kinds of cars in the world. The General Lee, and everything else.
Or why not just look at the street signs to find out where you are? If the street corner is in a database it is probably in an area that is developed enough to have street signs.
Yes, imagine that.
1: Take picture with ultra-modern all-features camera phone of building while lost in city.
2: Submit to search system.
3: Search system queries phone's built-in GPS for position information.
4: Search system sends back retrieved GPS location.
5: Customer is absolutely blown away and immediately sends back picture of self signing virtual 10-year contract at Early Adopter prices.
6: Profit!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
...are gonna love this too. Take a picture of the girl you like and do a search. This has some scary connotations I'm afraid.
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
This seems less like a technology article and more like an advertisement for Hartmut Neven himself. Yes, he's built a 'google for images'... But how does it perform? How exactly is it 'ingenious'? What sets his project apart from the handful of people at almost every university with a computer vision research department who are tackling the problem? The problem of matching images is well known, and very difficult to solve. Even in my grad school (BU), which has a small number of computer vision grad students, there are two different research projects on this very topic.
For this to work, the database will have to understand what 3D objects are (at least in the specific domain) and have an idea of which features of an object are important (signs, for example, so it will need a very good OCR system too). That becomes a knowledge representation issue.
There have been many projects like this attempted before. But until a computer knows what a "chair" is, or what a "statue" or a "tree" is, it will just not work right. Getting a computer to understand concepts, though, is a much larger and more interesting accomplishment.
When your brain 'recognises' what it is looking at, it is doing a lot more than just comparing two images (as in the street-corner example from the article). Your brain simply doesn't operate in terms of bitmaps.
The fact that he is basing his hyper-vaporous product on facial-recognition software should set off alarm bells. Facial recognition in a real-world context has consistently failed to be of any use at all, although it may work fine under lab conditions.
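To make the bitmap point concrete, here is a minimal sketch (not anything from the article or from Neven's system) of why naive pixel-by-pixel comparison is fragile. The tiny striped "image" and the helper names `mean_abs_diff` and `shift_right` are made up for illustration; the only claim is that moving the same scene by a single pixel can make a raw bitmap comparison report a large difference.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equal-sized
    grayscale images, each given as a list of rows of ints."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def shift_right(img, fill=0):
    """Shift every row one pixel to the right, padding on the left with `fill`."""
    return [[fill] + row[:-1] for row in img]


# Hypothetical 4x4 image of vertical stripes (illustrative data only).
stripes = [[0, 255, 0, 255] for _ in range(4)]
shifted = shift_right(stripes)

same = mean_abs_diff(stripes, stripes)    # identical bitmaps
moved = mean_abs_diff(stripes, shifted)   # same scene, moved by one pixel

print(same, moved)
```

The identical pair scores 0, while the one-pixel shift scores about three quarters of the maximum possible difference, even though a person would call the two pictures the same thing.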
If all the money invested so far hasn't made a computer able to successfully recognise a subset of the visual field (faces), why should I believe in a machine that is able to recognise practically anything?
GPS would be useful in some situations (if you want to know about a general area), but for the example of taking a "photo of a city building to see its history", GPS itself would not be sufficient.
GPS can provide a location, but it can't pinpoint what you are looking at. This is the case even with compass data indicating which direction you are pointing your device--what if there are two things in your line of sight from that perspective? (Do you want information about the building, or do you want information on the kiosk in front of the building?)
Also, this is more generic than these examples anyway. What if I want information on the building and then on a street performer in front of the building? I could take a picture of the building, read about it, then take a picture of the street performer and read about him/her. GPS wouldn't be sufficient to tell me about the street performer, because he or she might move around the area.
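The line-of-sight ambiguity described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates, the landmark names, and the `candidates_in_view` helper are all hypothetical, and the 10-degree tolerance is an arbitrary stand-in for compass error. The bearing formula is the standard initial great-circle bearing.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north, in the range [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def candidates_in_view(user, heading_deg, landmarks, tolerance_deg=10.0):
    """Names of landmarks whose bearing from the user lies within
    `tolerance_deg` of the compass heading (hypothetical helper)."""
    hits = []
    for name, (lat, lon) in landmarks.items():
        b = bearing_deg(user[0], user[1], lat, lon)
        # Smallest angular difference, handling wrap-around at 360.
        diff = abs((b - heading_deg + 180.0) % 360.0 - 180.0)
        if diff <= tolerance_deg:
            hits.append(name)
    return hits


# Made-up positions: the kiosk stands directly between the user and the building.
user = (0.0, 0.0)
landmarks = {
    "kiosk": (0.0001, 0.0),
    "building": (0.0005, 0.0),
    "cafe": (0.0, 0.0005),
}

hits = candidates_in_view(user, 0.0, landmarks)  # user faces due north
print(hits)
```

Facing north, both the kiosk and the building come back as candidates, so position plus heading still cannot say which one the user means. A moving subject like the street performer would not be in the landmark table at all.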
This paper demonstrates an automatic system for telling whether there are naked people present in an image.
So it's not "identifying a person by a nude picture", it's identifying pictures which contain nude people...
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F