Finding Yourself With Photo Recognition
itchyfish writes "You are lost in a foreign city, you don't speak the language and you are late for your meeting. What do you do? Take out your cellphone, photograph the nearest building and press send.
For a small fee, photo recognition software on a remote server works out precisely where you are, and sends back directions that will get you to your destination.
Seems a little far-fetched, but amazingly cool if it really works."
The software then looks for useful features, such as the corners of windows and doors, and extracts the colours and intensities of the pixels around them. Next, it searches the image database for matching data, using the base station the cellphone's signal came from as a guide. Finally, it uses the differences between the two images to calculate the photographer's position.
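To make the matching step concrete, here is a minimal sketch of searching an image database with the base station as a guide. Everything in it is an assumption for illustration: the toy `DATABASE`, the three-number descriptors, the `max_dist` threshold, and the function names are hypothetical and not taken from the article. It also stops at finding the best-matching view and does not attempt the final position calculation.

```python
import math

# Hypothetical toy database: each stored view is tagged with the base
# station (cell ID) that covers it and holds a few feature descriptors,
# e.g. colours and intensities sampled around a window corner.
DATABASE = [
    {"name": "building_a", "cell": "cell_1",
     "descriptors": [(0.9, 0.1, 0.3), (0.2, 0.8, 0.5)]},
    {"name": "building_b", "cell": "cell_1",
     "descriptors": [(0.1, 0.1, 0.9), (0.7, 0.7, 0.1)]},
    {"name": "building_c", "cell": "cell_2",
     "descriptors": [(0.9, 0.1, 0.3), (0.2, 0.8, 0.5)]},
]


def count_matches(query, reference, max_dist=0.2):
    """Count query descriptors whose nearest reference descriptor is close."""
    return sum(
        1 for q in query
        if min(math.dist(q, r) for r in reference) <= max_dist
    )


def best_match(query, cell_id, database=DATABASE):
    """Return the best-matching view among those covered by cell_id."""
    # The base station narrows the search to views in the same cell.
    candidates = [e for e in database if e["cell"] == cell_id]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: count_matches(query, e["descriptors"]))
    if count_matches(query, best["descriptors"]) == 0:
        return None
    return best["name"]


query = [(0.88, 0.12, 0.31), (0.21, 0.79, 0.52)]
print(best_match(query, "cell_1"))
print(best_match(query, "cell_2"))
```

Note that `building_a` and `building_c` have identical descriptors in this toy data, so only the cell ID tells them apart. That is the point of using the base station as a guide: it keeps the search small and separates look-alike buildings in different parts of town.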
To me, it would appear that an easier solution might be to use GIS data in combination with the cell phone signal and comparisons of rough morphological features of buildings. The instructions should simply be: Point your camera at a building near you so that you can approximate its outline and then send that image. This would scale much better than the methods referenced in the article, as you would not have to store every detail of the buildings surrounding you, including pixel maps of textures and color. This approach could be handled for a large city by a few commodity servers, whereas the other approach would require significantly more computational resources.
Imagine how difficult it would be to capture details like that in a major city such as NYC. I don't really need directions to find my way around Cambridge city center, as you could almost throw a rock from the center and hit just about every building around, but London, Washington, Houston, etc. are another story, and the data their approach needs would require massive computational infrastructure.
I think you missed the part where it pulls your relative location off of the tower you're using. -PHiZ
Even old-style degraded GPS let you get your position to within a few hundred feet. Assuming that a map was returned with a 400 sq ft circle instead of a "you are exactly here" marker, any half-intelligent person could figure it out.
Anyway, most cellphone networks can triangulate your position to within a block.
As for photo recognition being MORE accurate, I can't see how. To get your position to within a few hundred feet you'd need to know the exact parameterization of the lens, the zoom, the angle of the camera... unlikely.
Getting GPRS to work correctly in a foreign country so that you can make such a request is hard enough to begin with.
Q: You are lost in a foreign city, you don't speak the language and you are late for your meeting. What do you do?
A: Do just like all of the other PHBs who were stupid enough to get stuck like that, i.e. screw the meeting, find the nearest bar, and start blowing the company expense account on cheap booze and hookers.
This seems like an overly complicated solution. At the moment, my phone in Japan has a feature where I log on to Vodafone's website (from the phone), click through a couple of links, and it tells me where I am. I assume it gets this information by figuring out which cell the phone is dialing from. From the subsequent menus, there are various options like "find the last train to station X", "find the nearest place to catch a taxi", and so on. A few months ago it was only available in Japanese, but now they've introduced a bilingual version - hoochie mamma.
Why bother using the fancy-dancy image recognition software when cellular telephony has a built-in system that basically acts like a constantly-updated "user location" variable?
(Actually, the answer is simple - to make geeks foam at the mouth. Come on now people!! Excess ain't rebellion.)
2. GPS has only 10m accuracy. This is important when you're giving pedestrians directions (e.g. cross the street and enter the second door on your right).
And how will this improve on 10m accuracy? Will you have to submit your camera lens's focal length as well in order to determine the distance from the photographed objects? Humans generally can't tell the difference between a 20mm lens photographing at 40m vs. a 35mm lens at 70m but this software can supposedly get 1m accuracy levels? I very much doubt this.
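The arithmetic behind that objection is easy to check with the ideal pinhole camera model, where the size of an object on the sensor is focal length times object size divided by distance. The sketch below assumes that model and a hypothetical 10 m wide facade; `image_size_mm` is just an illustrative helper name, and none of this says how the actual system deals with the ambiguity.

```python
def image_size_mm(focal_length_mm, distance_m, object_size_m):
    """Size of an object on the sensor under the ideal pinhole model.

    The metres cancel, so the result is in the same unit as the
    focal length (millimetres here).
    """
    return focal_length_mm * object_size_m / distance_m


facade_width_m = 10.0  # assumed width, for illustration only

wide = image_size_mm(20.0, 40.0, facade_width_m)  # 20mm lens at 40m
tele = image_size_mm(35.0, 70.0, facade_width_m)  # 35mm lens at 70m

print(wide, tele)
```

Both cases have the same focal-length-to-distance ratio (20/40 = 35/70 = 0.5 mm per metre), so a single object at one depth comes out the same size on the sensor. Apparent size alone therefore cannot separate the two distances unless the focal length is known or something else in the scene gives it away.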
You would clearly have a library of objects (e.g. buildings) on the servers. When a picture is sent, the service would perform some sort of feature extraction, and calculate the invariants of the objects in the scene. It would then see if these objects nearly matched any in the database. If they did, it would project possible matches onto the image and look for edges around the model. If there was good correlation (accepting the fact that the match would not be perfect because of moveable objects) it would return the name of the building.
Prof. Cipolla lectures me on (surprise, surprise...) Computer Vision. You can find his lecture handouts here. (The projection handout, page 46 onwards, talks about the process I have just described.)
Dear All,
Wow! Thank you very much for all your comments on this mobile phone navigation system. I thought I'd throw in my 2 cents worth since I'm one of the people who invented it! Forgive the lack of structure in what follows, but I'm trying to address several different issues raised throughout this discussion...
Yes, another way of doing this is radio signal triangulation (including GPS). But actually, this method doesn't work too well in cities because of things like multipath effects and satellite visibility (BTW our system isn't designed to work outside urban environments). GPS car navigation systems rely on a combination of GPS and inertial sensors, i.e. they take a sort of average of a large number of inaccurate readings to get a good fix on position. But the simpler positioning strategies are unlikely to give good enough accuracy to establish which side of the street you are standing on (and in any case, they don't tell you what direction you're looking in). GPS is also expensive: most people would not be prepared to pay more for a phone with an in-built GPS receiver - but camera phones are already selling well.
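As a rough illustration of the "average of a large number of inaccurate readings" idea, the sketch below takes a plain mean of simulated noisy position readings. The numbers (a true position of 100 m along a street, 10 m of Gaussian noise, 1000 readings) are assumptions chosen for the example, and a plain mean is only a stand-in: it is not a claim about what filter real car navigation units use.

```python
import random
import statistics


def noisy_readings(true_pos_m, sigma_m, n, rng):
    """Simulate n position readings with independent Gaussian error."""
    return [rng.gauss(true_pos_m, sigma_m) for _ in range(n)]


rng = random.Random(0)   # fixed seed so the run is repeatable
true_pos_m = 100.0       # assumed true position along a street
sigma_m = 10.0           # assumed error of a single reading

readings = noisy_readings(true_pos_m, sigma_m, 1000, rng)
estimate = statistics.mean(readings)

print("single reading error:", abs(readings[0] - true_pos_m))
print("averaged error:", abs(estimate - true_pos_m))
```

With independent errors, the spread of the mean shrinks roughly with the square root of the number of readings, so 1000 readings of 10 m quality give an estimate good to a fraction of a metre. The catch for a pedestrian is that this only helps if the errors really are independent and you have time to collect many readings, and it still says nothing about which way you are facing.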
No, we're not going to build a database of every building in the world! But a good place to start would be large city centres. FYI what motivated us to invent this system was the familiar problem of getting lost outside London tube stations. Obviously I know which tube station I'm at but I don't usually know which exit I took or what direction I'm facing. Of course I can retrieve a local map via my mobile phone. But the problem is I'm missing that critical "you are here" dot that tells me where to start. This is where our system comes in: by providing the dot (well, an arrow actually because it tells you which direction you're looking in too).
In practice, building a database is easier than you might think. We could probably do it with nothing more than a video camera attached to a car. Granted, someone will have to drive down the streets of interest, but only once (and this shouldn't be too difficult in somewhere like New York).
Finally, no, movable objects don't cause too many problems. The system uses a feature based strategy that is robust to 'clutter' in the form of things like cars, pedestrians, changing shop window displays, etc. That being said, there will always be ways of confusing it, e.g. by demolishing a building. But supposing that picture messages will one day cost as little as text messages do now, a system that works almost instantaneously and gets it right 99% of the time sounds as if it might have some commercial potential at least. And what if the hypothetical tourist isn't lost but just interested? For example, the system could return information about the history of any building of interest in the middle of Venice.
Yours,
Duncan Robertson