Google VisualRank for Image Search
Google researchers are claiming that a newly developed approach to visual search may do for image searching what PageRank did for text search. "The research paper, 'PageRank for Product Image Search,' is focused on a subset of the images that the giant search engine has cataloged because of the tremendous computing costs required to analyze and compare digital images. To do this for all of the images indexed by the search engine would be impractical, the researchers said. Google does not disclose how many images it has cataloged, but it asserts that its Google Image Search is the 'most comprehensive image search on the Web.'"
Namaste
It should be noted that a lot of the preliminary data for this was gathered through human interaction, which Google set up as a game.
I am still playing with the filter by date dropdown url manipulation.
Sweet!!! More exact pr()n searches!!! Wohooo!!!!
Does anyone have the full name/DOI of the paper?
(which you know they didn't)
Talking about other uses... what about putting those techniques and the "enormous computing power" to work on jobs that are useful to society? They could be used to find mineral ores (maybe by correlating aerial images with geological data?) or for medical analysis (skin cancer? tissue identification?). It wouldn't bring in much direct revenue, but it would surely increase Google's "coolness" a lot (and from a shareholder's point of view, that can be very, very attractive).
here's the result of cowboy neal: http://images.google.com/images?gbv=2&hl=en&safe=off&q=cowboy+neal&btnG=Search+Images
G(.)(.)GLE? ...
"Filter error: Your comment looks too much like ascii art", you say? No kidding...
Which is a good point. Sometimes you don't want the text associated with the image, you want the image itself.
The canonical example would be image macros and comic strips. When you're looking for a particular LOLcat or demotivational poster, or even a specific comic strip based on a remembered punchline, the text in the image is what you want to be able to search for. The text associated with the images (that is, the HTML from whatever poster first showed it to you in some gaming forum thread or 'blog) is irrelevant.
It doesn't solve the broader problem, but it'd be a good starting point. Given that the fonts chosen for image macros and comic strips are designed for readability, standard OCR techniques would work. If machines can solve CAPTCHAs, Google can certainly index the text on images.
All of a sudden, every comic strip you ever remember reading as a kid (even if you've only got one or two pages of it) becomes searchable.
Is this the same technology that Google recently indicated will help "crack down" on child porn? Or is this yet another way of doing the search? And if it is different, does anyone know whether they plan to combine these two technologies with the existing methods to make the engine even more robust?
I don't expect an answer... but who knows, maybe one of the Google guys in the know is reading.
Here's some background info on the guy: http://en.wikipedia.org/wiki/Mr._Magoo
"Quote me as saying I was mis-quoted." -Groucho Marx
Let me guess, you had a chance to get in on the IPO when it was 40 bucks a share? And you turned it down, and said "40 dollars a share is way overvalued for a search company".
Prediction: The real iPhone killer is going to be sex robots from Japan. Think about it.
Image search this and that, sure, but why the hell is it still next to impossible to find product reviews using Google? Every time I try I only get product pages in online shops and not a single "real" review.
Maybe Google does something like this already, but I was thinking...
Can't they tune their image search by tracking which results get clicked for particular terms? Presumably, the images people click on are more likely to be accurately described by the search terms originally entered, so that's like a 'free' image classification going on constantly.
For instance, if I put in "green field", I might get a bunch of images, and click on one that shows a grassy prairie. That image could be tagged with the keywords 'green' and 'field', probably weighted so that it takes multiple taggings to influence search results.
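A minimal sketch of that click-weighted tagging, purely for illustration. The class name `ClickTagger`, the `min_clicks` threshold, and the image IDs are all made up here; nothing below reflects how Google actually does it.

```python
from collections import defaultdict


class ClickTagger:
    """Hypothetical helper: tag images with query words once enough clicks agree."""

    def __init__(self, min_clicks=3):
        # A keyword only counts as a tag after this many clicks (assumed weighting).
        self.min_clicks = min_clicks
        # image_id -> keyword -> number of clicks that came from a query with that keyword
        self.counts = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, image_id):
        """Credit every word of the query to the image that was clicked."""
        for word in query.lower().split():
            self.counts[image_id][word] += 1

    def tags(self, image_id):
        """Keywords that have reached the click threshold for this image."""
        per_image = self.counts.get(image_id, {})
        return sorted(w for w, n in per_image.items() if n >= self.min_clicks)


tagger = ClickTagger(min_clicks=2)
tagger.record_click("green field", "prairie.jpg")
tagger.record_click("Green meadow", "prairie.jpg")
print(tagger.tags("prairie.jpg"))  # ['green']
```

With a threshold of 2, only 'green' sticks, because 'field' and 'meadow' have one click each. A real system would also have to deal with click spam and with people clicking on things precisely because they look wrong.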
iPod: look for lots of shiny white
Zune: look for lots of brown
Xbox 360: look for red dots in a ring
Here's an idea for Google that's been on my mind for several months. Yes, I'm giving it out for free.
Let me upload an image in my hard drive to Google and have them check it against the zillion images on their catalog. Then give me a page with all the similar copies it could find, with a thumbnail and the URL from where it originates.
One practical use I can think of: Someone you meet on the web sends you a photo claiming to be of him/herself. With this Google utility, you could upload that same image and have Google tell you if it exists anywhere on the web. Then you'd know if this person just took it off a MySpace profile, etc.
Another practical use: Look for prior art or copyright violations on images someone claims are original work. Could be very useful for Wikipedia.
The potential for something like this is massive.
Err... No, no... The AC wasn't but the situation you described was me. I could have even afforded quite a chunk. I'm not entirely certain of the percentages but I'm pretty sure I invested in beer far more than I should have. Yes, yes I kick myself.
"So long and thanks for all the fish."
GSOC?
I think the problem with what the GP suggests is twofold:
1) I imagine analysing aerial images is much harder than analysing your typical photo
2) Medical analysis would require access to a lot of data, and people already have enough googlefoil hats
IranAir Flight 655 never forget!
Hail Google! Self-proclaimed King of Everything! Go ye forth and Do No Evil (except in Russia and China -- oh, yeah, and that bit about caving to the Feds on your users' personal data)!
(NO WAI)
Contrary to the popular belief, there indeed is no God.
Is it just me, or has anybody else noticed that Google doesn't make much effort to catalog the photos on Flickr, which is incidentally owned by Yahoo?
Or is it that Yahoo is blocking Google?????
Needless to say, if you search for a restricted set in Yahoo image search, you will pull up all of the Flickr photos. The same search in Google will often yield nothing from Flickr.
Filmo The Klown
I had basically the same idea, but I was going to keep the colour information, blur, include a global colour/contrast value (obtained by resampling to "1x1"), use that to colour-correct the image, and then resample to maybe 5x5.
I figured that for web searches, that should probably be good enough to find lots of alternative images from the same photoshoot or photoset as a sample picture, pictures taken by other photographers of the same scene, or still images of the same scene from a movie.
If you relaxed the search dependency on the global colour value, you should be able to find differently-processed versions of the same image. I was also going to strip the edge pixels (to remove borders). The use of colour-correction would mean that you wouldn't be wasting code resolution on data that was the same in each cell, and you could identify different versions of the same image with different colour "casts", or where operators had played about with the contrast.
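A rough sketch of that kind of signature, under some loud assumptions. The image is just a list of rows of (r, g, b) tuples. "Colour-correct" is taken to mean "subtract the global average colour from each cell", which is only one possible reading. Box-averaging each cell stands in for the blur-and-resample step. The function names and the `grid` and `border` parameters are made up for illustration.

```python
def mean_colour(pixels):
    """Average (r, g, b) over an iterable of pixels."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def signature(image, grid=5, border=1):
    """Return (global_colour, cells) for an image given as rows of (r, g, b) tuples.

    cells is a row-major list of grid*grid colour offsets relative to the
    global colour (assumed interpretation of the colour-correction step).
    """
    # Strip edge pixels to get rid of borders.
    if border:
        image = [row[border:-border] for row in image[border:-border]]
    h = len(image)
    w = len(image[0]) if h else 0
    if h < grid or w < grid:
        raise ValueError("image too small for the requested grid")

    # Resampling to "1x1": the global colour value.
    global_colour = mean_colour(p for row in image for p in row)

    cells = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = mean_colour(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            # Store each cell as an offset from the global colour.
            cells.append(tuple(round(c - g) for c, g in zip(cell, global_colour)))
    return tuple(round(g) for g in global_colour), cells


# Synthetic 12x12 test image: red rises left to right, green rises top to bottom.
img = [[(x * 10, y * 10, 50) for x in range(12)] for y in range(12)]
# The same picture with a uniform colour cast added.
tinted = [[(r + 20, g + 20, b + 20) for r, g, b in row] for row in img]

global_a, cells_a = signature(img)
global_b, cells_b = signature(tinted)
print(global_a, global_b)
print(cells_a == cells_b)  # True
```

The uniform cast only moves the global colour value, so the cell offsets still match. Contrast changes would need a scale correction as well as an offset, which this sketch doesn't attempt.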
It was going to need a bit of R&D and a decent library of sample pictures to work out the best tradeoff between recognisability, colour resolution per cell, and final number of cells ("5x5"?, "7x7"?), and once you'd done //that//, there'd probably want to be further R&D to consider possible ways of optimising and future-proofing the "ID code" file structure.
For instance, you might want to put the global colour value first (for finding "exact-match" candidates), followed by the centre cell value, then the surrounding cells ordered by proximity to the centre, based on the assumption that edge information is likely to be less important (if it turned out that the four corner cells weren't too helpful, your search algorithm could ignore the last four cell values).
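That centre-first ordering could look something like this. It's a sketch only: `centre_out_order` and `id_code` are made-up names, the cells are assumed to be stored row-major, and ties between cells at the same distance are broken by (row, column), which is an arbitrary choice.

```python
def centre_out_order(grid=5):
    """Grid coordinates (row, col) sorted by distance from the grid centre."""
    c = (grid - 1) / 2
    coords = [(r, col) for r in range(grid) for col in range(grid)]
    # Squared Euclidean distance from the centre; ties broken by (row, col).
    return sorted(coords, key=lambda rc: ((rc[0] - c) ** 2 + (rc[1] - c) ** 2, rc))


def id_code(global_colour, cells, grid=5):
    """Flatten to [global colour, centre cell, ..., corner cells]."""
    order = centre_out_order(grid)
    return [global_colour] + [cells[r * grid + col] for r, col in order]


order = centre_out_order(5)
print(order[0])    # (2, 2)
print(order[-4:])  # [(0, 0), (0, 4), (4, 0), (4, 4)]

# Label each cell with its row-major index so the reordering is visible.
code = id_code("global", list(range(25)))
without_corners = code[:-4]  # a search could simply ignore the last four values
```

Because the corners always end up last, dropping them is just a slice off the end of the code.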
And then you have issues over whether the number of cells in the ID ought to be fixed or variable, and if it's variable, whether it should be recursive. You might base the system on "5x5", but allow the possibility of appended secondary ID code data that subdivides each cell and uses lower-res relative colour offset values for the sub-cells, based on the parent cell's value. Or you might not. It'd be up to the algorithm whether it was going to clip, degrade or weight cell values depending on their ordering.
If you were going along the "recursive" route, then instead of starting with a cell grid that then subdivides, you might divide the grid into concentric zones based on the picture's centre, for instance with a 5x5 grid, you could individually code the relative colours of the three zones: the central cell, the average of all the edge cells, and the average of all the intermediate cells. That'd give three codes that'd describe some of the colour variation across the image without being too sensitive to whether a human subject had moved an arm or leg between shots.
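For the three-zone version, a sketch might be as follows. It assumes an odd-sized grid stored row-major, and it uses square rings around the centre as the "zones". `zone_codes` is a made-up name.

```python
def zone_codes(cells, grid=5):
    """Average colour per concentric square ring, centre first.

    cells: row-major list of grid*grid (r, g, b) tuples; grid must be odd.
    For a 5x5 grid this returns [centre cell, intermediate ring, edge ring].
    """
    if grid % 2 == 0:
        raise ValueError("grid must be odd so that there is a single centre cell")
    c = grid // 2
    zones = {}
    for r in range(grid):
        for col in range(grid):
            # Ring index: 0 for the centre cell, 1 for the cells around it, and so on.
            ring = max(abs(r - c), abs(col - c))
            zones.setdefault(ring, []).append(cells[r * grid + col])
    return [
        tuple(sum(p[ch] for p in zones[k]) / len(zones[k]) for ch in range(3))
        for k in sorted(zones)
    ]


# Toy data: bright centre, dimmer middle ring, dark edge.
toy = []
for r in range(5):
    for col in range(5):
        ring = max(abs(r - 2), abs(col - 2))
        toy.append({0: (9, 9, 9), 1: (3, 3, 3), 2: (1, 1, 1)}[ring])

print(zone_codes(toy))
```

Since each ring is averaged as a whole, swapping cells around within a ring (say, an arm moving from one edge cell to another) leaves the three codes unchanged.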
With more processing power you could start moving away from a square grid, and use proper concentric circular zones that are then subdivided, or a pattern of "test blobs" for sampling that approximate subdivided concentric zones (using pre-made bitmask templates for speed), but to start with, a grid approach would probably be more straightforward and simpler to execute.
One of those interesting projects I never got around to following through on ... :(
Eric Baird