Google VisualRank for Image Search

Posted by ScuttleMonkey on Monday April 28, 2008 @09:20AM from the they-don't-quite-own-the-world-yet dept.

Google researchers are claiming that a newly developed approach to visual search may do for image searching what PageRank did for text search. "The research paper, 'PageRank for Product Image Search,' is focused on a subset of the images that the giant search engine has cataloged because of the tremendous computing costs required to analyze and compare digital images. To do this for all of the images indexed by the search engine would be impractical, the researchers said. Google does not disclose how many images it has cataloged, but it asserts that its Google Image Search is the 'most comprehensive image search on the Web.'"

63 comments

doesn't work well by planckscale · 2008-04-28 09:26 · Score: 2, Funny

Still no positive results for ["Natalie Portman" and "Hot Grits"]

--
Namaste
image game data by BoldAC · 2008-04-28 09:27 · Score: 2, Interesting

It should be noted that a lot of the prelim data for this was gained through human interaction that Google set up as a game.

I am still playing with the filter by date dropdown url manipulation.
1. Re:image game data by Anonymous Coward · 2008-04-28 09:34 · Score: 0
  
  Woah, I am totally addicted to the google image labeler. I am shocked how much time people must volunteer to google using this little gaming thing. Look at the leader...
  
  Ciao 4 Now 22980040
  
  Considering a great match might be between 50-150 points, that dude (or dudette) has spent some _serious_ time on google image labeler. Gesh! And I thought I was wasting my time on slashdot.
  
  Anybody want to play?
2. Re:image game data by BoldAC · 2008-04-28 09:38 · Score: 1
  
  Oh, yeah. The original paper is here:
  
  http://www.docstoc.com/docs/529160/PageRank-for-Product-Image-Search
3. Re:image game data by Ihmhi · 2008-04-28 10:55 · Score: 1
  
  It should be noted that a lot of the prelim data for this was gained through human interaction that Google set up as a game.
  1) Thanks for introducing me to that game and evaporating what little free time I had left today. d:
  
  2) It is interesting to see the responses for someone who does not care. The image? A Mercedes dashboard thermometer. The labels?
  Partner's guesses: jacquelyn is the coolest person, jacquelyn, kayla is cool, kayla, ass, butt, butt cheke, butt ox, mom, dad
  
  --
  Random Thoughts From A Diseased Mind (Not For Dummies)
4. Re:image game data by Ihmhi · 2008-04-28 11:04 · Score: 1
  
  Addendum: I have discovered that I can greatly amuse myself by offering commentary on the images themselves rather than actually try to label them, such as [Jessica Alba Picture] "I bet you are wanking now" and [trippy album cover] "I could do better if / I opened up MS Paint / and had a seizure."
  
  I might be the first person to be banned from this game... but as in the spirit of Watterson's Calvin I like to make someone's day a little more surreal. :3
  
  --
  Random Thoughts From A Diseased Mind (Not For Dummies)
5. Re:image game data by electrictroy · 2008-04-29 04:49 · Score: 1
  
  >>>"'most comprehensive image search on the Web.'"
  
  Sub-title:
  
  Making topless Miley Cyrus photos easier to find than ever before!
  
  --
  The government is not your daddy. Its purpose is not to raid middle-class neighbors' wallets and give it to you.
Excellent! by Tree131 · 2008-04-28 09:27 · Score: 3, Funny

Sweet!!! More exact pr()n searches!!! Wohooo!!!!
1. Re:Excellent! by Anonymous Coward · 2008-04-28 21:00 · Score: 0
  
  xnxx.com, dude
paper reference by rojathecabinboy · 2008-04-28 09:36 · Score: 2, Interesting

Does anyone have the full name/DOI of the paper?
1. Re:paper reference by Rui+Lopes · 2008-04-28 10:16 · Score: 3, Informative
  
  http://www2008.org/papers/fp506.html No DOI currently available, but pdf link is in the page.
  
  --
  var sig = function() { sig(); }
Wake me up when they include the porn by Anonymous Coward · 2008-04-28 09:39 · Score: 0

(which you know they didn't)
other uses? by papabob · 2008-04-28 09:40 · Score: 1

Speaking of other uses... what about putting those techniques and the "enormous computing power" to work on jobs that are useful to society? They could be used to find mineral ores (maybe by correlating aerial images with geological data?) or for medical analysis (skin cancer? tissue identification?). It wouldn't bring in much direct revenue, but it would surely increase Google's "coolness" a lot (and from a shareholder point of view, it can be very very attractive)
definitely comprehensive by Anonymous Coward · 2008-04-28 09:41 · Score: 0

here's the result of cowboy neal: http://images.google.com/images?gbv=2&hl=en&safe=off&q=cowboy+neal&btnG=Search+Images
They'll need a new themed logo by Anonymous Coward · 2008-04-28 09:45 · Score: 0

G(.)(.)GLE? ...

"Filter error: Your comment looks too much like ascii art", you say? No kidding...
1. Re:They'll need a new themed logo by DeadDecoy · 2008-04-28 10:02 · Score: 1
  
  Hehe, so any time you find a picture of a hot girl you can say: I'd google that.
2. Re:They'll need a new themed logo by freemywrld · 2008-04-28 10:20 · Score: 0
  
  Wouldn't BOOBLE be more appropriate?
  
  --
  Support a true independent artist - Leila Lopez
3. Re:They'll need a new themed logo by RiotingPacifist · 2008-04-28 14:15 · Score: 1
  
  I wonder if BOOBLE will buy youprorn or porntube, perhaps they'll even start up knob, a Wikipedia-like site but for wankers who have nothing better to do
  
  --
  IranAir Flight 655 never forget!
4. Re:They'll need a new themed logo by fractoid · 2008-04-28 22:10 · Score: 1
  
  We need to start this meme now. A fair few of the images would be NSFW if they were higher resolution - any slashdotter that comes up against such label it idgooglethat, it shall be our secret code! ;)
  
  --
  Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Most comprehensive.... by Anonymous Coward · 2008-04-28 09:45 · Score: 0

Google does not disclose how many images it has cataloged, but it asserts that its Google Image Search is the 'most comprehensive image search on the Web.'

It's easy to assert anything when you fail to provide numbers.....
Low-hanging fruit by Anonymous Coward · 2008-04-28 09:48 · Score: 1, Interesting

Although image search has become popular on commercial search engines, results are usually generated today by using cues from the text that is associated with each image.

Which is a good point. Sometimes you don't want the text associated with the image, you want the image itself.
The canonical example would be image macros and comic strips. When you're looking for a particular LOLcat or demotivational poster, or even a specific comic strip based on a remembered punchline, the text in the image is what you want to be able to search for. The text associated with the images (that is, the HTML from whatever poster first showed it to you in some gaming forum thread or 'blog) is irrelevant.
It doesn't solve the broader problem, but it'd be a good starting point. Given that the fonts chosen for image macros and comic strips are designed for readability, standard OCR techniques would work. If machines can solve CAPTCHAs, Google can certainly index the text on images.
All of a sudden, every comic strip you ever remember reading as a kid (even if you've only got one or two pages of it) becomes searchable.
1. Re:Low-hanging fruit by boyter · 2008-04-28 11:33 · Score: 1
  
  I actually did this as part of my graduate thesis. I even managed to get a high recognition rate for text inside images (about 80% of the words in my sample of 50 images). You are right that standard OCR techniques work very well; the bigger problem is extracting the text so that you can recognize it. I don't know of any extraction technique that does not also massively scale up the problem. I used multiple techniques, such as multivalued image decomposition, and each turned the problem of finding text in one image into finding text in at least 10 images. That is a 10x increase, and even then it won't find much of the text in many images. I also had a large number of false positives.
  
  I concluded that these problems really make pixel text indexing a headache. I guess you could match the pixel text against the text of the document to weed out false positives, but that somewhat defeats the purpose of finding new content to begin with. Even if you solve that, you still have the scale of the problem, although thankfully it scales out to multiple machines very well.
  
  If anyone is interested in getting my thesis, just leave a note on my website and I will link it, or email.
2. Re:Low-hanging fruit by boyter · 2008-04-28 12:49 · Score: 1
  
  Sorry about the dual post, I forgot to add I was looking at launching a website which indexes web comic strip text since as you point out it is very easy to extract and identify, but even then you need a targeted approach for each strip.
3. Re:Low-hanging fruit by Ihmhi · 2008-04-28 21:42 · Score: 1
  
  Oh No Robot [ohnorobot.com] has been doing this for years in the webcomics world - allowing users to assign text labels to comics. It's basically writing the script for a comic that already exists.
  
  --
  Random Thoughts From A Diseased Mind (Not For Dummies)
Image search technologies... by Kazrath · 2008-04-28 09:54 · Score: 1

Is this the same technology that google recently indicated will help "Crack down" on child porn? Or is this yet another different form of doing the search? And if it is different does anyone know if they have plans to put these two technologies+existing methods together to make the engine even more robust?

I don't expect an answer... but who knows, maybe one of the goog guys who is in the know is reading.
Re:other uses? by Boa+Constrictor · 2008-04-28 09:56 · Score: 1

from a shareholder point of view, it can be very very attractive

What's attractive about a lower dividend and a less economic company? Investors invest to make money, and if they use their money for charity later that's great, but I doubt many would like investment and donation merged without their consent.
Lead Image Cataloger by AioKits · 2008-04-28 09:59 · Score: 1

Here's some background info on the guy: http://en.wikipedia.org/wiki/Mr._Magoo

--
"Quote me as saying I was mis-quoted." -Groucho Marx
Re:google is fucking retarted by Gat0r30y · 2008-04-28 10:23 · Score: 1

Let me guess, you had a chance to get in on the IPO when it was 40 bucks a share? And you turned it down, and said "40 dollars a share is way overvalued for a search company".

--
Prediction: The real iPhone killer is going to be sex robots from Japan. Think about it.
Product reviews? by tomaasz · 2008-04-28 10:40 · Score: 1

Image search this and that, sure, but why the hell is it still next to impossible to find product reviews using Google? Every time I try I only get product pages in online shops and not a single "real" review.
1. Re:Product reviews? by tooyoung · 2008-04-28 13:59 · Score: 1
  
  Image search this and that, sure, but why the hell is it still next to impossible to find product reviews using Google? Every time I try I only get product pages in online shops and not a single "real" review.
  Maybe you aren't good at writing your searches...
2. Re:Product reviews? by RiotingPacifist · 2008-04-28 14:33 · Score: 1
  
  I have to agree, there are much more important things that Google could improve in its product search. Would it be that hard to remove accessories unless the user is clearly looking for one?
  
  e.g. "Mp3 player" sorted by price doesn't show anything but deliberately mis-tagged headphones and iPod cases.
  
  --
  IranAir Flight 655 never forget!
3. Re:Product reviews? by Whiteox · 2008-04-29 00:35 · Score: 1
  
  Must agree with you. But much of the problem is the empty 'user reviews' that these shopbot pages encode, so Google thinks that they're worthy of inclusion.
  Frankly, there are a few other annoying bugs. Hopefully they'll be fixed one day. Annoyingly, a lot of other search engines are 'google powered' and have the same faults.
  
  --
  Don't be apathetic. Procrastinate!
One Method by Toonol · 2008-04-28 10:51 · Score: 1

Maybe Google does something like this already, but I was thinking...

Can't they tune their image search by tracking which results are clicked for particular terms? Presumably, the images people click on are more apt to be accurately described by the search terms originally entered, so that's like a 'free' image classification going on constantly.

For instance, if I put in "green field", I might get a bunch of images, and click on one that shows a grassy prairie. That image could be tagged with the keywords 'green' and 'field', probably weighted so that it takes multiple taggings to influence search results.
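
A minimal sketch of the click-weighting idea above, not a description of anything Google is known to do. The names `record_click` and `confirmed_tags` and the threshold `MIN_CLICKS` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: a term must be credited this many times
# before it counts as a tag. The value 3 is an assumption.
MIN_CLICKS = 3

def record_click(tag_counts, query, image_id):
    """Credit every word of the query to the image that was clicked."""
    for term in query.lower().split():
        tag_counts[image_id][term] += 1

def confirmed_tags(tag_counts, image_id, min_clicks=MIN_CLICKS):
    """Return the terms that enough separate clicks agree on."""
    counts = tag_counts[image_id]
    return sorted(term for term, n in counts.items() if n >= min_clicks)

tag_counts = defaultdict(lambda: defaultdict(int))
for _ in range(3):
    record_click(tag_counts, "green field", "prairie.jpg")
record_click(tag_counts, "green car", "prairie.jpg")  # one stray click

print(confirmed_tags(tag_counts, "prairie.jpg"))
```

The single stray "car" click stays below the threshold, which is the "multiple taggings" weighting the comment asks for.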
How it works by noidentity · 2008-04-28 11:10 · Score: 1

The company said that in its research it had concentrated on the 2000 most popular product queries on Google's product search, words such as iPod, Xbox and Zune.

iPod: look for lots of shiny white
Zune: look for lots of brown
Xbox 360: look for red dots in a ring
findimagedupes image similarity algorithm by Danny+Rathjens · 2008-04-28 11:46 · Score: 4, Interesting
I noticed this nifty little program in debian called findimagedupes. The algorithm for fingerprinting the files for comparing similarity is neat. From the man page:
To calculate an image fingerprint:
1) Read image.
2) Resample to 160x160 to standardize size.
3) Grayscale by reducing saturation.
4) Blur a lot to get rid of noise.
5) Normalize to spread out intensity as much as possible.
6) Equalize to make image as contrasty as possible.
7) Resample again down to 16x16.
8) Reduce to 1bpp.
9) The fingerprint is this raw image data.
To compare two images for similarity:
1) Take fingerprint pairs and xor them.
2) Compute the percentage of 1 bits in the result.
3) If percentage exceeds threshold, declare files to be similar.
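
The steps above can be roughly sketched in plain Python. This is not the findimagedupes code: it works on an already-grayscale 160x160 image given as a list of rows, skips the blur, normalize and equalize steps, and thresholds each cell against the mean brightness as a stand-in for the 1bpp reduction. Those simplifications, and all function names, are assumptions. Since XOR sets a bit wherever two fingerprints differ, the sketch reports the share of bits that match.

```python
def downsample(pixels, size):
    """Box-average a square grayscale image (list of rows) to size x size."""
    step = len(pixels) // size
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * step, (r + 1) * step)
                     for x in range(c * step, (c + 1) * step)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def fingerprint(pixels, size=16):
    """Pack the image into size*size bits: 1 where a cell is brighter than the mean."""
    cells = [v for row in downsample(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def percent_matching(fp_a, fp_b, nbits=256):
    """XOR marks the differing bits; return the percentage that match."""
    differing = bin(fp_a ^ fp_b).count("1")
    return 100.0 * (nbits - differing) / nbits

# Synthetic 160x160 images: a gradient, a brighter copy, and its mirror image.
gradient = [[x for x in range(160)] for _ in range(160)]
brighter = [[x + 10 for x in range(160)] for _ in range(160)]
mirrored = [[159 - x for x in range(160)] for _ in range(160)]

print(percent_matching(fingerprint(gradient), fingerprint(brighter)))
print(percent_matching(fingerprint(gradient), fingerprint(mirrored)))
```

The brighter copy fingerprints identically because thresholding against the mean ignores a uniform brightness change, while the mirrored gradient flips every bit.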
1. Re:findimagedupes image similarity algorithm by momerath2003 · 2008-04-28 12:45 · Score: 1
  
  Great, so any two images have a 1 in 256 chance of matching exactly, and an even higher chance of exceeding the threshold. I like those odds.
  
  --
  I had but a simple dream, to destroy all humans.
2. Re:findimagedupes image similarity algorithm by Danny+Rathjens · 2008-04-28 13:12 · Score: 1
  
  1 false positive out of 256 is another way of saying 99.6% accurate. ;) Although in practice it works out to about 98% accurate according to the author; and my own tests don't dispute that. Also bear in mind that this tool is for looking through your own files for dupes, not comparing all images on the internet. :) There are obvious ways to expand the algorithm for larger datasets - and use of more processing power.
3. Re:findimagedupes image similarity algorithm by Anonymous Coward · 2008-04-28 13:32 · Score: 0
  
  If you're referring to step 8, I think "1bpp" means "one bit per pixel". So it's a 16x16 black and white image. Given the earlier normalization steps, chances of a match are reasonably low for most types of images.
4. Re:findimagedupes image similarity algorithm by sootman · 2008-04-28 14:18 · Score: 1
  
  Clever. I wonder if it could be adapted to find duplicate text files? :-)
  
  --
  Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
5. Re:findimagedupes image similarity algorithm by RiotingPacifist · 2008-04-28 14:26 · Score: 1
  
  They have a (1/2)^256 chance of matching exactly,
  so 1 in 115792089237316195423570985008687907853269984665640564039457584007913129639936, that's 8.6e-78.
  Of course the initial steps will make this number smaller, but still much bigger than 256.
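
The arithmetic can be checked with a short sketch. It assumes every fingerprint bit is an independent fair coin flip, which real images will not satisfy, so it only gives a baseline. The cutoff of 26 differing bits (about 10% of 256) is a hypothetical threshold picked for illustration.

```python
from math import comb

BITS = 16 * 16        # one bit per pixel of the 16x16 fingerprint
total = 2 ** BITS     # number of distinct fingerprints

def p_within(max_differing_bits, nbits=BITS):
    """P(two independent uniform fingerprints differ in at most k bits)."""
    ways = sum(comb(nbits, k) for k in range(max_differing_bits + 1))
    return ways / 2 ** nbits

print(total)          # the 78-digit number quoted in the comment
print(p_within(0))    # chance of an exact match
print(p_within(26))   # chance of landing within the hypothetical threshold
```

Allowing some differing bits raises the chance of an accidental match by many orders of magnitude, but under this model it stays far below 1 in 256.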
  
  --
  IranAir Flight 655 never forget!
6. Re:findimagedupes image similarity algorithm by geonik · 2008-04-28 18:14 · Score: 0
  
  The above algorithm does not take into account the case of two similar images with one having an offset relative to the other. For example if the offset of the second image is greater than 1/16 the width/height of the picture, then all bits are wrong.
7. Re:findimagedupes image similarity algorithm by fractoid · 2008-04-28 22:28 · Score: 1
  
  And furthermore, if those images DO match then even if they're not identical, they are probably fairly similar.
  
  The utility mentioned sounds like it could do well reducing the images to 2-3 bits per pixel at 16x16 rather than 1, and storing 768 bits rather than 256 sounds less-than-overwhelming.
  
  --
  Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
8. Re:findimagedupes image similarity algorithm by Anonymous Coward · 2008-04-28 22:55 · Score: 0
  
  So if one image is a bit cropped (say the lower 1/16 pixel rows are cut off), the similarity that this algorithm finds will likely be 0.
9. Re:findimagedupes image similarity algorithm by njh · 2008-04-29 05:05 · Score: 1
  
  No, only the edges.
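
A small sketch of why a shift tends to disturb only the bits near edges, at least for images made of large flat regions. It uses a simplified fingerprint (box-average to 16x16, then threshold against the mean) rather than the full findimagedupes pipeline, and the synthetic step image is an assumption made for illustration.

```python
def fingerprint_bits(pixels, size=16):
    """Box-average to size x size, then emit 1 where a cell beats the mean."""
    step = len(pixels) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * step, (r + 1) * step)
                     for x in range(c * step, (c + 1) * step)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]

def step_image(edge_x, width=160):
    """Dark on the left, bright on the right, with the edge at column edge_x."""
    return [[0 if x < edge_x else 255 for x in range(width)]
            for _ in range(width)]

original = fingerprint_bits(step_image(80))
shifted = fingerprint_bits(step_image(90))   # moved by 1/16 of the width

differing = sum(a != b for a, b in zip(original, shifted))
print(differing, "of", len(original), "bits differ")
```

Only the column of cells that the edge crossed changes. An image with fine detail at the scale of one cell would fare worse, which is presumably what the blur step is meant to suppress.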
10. Re:findimagedupes image similarity algorithm by geonik · 2008-04-29 05:48 · Score: 0
  
  And what about scaling? I am not a DSP expert, but I was thinking that an algorithm working in frequency domain would be more accurate in comparing pictures...
11. Re:findimagedupes image similarity algorithm by njh · 2008-04-29 12:58 · Score: 1
  
  Sure, there are plenty of fingerprinting algorithms out there. A common approach uses multiple scales of Gaussian blurring and downsampling, such as SIFT and SURF (essentially a different frequency-like space), or affine invariant frequency transforms such as Radon or Hough. The approach used in one FOSS program (gqview?) is based on wavelet compression.
Idea for Google by Cantus · 2008-04-28 13:51 · Score: 2, Interesting

Here's an idea for Google that's been on my mind for several months. Yes, I'm giving it out for free.

Let me upload an image from my hard drive to Google and have them check it against the zillion images in their catalog. Then give me a page with all the similar copies it could find, with a thumbnail and the URL from where it originates.

One practical use I can think of: Someone you meet on the web sends you a photo claiming to be of him/herself. With this Google utility, you could upload that same image and have Google tell you if it exists anywhere on the web. Then you'd know if this person just took it off a MySpace profile, etc.

Another practical use: Look for prior art or copyright violations on images someone claims are original work. Could be very useful for Wikipedia.

The potential for something like this is massive.
1. Re:Idea for Google by ColdWetDog · 2008-04-28 15:19 · Score: 1
  
  You just want Google to search for more porn for your collection. Lazy wanker...
  
  --
  Faster! Faster! Faster would be better!
2. Re:Idea for Google by Toandeaf · 2008-04-28 17:44 · Score: 1
  
  And there is an issue with this?
3. Re:Idea for Google by boyter · 2008-04-28 18:38 · Score: 1
  
  I seriously doubt google or anyone else has enough computing power to pull this off in real time. My guess is it would take several hours at least to run through any amount of images to make it worthwhile. It would be useful, but the scale of the problem is too large to be practical.
4. Re:Idea for Google by phreakhead · 2008-04-29 05:47 · Score: 1
  
  This would be really easy too just by taking an MD5 hash of the image file, then you could search for duplicates of images anywhere on the web. In fact it would be awesome to have this capability for ANY file; just type in your MD5 hash and get a list of links to different places hosting the file. Great for finding lost MP3s, remembering the source of where you downloaded that image, etc...
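
A minimal sketch of the hash lookup described above, using `hashlib.md5` from the Python standard library. The in-memory `index` dictionary, the `add_file` helper and the example URLs are hypothetical. Note the limit it exposes: MD5 matches only byte-identical files, so a resized or re-saved copy of an image gets a different digest.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw file bytes as a hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical index: digest -> list of URLs hosting exactly those bytes.
index = {}

def add_file(url, data):
    index.setdefault(md5_hex(data), []).append(url)

original = b"\x89PNG stand-in for real image bytes"
add_file("http://example.com/a.png", original)
add_file("http://example.org/copy.png", original)
add_file("http://example.net/resaved.png", original + b"\x00")  # one byte differs

print(index[md5_hex(original)])
```

Only the two exact copies are returned; the file with one extra byte hashes to an unrelated digest and is missed.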
5. Re:Idea for Google by dargaud · 2008-04-29 05:48 · Score: 1
  
  I wrote google with that exact same request something like a week after they unveiled 'image search' in, what, 2000 ? Fat good it did.
  
  --
  Non-Linux Penguins ?
Re:google is fucking retarted by KGIII · 2008-04-28 14:02 · Score: 1

Err... No, no... The AC wasn't but the situation you described was me. I could have even afforded quite a chunk. I'm not entirely certain of the percentages but I'm pretty sure I invested in beer far more than I should have. Yes, yes I kick myself.

--
"So long and thanks for all the fish."
Re:other uses? by RiotingPacifist · 2008-04-28 14:38 · Score: 1

GSOC?

I think that the problem with what the GP suggests is twofold:
1) I imagine, analysing aerial images is much harder than your typical photo
2) Medical analysis would require access to a lot of data, and people already have enough googlefoil hats

--
IranAir Flight 655 never forget!
GOOGLE CLAIMS IT SO IT MUST BE SO! by Jane+Q.+Public · 2008-04-28 15:15 · Score: 1

Hail Google! Self-proclaimed King of Everything! Go ye forth and Do No Evil (except in Russia and China -- oh, yeah, and that bit about caving to the Feds on your users' personal data)!
O RLY? by Alex+Belits · 2008-04-28 15:56 · Score: 1

(NO WAI)

--
Contrary to the popular belief, there indeed is no God.
Unless your photos are on Flickr.... by filmotheklown · 2008-04-28 18:36 · Score: 1

Is it just me, or has anybody else noticed that Google doesn't make much effort to catalog the photos on Flickr, which is incidentally owned by Yahoo?

Or is it that Yahoo is blocking Google?????

Needless to say, if you search for a restricted set in Yahoo image search, you will pull up all of the Flickr photos. The same search in Google will often yield nothing from Flickr.

--
Filmo The Klown
"Image similarity" algorithms by ErkDemon · 2008-04-29 01:41 · Score: 1

Cool! I didn't know anyone had already done this.
I had basically the same idea, but I was going to keep the colour information, blur, include a global colour/contrast value (obtained by resampling to "1x1"), use that to colour-correct the image, and then resample to maybe 5x5.
I figured that for web searches, that should probably be good enough to find lots of alternative images from the same photoshoot or photoset as a sample picture, pictures taken by other photographers of the same scene, or still images of the same scene from a movie.
If you relaxed the search dependency on the global colour value, you should be able to find differently-processed versions of the same image. I was also going to strip the edge pixels (to remove borders). The use of colour-correction would mean that you wouldn't be wasting code resolution on data that was the same in each cell, and you could identify different versions of the same image with different colour "casts", or where operators had played about with the contrast.
It was going to need a bit of R&D and a decent library of sample pictures to work out the best tradeoff between recognisability, color resolution per cell, and final number of cells ("5x5"?, "7x7"?), and once you'd done //that//, there'd probably want to be further R&D to consider possible ways of optimising and future-proofing the "ID code" file structure.
For instance, you might want to put the global colour value first (for finding "exact-match" candidates), followed by the centre cell value, then the surrounding cells ordered by proximity to the centre, based on the assumption that edge information is likely to be less important (if it turned out that the four corner cells weren't too helpful, your search algorithm could ignore the last four cell values).
And then you have issues over whether the number of cells in the ID ought to be fixed or variable, and if it's variable, whether it should be recursive. You might base the system on "5x5", but allow the possibility of appended secondary ID code data that subdivides each cell and uses lower-res relative colour offset values for the sub-cells, based on the parent cell's value. Or you might not. It'd be up to the algorithm whether it was going to clip, degrade or weight cell values depending on their ordering.
If you were going along the "recursive" route, then instead of starting with a cell grid that then subdivides, you might divide the grid into concentric zones based on the picture's centre, for instance with a 5x5 grid, you could individually code the relative colours of the three zones: the central cell, the average of all the edge cells, and the average of all the intermediate cells. That'd give three codes that'd describe some of the colour variation across the image without being too sensitive to whether a human subject had moved an arm or leg between shots.
With more processing power you could start moving away from a square grid, and use proper concentric circular zones that are then subdivided, or a pattern of "test blobs" for sampling that approximate subdivided concentric zones (using pre-made bitmask templates for speed), but to start with, a grid approach would probably be more straightforward and simpler to execute.
One of those interesting projects I never got around to following through on ... :(
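
A rough sketch of the core of the scheme described above, under stated assumptions: the image is a list of rows of (r, g, b) tuples, the blur and edge-stripping steps are left out, and "colour-correcting" is taken to mean storing each cell as an offset from the global colour. The function names are hypothetical.

```python
def average(colours):
    """Mean (r, g, b) of a list of colour tuples."""
    n = len(colours)
    return tuple(sum(c[i] for c in colours) / n for i in range(3))

def signature(pixels, grid=5):
    """Return (global_colour, cell_offsets) with cells ordered centre-outwards."""
    height, width = len(pixels), len(pixels[0])
    global_colour = average([p for row in pixels for p in row])  # the "1x1" resample
    cells = []
    for r in range(grid):
        for c in range(grid):
            block = [pixels[y][x]
                     for y in range(r * height // grid, (r + 1) * height // grid)
                     for x in range(c * width // grid, (c + 1) * width // grid)]
            mean = average(block)
            # Colour-correct: keep only the offset from the global colour.
            offset = tuple(round(m - g) for m, g in zip(mean, global_colour))
            distance = (r - grid // 2) ** 2 + (c - grid // 2) ** 2
            cells.append((distance, r, c, offset))
    cells.sort()  # centre cell first, the four corner cells last
    return global_colour, [offset for _, _, _, offset in cells]

def colour_cast(pixels, shift):
    """Add a constant (r, g, b) shift to every pixel."""
    return [[tuple(v + s for v, s in zip(p, shift)) for p in row]
            for row in pixels]

# Synthetic 50x50 image: red rises left to right, green top to bottom.
image = [[(x * 4, y * 4, 40) for x in range(50)] for y in range(50)]
tinted = colour_cast(image, (20, 0, -10))

g1, cells1 = signature(image)
g2, cells2 = signature(tinted)
print(cells1 == cells2, g1 != g2)
```

The tinted copy produces the same 25 cell offsets and differs only in the global colour, so a search that relaxes the global value would treat the two as the same picture.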

--
Eric Baird
LOLcats, eh? by ccozan · 2008-04-29 02:34 · Score: 1

When you're looking for a particular LOLcat or demotivational poster, or even a specific comic strip based on a remembered punchline, the text in the image is what you want to be able to search for.

Aim in ur image, gugling.