Google Image Index Just Not Updated
We ran a story earlier today about the lack of Abu Ghraib photos in Google's image index. We now have a response from Google stating that the image index simply hasn't been updated recently, as well as a fairly convincing demonstration from a Slashdot reader: Rahga writes "I put together a page that counters the 'Google Censors Abu Ghraib Images' story. It is the tale of a Morgan Webb picture on images.google.com that's been driving a ton of traffic to my webserver 7 months after it was removed." The Abu Ghraib story broke in April 2004 (and officially became a non-story on November 2, 2004), so Google's index is indeed quite far behind.
Like I mentioned in this post, I can vouch for this.
For the longest time, the search for my name on Google images would bring up really old images and it would never update them. So, in order to test this, I just removed those images and used a redirect (this was about 3-4 months ago) -- Google still did not update the pictures.
However, my academic page at my school did show up pretty soon, although it was created just recently. What's more, it even showed the image of my latest schedule, and not an earlier one as in the other case.
So I guess Google probably uses some kinda weird algorithm to determine which sites are likely to be dynamic, and which are not -- and update/not update them accordingly.
Besides, every time there's been a problem/censorship (say, due to DMCA) -- Google has been nice enough to notify the users during the search. Not to mention the amount of scalability doing something like this would require of them (which makes even less sense if they were the ONLY ones asked to do so).
So all in all, just a false alarm, I suppose.
This just goes to show that /. groupthink isn't always on target, and Google isn't the all-spidering oracle we think it is either.
Google's image search is not to be confused with Google's news search. If you search for Lynndie England against the news search, one of the pictures in question comes up in a thumbnail next to the first set of results. Google had plenty of coverage of the Abu Ghraib story on its news pages, and its web search also has plenty of coverage of the topic. If Google was intentionally censoring, you'd think they woulda tagged all their search engines in the process.
For Google to be six months or more behind on reindexing their image storage seems about right to me. The link rot on the image search is starting to get annoying, but we've seen worse from the likes of Alta Vista in the past. Webcrawling seems simple but it's a very bandwidth-intensive process, and that means it costs money. Image spidering is even more expensive because pictures take up a whole lot more bitspace than HTML docs.
So, move that Slashdot story from earlier today from the Censorship category to the Almighty Buck category. That's the real reason why the pictures weren't there.
Seriously why does this need a new story? What was wrong with the update posted to the previous article summary?
Because in journalism there's a tradition of printing retractions for mistakes made on page A1 on a future page A1 in order to give the takeback as much exposure as the mistake. Slashdot leveled a rather serious charge of censorship against Google that quickly was proven not to be true.
Furthermore, there's a new piece of news coming out of this mess: Google's being quite slow on the refresh of the image search database.
Anyone have any ideas why they would be updating their image index so infrequently? Could it be because of the size of the files they are dealing with?
Be better in bed. Wikiafterdark!
They have some bugs to work out. A search on "to be or not to be" typically produces from 2 to 3 error results in the first ten. That is, if you search on the phrase (including quotes) you get page results that do not contain the phrase.
Don't blame Durga. I voted for Centauri.
Why is this "Your Rights On-Line"???
Since when does google have to do anything other than what they wish?
Lame...
It is a fairly minimalist search engine that searches Google, Yahoo, Ask Jeeves, About, LookSmart, Overture and FindWhat. I tried it a few times and find it occasionally returns a few more useful results than Google, and doesn't have an annoying clutter of ads.
(I suppose if it did I wouldn't know, I have mozilla configured to block even flash ads, and my firewall is configured to route most known ad servers to 127.0.0.1)
My rights don't need management.
And here we were, expecting Google to deliver us the latest in free pr0n images and thumbnails, and it's been shafting us with old crap the entire time!
The sky is falling!
The sky is falling!
Oh wait.
Nevermind.
(and officially became a non-story on November 2, 2004)
maybe the mass media isn't covering the prison over there in the sandy beach, but it's not all quiet, and definitely deserves the attention of those not deployed over there.
americans are still dying every day in that prison (which is controlled by the americans). american troops are deployed in and around that prison sometimes for months at a time with no productive mission other than to be deployed so a general or such can get another stripe on their shirt. this is what our tax dollars are being used for.
there's units that have their own cooks but can't use them due to contracts with another food supply "company". what are these cooks doing? not a damn thing. there's people who are budgeted for a year's deployment, but have replacements already there. what happens to these troops? they get re-deployed to another closer area. these aren't the full time troops either, these are the reservists who are being forced to sit on their arse in the desert.
by the way, there's a policy in Abu Ghraib now that photos MUST have faces digitally distorted. meaning if a soldier takes a photo of someone whose leg has been blown off, make sure there's no face in the picture. i'm not even sure if they're allowed to send photos out w/o permission these days.
sign up folks, it's in the name of democracy after all.
I am pretty happy with the outcome of this story. Good on google for answering the allegations. Even when they must reveal some disparaging facts about their image search by doing so.
A Multiplayer Strategy Game for Mac OS X, Windows, and Linux
/. is always so quick to jump on anything that screams vast right wing conspiracy... and this time they got egg on their face. GOOD.
If you do a google image search for "www.google.com", one of the first results you get is an image of Alyson Hannigan. That image resides on my server.
I haven't the foggiest idea how that image got associated with the string "www.google.com", nor why it would be ranked so high. I haven't linked to that image directly in over a year, and only on a page that Google shouldn't be trawling for images anyhow.
BTW, a good 70% of the traffic to my server is people looking for that image.
The Abu Ghraib story broke in April 2004 (and officially became a non-story on November 2, 2004)
How did this become a non-story? Are you saying that the press will no longer keep running it since it no longer helps Kerry? Did Bush pardon the soldiers involved? Were the prisoners freed and given settlements? Maybe it's a non-story now for the media, but it is still a story for those involved and for everyone smeared by the broad brush.
Viv
Gmail invites for ip
Michael and the rest of the editors had to be dragged kicking and screaming into this lame and uncontrite retraction because it was so untrue.
SIG:Slashdot: indymedia for nerds.
Had I not happened to login to /. just now, I would have been left with a considerably worse impression of my favorite search engine than now because of the old story. The fact they even responded to slashdot demonstrates something to me. I used Altavista as my primary in the nineties since it came out, and only last year converted to Google. I still use many, but Google is my choice nowadays, and I'd hate to see them censoring. That would IMMEDIATELY cause me to switch search engines.
The fact that the article was wrong is just as big a story as the original, if not MORE significant, since the mistake could have misled thousands upon thousands of readers.
We are one consciousness experiencing itself subjectively. Back to you with the weather, Bob!
It's better than trying to hide their mistakes. No matter what a company does today, they're going to get crap for it. So, let's say they don't own up, and they blame it on some obscure thing, or the DMCA or something equally idiotic. Then, all our friends here on /. jump up and say "That's so stoopid! My buddy and I could do a better job with a beowulf cluster!" But, when the company is transparent, as we like them to be, then we rail on them again for not being as cool as we thought they were.
Since the editors seem to have momentarily forgotten:
The Abu Ghraib story broke in April 2004 (and officially became a non-story on November 2, 2004)
To simpletons in the American electorate, that might be true. But, if anything, Nov 2nd made the story much more relevant to about a billion Muslims who view it as proof positive that the current US government may talk a good story, but where it counts, in real life, their actions are a whole lot different.
When information is power, privacy is freedom.
But a month or so ago google couldn't find those images. I wanted to use one as part of an argument here on slashdot. So I fired up altavista.com for the first time in a couple of years. Altavista.com had no trouble finding the images. My conclusion was that google had made a decision to deep-six the links to those images.
This just goes to show that /. groupthink isn't always on target,
Actually, just the opposite. An inaccurate story was posted, and it was torn apart by the comments. The hive-mind that is slashdot performed quite well, IMHO.
how come all of the political stories lately on slashdot have been slanted towards favoring the left?
oh yeah i know this is slightly offtopic or whatever, so mod me down so I can't be heard, I don't care.
Does the name Pavlov ring a bell?
It originally started with Google, but I sent a message requesting they remove them, and I'll be damned if they didn't graciously comply! Now Google no longer had record of those images, but Yahoo must have taken a copy of their archives when those two severed ties, because I saw references from Yahoo for things like "bigass.jpg" and "passedout.jpg". Imagine my joy... I was getting 404's out the bigass.jpg, and Yahoo wouldn't listen to me to take me out of their image index... Now, after several more months (and several dirty tricks), I no longer am included in Yahoo's index.
Does it stop there? No. Someone, somewhere along the way got a copy of those image thumbs out to every two-bit search engine wannabe. To this day I still field 404's for stuff that I know had only been searched and indexed by Google, but has since found its way via 3rd party routes into corners of the web I cannot begin to fully comprehend. *sigh* It's like a gnat buzzing around my head... It's not hurting anything, I guess... but it's still annoying.
These days, I put the content="NOARCHIVE" meta tag on every web page I serve. It's not that I don't want visitors. I could deny them with a robots.txt exclusion to that end. I just feel that search engines still lack the ability to capture the nuance of what it is I do... And these days, it has nothing to do with bigass.jpg or images of drunks passing out.
(Not that those aren't fun things...)
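For anyone weighing the two options mentioned above: the meta tag form is usually written as <meta name="robots" content="noarchive">, which asks engines not to keep a cached copy, while a robots.txt exclusion asks crawlers to stay away entirely. A rough sketch of how a robots.txt rule reads to a well-behaved crawler, using Python's standard parser. The rules and the example.com URLs are made up for illustration, and nothing here forces a crawler to obey:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the image crawler out of /images/,
# leave everything else open to everyone.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot-Image", "http://example.com/images/bigass.jpg"))
    print(allowed("Googlebot", "http://example.com/images/bigass.jpg"))
```

Note the difference in intent: the exclusion keeps the image out of the index altogether, which is not the same as allowing visitors while refusing a cached copy.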
Precisely.
For example, Fox News routinely headlines its news shows with retractions of and corrections to stories where they have been inaccurate or just plain wrong.
This space available.
4,285,199,774 pages is what they say they index. That's a 32 bit number and one that has been pretty much UNCHANGED for an entire year. People don't seem to bother google about the 32 bit thing much at all either. .. but THINK for a moment!
If they are stalled at the 32 bit limit and a simple webpage contains just 1.01 images... then they are grinding up against a selection issue. No, it's not just money, but as simple algebra shows there would have to be MASSIVE problems selecting which images to update.
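The arithmetic behind that claim can be checked in a few lines. This is only a sketch of the numbers quoted above: the page count and the 1.01 images-per-page figure come from the post, and whether Google actually stores document IDs as unsigned 32-bit integers is the poster's assumption, not something established here.

```python
# Page count quoted in the post, and the ceiling of an unsigned 32-bit counter.
INDEXED_PAGES = 4_285_199_774
UINT32_LIMIT = 2 ** 32  # 4,294,967,296


def headroom(count, limit=UINT32_LIMIT):
    """How many more IDs fit below the limit."""
    return limit - count


def estimated_images(pages, images_per_100_pages=101):
    """Image count at 1.01 images per page, kept in integer arithmetic."""
    return pages * images_per_100_pages // 100


if __name__ == "__main__":
    print("pages below the 32-bit ceiling:", headroom(INDEXED_PAGES))
    print("estimated images:", estimated_images(INDEXED_PAGES))
    print("images fit in 32 bits:", estimated_images(INDEXED_PAGES) < UINT32_LIMIT)
```

So the quoted page count sits a little under ten million short of 2^32, and at 1.01 images per page the image count would already overshoot it.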
Multiply this hypothetical problem with just the distractions created by 'herding brilliant people' and the plausible distractions of 'satisfying government and datamining requests' and pretty soon the stack will be full, pushing the plausible todo list item of '64 bit indexing' down the stack.
Thus.. while, sure, you can simplify this all to 'bandwidth costs money', I put forth that such a simplification is shortsighted. System complexity does not increase in a linear fashion, and given that google is 'old enough' for its systems to have grown to stress out and magnify whatever shortcomings went into and on top of an originally simple model, I bet that slow image search updates are merely a symptom of a much deeper, much simpler than 'money' design hitch, which the system that is google (e.g. the tech/ brains/ people-know-how) is at a loss to properly address in a radical way. With an 'image' to maintain, it has become much harder for them to transcend the limits of the 'google-system' and to effectively address the root of the 32 bit problem.
so, yes.. google's number of indexed pages has publicly been at the 32 bit limit for a good year.
yes, the hot air and geek dreams projected on the 'google system' have kept anyone from noticing, and only now that it's impacting the expectations of some folks are people noticing reality. impacted as they are by 'money' they of course project the problem to be solvable by 'money'.
Systemantics dictates that it's an inability to maintain self-transcendence that has let the 32 bit limit catch them with their pants down.
pesky dot 64 dot cl at spamgourmet dot com
But none of the results that google returns are links to the pictures of her wearing this dress
I think they know if they start playing back room politics people will very quickly move to another search engine. It's a rare thing today to see a big company doing the right thing, and Google is one of them.
"And a voice was screaming: 'Holy Jesus! What are these goddamn animals?'" - HST
Yeah. Maybe you slashdot editors should do a little investigating before you start posting uninvestigated speculation as news. *cough*FOXNEWS*cough*DRUDGEREPORT*cough*
Seriously. It took, what, all of a few hours for the truth to out? But no. You couldn't wait.
Pfft.
vk.
Search for "litigious bastards".
The top result is SCO. Do you REALLY think they would have that in text anywhere on their site?
liqbase
I'm not so sure I'm convinced. For one, if this story broke around March/April, how come other March/April news stories have already found their way back into the index? ( Such as this item from 'The Age', found with a search for 'John Howard', our PM ). Second, do you honestly think that all the PFC. England photos in the index during this earlier period were all hosted on various news-wires?
I dunno if Google has done anything dodgy here, but it's all a bit weird to say the least. I might start using another image searcher that's a bit more up to date.
YLFI
One god, one market, one truth, one consumer.
Some will be quick to decry how slashdot is quick to jump to conclusions. They'll draw fairly pointed comparisons between slashdot and 'real' journalism.
As far as they've reasoned it, they're right. But that's only because they haven't reasoned it quite far enough.
This is exactly the process that happens in the major news media. A journalist spots something unusual, thinks there might be a story there. An investigative team looks into the evidence, tries to get feedback from the source(s), and either corroborates or refines the initial hypothesis.
The difference that we're seeing here is that the story is not landing in our lap, fully formed and packaged according to the publisher's wont. In the past, we never saw the messy part of any story, just the finished product.
I happen to like being able to see the 'messy part' . I like it a lot. In fact, it's why I come to slashdot. If I trusted Big Media to properly digest and format my news, I'd have no need to come here at all.
The truth about slashdot is that, amid all the noise, the silliness, the kvetching and moaning, there is a great deal of solid fact-checking going on. Assumptions do get challenged, news is removed from its 'frame' and picked at. Opinions get challenged or supported by a large number of qualified peers[*].
[*] And admittedly, a smaller but significant number of unqualified peers. 8^)
How many media companies have the same resources available to them? Not many. Most don't even hire fact-checkers any more. And believe it or not, slashdot fact-checkers really are better than none at all. 8^)
Crumb's Corollary: Never bring a knife to a bun fight.
A retraction for the attack on Google, but another attack on the Bush Administration? Abu Ghraib was bad, but the issue here is Google's perceived censorship of the images, not the event itself.
RTFL: Look at the *second* link returned by that search at Google. litigousbastards.com has a campaign to post links to SCO using that phrase. The phrase is in the referring links, not the target site.
The campaign appears to be working!
*Still* negative function...
Shamelessly karma whoring ....... here ya go,
:-)
To get this result, you need to image search for Morgan Webb Nude, and click on the link at the bottom containing omitted results.
You're welcome.
in journalism there's a tradition of
In journalism, there's also a tradition of doing your job. The editors could have, at the very least, written a fucking email to google to know their position. What kind of real journalist wouldn't at least try to get the other side's version of the facts?
Even paparazzi would do that to avoid lawsuits.
And, on top of that, as others have mentioned, the editors can't even apologize.
Their accusations were very serious, they didn't even try to check anything, and they offer no apologies? Why are such morons even paid?
Gasp! There's more than one search engine out there besides Google. And you can't police them all. So, maybe, instead of searching all the time, use some of the other search engine brands like lycos or even the pre-google favorite, alta-vista, just to keep google honest.
This is my sig.
i can't seem to remember bush torturing anyone. did you see any photos of him torturing people? i sure didn't. hmmm.
When I search with Google images for the phrase, "Abu Ghraib" , I get exactly 127 images.
When I search for the same phrase using Yahoo's image search:
http://images.google.com/advanced_image_search?hl=en
I get 3,493 images.
The moral here? Stop thinking about Google as the be-all and end all.
There IS competition out there, SO USE IT!
Only if you use the competition will google have the motivation to update their database and be competitive in this area!
That google is providing an inferior product is only an indication that we are being lazy consumers.
Personally, I like Google's GUI layout better than Yahoo's. This is why I'm rooting for Google to come up to speed.
While we're identifying Google problems in the image area, Google might also think about suppressing images that are 98% the same color. Some searches are overwhelmed with that kind of drek.
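A crude version of that filter is not hard to picture. The sketch below is purely illustrative and not anything Google is known to do: it treats an image as a flat list of pixel values (a hypothetical representation, with no image library involved) and flags it when one exact color covers at least 98% of the pixels.

```python
from collections import Counter


def dominant_color_share(pixels):
    """Fraction of pixels taken up by the single most common color."""
    if not pixels:
        return 0.0
    counts = Counter(pixels)
    return counts.most_common(1)[0][1] / len(pixels)


def is_near_monochrome(pixels, threshold=0.98):
    """True when one color covers at least `threshold` of the image."""
    return dominant_color_share(pixels) >= threshold


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    print(is_near_monochrome([white] * 99 + [black]))       # almost all white
    print(is_near_monochrome([white] * 50 + [black] * 50))  # half and half
```

A real filter would likely want to bucket near-identical shades together rather than count exact matches, since JPEG noise alone would defeat this version.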
And speaking of overwhelming drek, how about EITHER doing pre-display background checks for broken links and suppressing them, OR just developing a "cached" option like we get for web text pages. Either approach would save time and aggravation for the user.
And if the company that prides itself on not being evil would care to throw us a bone, give the advanced image search screen the same ability given to the other screens to display results at 10, 20, 50, or 100 per screen!
Update the base, suppress the monocolor trash, cache images or suppress the links from the search results, make the advanced search give count options. By god, they could be MUCH MUCH MUCH less sucky.
Meds kicking in.....
must sleep now....
Hillary tucking me in while in the Lincoln Bedroom in January, 2009. ......sleep......
Live Long and Prosper - Thanks Leonard. You are missed.
I've noticed that the google images search seems to catalog two distinct kinds of pics: the high-turnover images from high traffic sites (mostly news sites), and the deep spidering of essentially random images from all the other sites. Since news-type sites have a lot of "churn", google re-spiders them frequently and the images search database gets updated for those sites fairly regularly. All other sites are pretty much just "when the spider gets around to it". It's not surprising that the Abu Ghraib pics would "fall off" the images index when the news sites moved on to the next titillating scandal of the week, and the slow-ass "rest of the 'net" image spider has a half-year-plus lag time in updating old entries. So you can't find Abu Ghraib pics. You also can't find "Alexandra Kerry in her black dress at Cannes" pics. But you can find plenty of pics of Paula Radcliffe, the marathon runner, running with the Union Jack and wearing number 576, even though those pics are under a day old. Good luck finding those same pics of her in a week though!
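One way a crawler could end up with that two-speed behavior is an adaptive revisit interval: visit sooner when a page changed since the last crawl, back off when it did not. The sketch below is a guess at the general idea only, not Google's actual scheduler; the halving/doubling rule and the one-hour and 180-day bounds are invented for illustration.

```python
def next_crawl_interval(current_hours, changed,
                        min_hours=1.0, max_hours=24.0 * 180):
    """Halve the revisit interval after a change, double it otherwise.

    The result is clamped between min_hours and max_hours (roughly half a year).
    """
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


if __name__ == "__main__":
    # A news front page that changes on every visit converges on the minimum.
    interval = 24.0
    for _ in range(6):
        interval = next_crawl_interval(interval, changed=True)
    print("busy site revisit interval (hours):", interval)

    # A static personal page drifts out toward the half-year ceiling.
    interval = 24.0
    for _ in range(10):
        interval = next_crawl_interval(interval, changed=False)
    print("static site revisit interval (hours):", interval)
```

Under a rule like this, a site that sat unchanged for months and then removed an image would not be rechecked for a long time, which fits the stale results people are describing.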
If a job's not worth doing, it's not worth doing right.
"The Abu Ghraib story broke in April 2004 (and officially became a non-story on November 2, 2004)"
With White House counsel Alberto Gonzales--a figure central to the internal discussion of 'when is it not torture' at the White House--on a very short list of Supreme Court nominees, this issue may very well flare up again sooner rather than later.
Mmmmmm... Bold, yet refreshing!