Millions of Pages Google Hijacked using ODP Feed
The Real Nick W writes "Threadwatch reports that millions of pages are being Google Hijacked using the 302 redirect exploit and the ODP's RDF dump. The problem has been around for a couple of years and is just recently starting to make major headlines. By using the Open Directory's data dump of around 4 million sites, and 302'ing each of those sites, the havoc being wreaked on the Google database could have catastrophic effects for both Google and the websites involved."
Google has the records, and probably the original
site exists with behavior dependent on browser name
being GoogleBot or not. The replacement site will
generally have some way of making money, which can
be tracked via financial transactions.
For at least the last 18-24 months it's been increasingly difficult to find non-spam/redirect/affiliate program links for a search on any popular consumer product on Google. Maybe they have too much faith in their current PageRank and think it needs to be tweaked instead of overhauled. Maybe they think they have enough momentum and don't care. They certainly should have the talent and resources to do something about this and it's kind of sad that they haven't. I predict we'll see another whizzy side project in a few months instead.
The thing is that all they have to do is keep it just good enough that people won't leave. Remember, AdWords is Google's product, everything else [gmail, orkut, etc] they've got is just a way to show you those ads. Google's success is entirely because they had clearly better search results than anyone else. If another company can clearly best them then Google may be in trouble.
My site the humor archives has been affected by this. I can tell because if you do the following search you can see a bunch of sites that are/were 302ing to my domain. I'm pretty pissed off and I seriously hope Google act soon to rectify the matter.
----
Frankly, I'd like to see Google start blocking content-free traffic-boosting sites from the page results entirely.
:-)
Google has login accounts, so let logged-in users have a link saying "report spam site". Track who files the most reliable reports, and if a few of those people all agree that a site is spam, nuke its pagerank.
See how OpenRatings does reliability calculations for more info. Or buy them
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
I'm surprised nobody has mentioned that Yahoo has already closed the 302 hole.
sigs are a waste of space
here's my write-up on the problem from early February called Google and the Mysterious Case of the 1969 Pagejackers. the problem has been around for a long, long time.
;)
personally, i'm ready to give up google maps or something else (autolink?) if they would 'fix' this or at least be more transparent about what's going on.
btw, the word on the net is that the googleguy posting here isn't the real one. anybody have details on this?
-kpaul
J-Log: Journalism News, Media Views
I think a resonable solution to this would be for Google to send a second spider to the site for every 302 Redirect they find, with a user-agent indicating its IE or any other browser. Then compare the data.
Although, they could probably still figure out it's google by their IP, but it's a step in the right direction.
Bugs are just features that have been fixed.