Slashdot Mirror


Millions of Pages Google Hijacked using ODP Feed

The Real Nick W writes "Threadwatch reports that millions of pages are being Google Hijacked using the 302 redirect exploit and the ODP's RDF dump. The problem has been around for a couple of years and is just recently starting to make major headlines. By using the Open Directory's data dump of around 4 million sites, and 302'ing each of those sites, the havoc being wreaked on the Google database could have catastrophic effects for both Google and the websites involved."

8 of 427 comments

  1. Easy to prosecute, hmmm? by r00t · · Score: 4, Interesting

    Google has the records, and probably the original
    site exists with behavior dependent on browser name
    being GoogleBot or not. The replacement site will
    generally have some way of making money, which can
    be tracked via financial transactions.

  2. 301 redirects by Anonymous Coward · · Score: 3, Interesting

    A few months ago, I rearranged my website. To make sure people could still find things, I put 301 redirects on all the old pages that I moved.

    I noticed in my logs that search engines have repeatedly requested the 301 pages, but often don't follow the links to the new pages. And when searched with google, the pages still show up with the old urls. Should I be using 302 redirects instead?
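
For background, HTTP defines 301 as "Moved Permanently" and 302 as "Found", a temporary redirect. The sketch below shows how a hypothetical indexer that follows those semantics literally would choose the URL to file a page under. `canonical_url` is an illustrative name, and nothing here is confirmed to match what Google actually does.

```python
from http import HTTPStatus
from typing import Optional

# Status codes that tell a client the move is permanent.
PERMANENT = (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT)


def canonical_url(requested_url: str, status: int, location: Optional[str]) -> str:
    """Return the URL a hypothetical indexer would file the content under."""
    if status in PERMANENT and location:
        # Permanent move: the old URL is retired in favour of the target.
        return location
    # Temporary redirect (302) or no redirect: keep the requested URL.
    return requested_url
```

Under this reading a 301 on a moved page should eventually retire the old URL, while a 302 keeps the redirecting URL in the index. That second case is the one the story describes: a third-party page that 302s to someone else's site stays listed under the third party's URL.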

  3. Not a surprise by faust2097 · · Score: 4, Interesting

    For at least the last 18-24 months it's been increasingly difficult to find non-spam/redirect/affiliate program links for a search on any popular consumer product on Google. Maybe they have too much faith in their current PageRank and think it needs to be tweaked instead of overhauled. Maybe they think they have enough momentum and don't care. They certainly should have the talent and resources to do something about this and it's kind of sad that they haven't. I predict we'll see another whizzy side project in a few months instead.

    The thing is that all they have to do is keep it just good enough that people won't leave. Remember, AdWords is Google's product, everything else [gmail, orkut, etc] they've got is just a way to show you those ads. Google's success is entirely because they had clearly better search results than anyone else. If another company can clearly best them then Google may be in trouble.

  4. My site is affected by barcodez · · Score: 4, Interesting

    My site, the humor archives, has been affected by this. I can tell because if you do the following search you can see a bunch of sites that are/were 302ing to my domain. I'm pretty pissed off and I seriously hope Google act soon to rectify the matter.

  5. Re:Ugh. This is so not true. by metamatic · · Score: 4, Interesting

    Frankly, I'd like to see Google start blocking content-free traffic-boosting sites from the page results entirely.

    Google has login accounts, so let logged-in users have a link saying "report spam site". Track who files the most reliable reports, and if a few of those people all agree that a site is spam, nuke its pagerank.

    See how OpenRatings does reliability calculations for more info. Or buy them :-)
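
One way the "track who files the most reliable reports" idea could look, as a rough sketch only. The smoothing formula, the thresholds, and the names `Reporter` and `should_demote` are all assumptions made for illustration; this is not how OpenRatings or Google compute anything.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Reporter:
    name: str
    confirmed: int  # past reports later verified as spam
    total: int      # all past reports filed

    def reliability(self) -> float:
        # Laplace-smoothed hit rate, so an account with no history scores 0.5.
        return (self.confirmed + 1) / (self.total + 2)


def should_demote(reporters: Iterable[Reporter],
                  min_reliability: float = 0.8,
                  min_agreeing: int = 3) -> bool:
    """True if enough reliable reporters flagged the same site."""
    trusted = [r for r in reporters if r.reliability() >= min_reliability]
    return len(trusted) >= min_agreeing
```

Requiring several reliable reporters to agree means a crowd of fresh accounts cannot bury a competitor on its own, since each new account only scores 0.5.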

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  6. Doesn't affect Yahoo by X · · Score: 4, Interesting

    I'm surprised nobody has mentioned that Yahoo has already closed the 302 hole.

    --
    sigs are a waste of space
  7. clsc.net seems to be down... by luap2000 · · Score: 4, Interesting

    here's my write-up on the problem from early February called Google and the Mysterious Case of the 1969 Pagejackers. the problem has been around for a long, long time.

    personally, i'm ready to give up google maps or something else (autolink?) if they would 'fix' this or at least be more transparent about what's going on. ;)

    btw, the word on the net is that the googleguy posting here isn't the real one. anybody have details on this?

    -kpaul

  8. Re:302 by Ryan Stortz · · Score: 4, Interesting

    I think a reasonable solution to this would be for Google to send a second spider to the site for every 302 redirect they find, with a user-agent indicating it's IE or any other browser. Then compare the data.

    Although the site could probably still figure out it's Google by its IP, it's a step in the right direction.
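
A minimal sketch of that second-spider check. The user-agent strings are only examples, and `Response`, `needs_review`, and the `fetch` callable are hypothetical names; `fetch(url, user_agent)` stands in for whatever HTTP client the crawler uses and must not follow redirects itself.

```python
from typing import Callable, NamedTuple, Optional


class Response(NamedTuple):
    status: int
    location: Optional[str]  # value of the Location header, if any
    body: str


# Example user-agent strings, chosen for illustration.
CRAWLER_UA = "Googlebot/2.1"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

Fetch = Callable[[str, str], Response]


def needs_review(url: str, crawler_view: Response, fetch: Fetch) -> bool:
    """Re-fetch a 302 as a browser and report whether the answers differ."""
    if crawler_view.status != 302:
        return False  # only redirects found by the first spider are rechecked
    browser_view = fetch(url, BROWSER_UA)
    # Any difference in status, target, or body suggests user-agent cloaking.
    return browser_view != crawler_view
```

An exact comparison like this would also flag pages that legitimately change between requests, so a real check would need something looser. It also does nothing about cloaking keyed on the crawler's IP address.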

    --
    Bugs are just features that have been fixed.