Millions of Pages Google Hijacked using ODP Feed

OMG!!! by justforaday · 2005-03-23 03:37 · Score: 1, Funny

Nothing for you to see here. Please move along.

OMG!!! Slashdot's been hijacked too!

--
I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.

A Real Question by 2names · 2005-03-23 03:40 · Score: 1, Interesting

What gains are made when someone hijacks a web site? This has probably been discussed before, but I'm too lazy right now to look it up. Anyone?

--
"I'm just here to regulate funkiness."

Re:A Real Question by justforaday · 2005-03-23 03:41 · Score: 2, Informative

According to the previous article (posted a few days ago, and linked to in TFS), a page utilizing this redirect exploit essentially supplants the original page in Google's pagerank listings...

Ugh. This is so not true. by GoogleGuy · 2005-03-23 03:40 · Score: 2, Informative

This is a placeholder. I'll include more details of why you shouldn't listen to Threadwatch.org in a bit, and debunk this some. Let me get this posted and I'll follow up.

(Yes, I am GoogleGuy.)

Re:Ugh. This is so not true. by Solder+Fumes · 2005-03-23 03:43 · Score: 5, Funny

This is a placeholder rebuttal, I'll post why your arguments are COMPLETELY STUPID after you actually post them.
Re:Ugh. This is so not true. by ari_j · 2005-03-23 03:46 · Score: 5, Funny

This is a placeholder troll. I'll post why you are an idiot and why Google r0x0r5 after you post your rebuttal but before I read it, as well as before I read the argument you are rebutting or the article.
Re:Ugh. This is so not true. by Solder+Fumes · 2005-03-23 03:47 · Score: 0

In an upcoming post, I'll tell you how much of a wannabe you are. Oh it's going to be good, just you wait!
Re:Ugh. This is so not true. by Hinhule · 2005-03-23 03:47 · Score: 1, Funny

I think this classic is in order.
Re:Ugh. This is so not true. by yagu · 2005-03-23 03:49 · Score: 1, Troll

This is a placeholder.... I'll include more details of why you shouldn't believe the NEXT slashdot article.... Let me get this posted.... and I'll follow up! (Hey, if the other guy can get modded informative for that.... this, since it's for a future article ought to be insightful). And, no, I'm NOT a GoogleGuy.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 03:49 · Score: 2, Insightful

Wow, getting modded up just for leaving a message on our answering machine! I guess it's true, just like with Wil Wheaton, if you claim to be (or are) someone of alleged importance, you too can get +5 Informative on every post, no matter what you say (or don't)!
Re:Ugh. This is so not true. by terraformer · 2005-03-23 03:56 · Score: 0

Take your time. In fact, you can just go back here and link to your rebuttal to the original posting of this story.

--
Who are you? The new #2 Who is #1? You are #617565. I am not a number, I am a free man! Muhahaha.
Re:Ugh. This is so not true. by phuturephunk · 2005-03-23 03:56 · Score: 0, Offtopic

In an upcoming post, I'll laugh at all your small internet penises..or is it Penii..just to reaffirm my geek masculinity.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 03:59 · Score: 0

I guess it's true, just like with Wil Wheaton, if you claim to be (or are) someone of alleged importance, you too can get +5 Informative on every post, no matter what you say (or don't)!

That's so true. I claim to be "Anonymous Coward," and look how many +5's I get.
Re:Ugh. This is so not true. by GoogleGuy · 2005-03-23 04:14 · Score: 5, Informative

Okay, I'll talk about this whole "millions of webpages hijacked! Film at 11!" piece of scaremongering. If you RTFA, the author (and the submitter of the story?) claims that some scraper sites have pulled down a copy of the dmoz RDF, gotten the urls, and are doing 302 redirects to sites in an attempt to hijack them. Note that this does not mean that lots of pages were hijacked at all.

Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.
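A minimal sketch of the selection described above, under heavy assumptions: the `Candidate` type, the `pick_canonical` function, the example URLs and the scores are all hypothetical, and PageRank is reduced to a single float (the real decision weighs more than this). It also shows how a redirect URL with more reputation can end up displacing the page it points to.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    pagerank: float  # hypothetical stand-in for the URL's reputation


def pick_canonical(duplicates: list[Candidate]) -> str:
    """Among URLs that serve the same content, keep the one with the most reputation."""
    if not duplicates:
        raise ValueError("need at least one candidate URL")
    return max(duplicates, key=lambda c: c.pagerank).url


# A directory's redirect URL and the small site it 302s to look like duplicates.
copies = [
    Candidate("http://smallshop.example/", 0.1),
    Candidate("http://directory.example/go?id=42", 0.6),
]
print(pick_canonical(copies))  # prints http://directory.example/go?id=42
```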

A lot of sites that try to spam search engine indices get caught, and their PageRank goes lower and lower as their reputation suffers. We do a very good job of picking canonical urls for normal sites; sites with their PageRank going toward zero are more likely to have a different canonical url picked, though, and to a webmaster I understand that it can look like "hijacking" even though the base cause is usually your reputation declining. For a long time, it was hard to get anyone to report canonicalization problems, because the sites that got "hijacked" would be free-cheap-texas-holdem-plus-viagra-and-payday-loans-as-well.com type sites. In fact, I had to offer to ignore the spamminess of any reported sites in order to get people to send in any real data.

But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report. Then I created a little mailing list with some engineers on it, and user support passes on emails that meet the criteria to the mailing list.

So how many reports has all this work (including posting multiple times on lots of webmaster boards to request data) gotten me? The last time I checked, it was under 30. Not a million pages. Not even a hundred reports. Under 30. Don't get me wrong, we're still looking at how we can do better: one engineer proposed a way that might help these sites, and he's got a test set of sites that would be affected by changes in how we canonicalize urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.

As a side note, I'm getting a little tired of debunking the source of this story (NickW at threadwatch). For example, he claimed that Google had removed Greg Duffy from Google's index. When I pointed out that he was making an assertion of fact without evidence, he started out revising the story by sprinkling in words like "appears" and eventually pulled the story at http://www.threadwatch.org/node/1822 off his front page. But given that this is the third link to NickW's site from Slashdot in the last couple weeks, I'm guessing that he's tasted the Slashdot effect and wants more.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 04:16 · Score: 0

Really? Hey I am a Microsoft developer and..
Oh wait...
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 04:30 · Score: 0

Mind divulging what your relationship with google is?
Re:Ugh. This is so not true. by ghoti · 2005-03-23 04:35 · Score: 1, Insightful

Why don't you just pick the new URL as the canonical one? This way, any hijacking attempts would have no effect. And if I really want to do a permanent redirect, I don't want the old URL to stay in Google's database, anyway. I guess transferring the PageRank would be tricky (would make it possible to hurt a page by redirecting from a very low-rated one), but this still seems to be a lot less open to abuse.

--
EagerEyes.org: Visualization and Visual Communication
Re:Ugh. This is so not true. by Dynamoo · 2005-03-23 04:38 · Score: 4, Insightful

You contact user support and use the keyword "canonicalpage" in your report... So how many reports has all this work gotten me? The last time I checked, it was under 30.
Well shucks GG, not every webmaster is glued to WMW and other forums.. and even if they were, the signal/noise ratio on this topic is so low that you probably couldn't find the information even if you were looking. It's hardly an obvious reporting mechanism. Although posting it on /. should help some, so that's appreciated. Thanks.
But look - what we have here are a whole bunch of webmasters who have been nuked off the face of the earth by 302 redirects and just don't have the technical knowledge to try and fix it. Mom and Pop stores, hobbyists, nonprofits etc etc. These people are just gonna get pasted.. they'll just be wondering why they don't get any visitors any more.
This is a HUGELY serious problem - and it's getting worse all the time as more and more people deliberately try to exploit the 302 bug. I've been hit by this bug myself, and let me tell you that unless you know EXACTLY what to look for you'd be stuffed - all you'd see is your traffic flatlining.
The key issue here - and it's the kind of issue that will really, really hit the headlines when it's exploited - is redirection. Sure, I can use a 302 and send Googlebot to the correct page, so first of all I basically 0wn the content of that page, not the publisher. *Then* I insert an exploit into the 302 redirect.. and hey presto, I've 0wned hundreds of thousands if not millions of computers. *That's* going to make unpleasant reading for Google when it hits the headlines - "Use Google and Get Owned". Nasty.

--
Never email donotemail@WeAreSpammers.com
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 04:48 · Score: 5, Informative

But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report.

I'm sorry, but this is a flat-out lie. If you are the GoogleGuy, then there were 1000+ post threads on WebmasterWorld where people were begging you for input, and you essentially disappeared. I think I might remember seeing one post from you about this "canonicalurl" on a short, almost unrelated thread. You certainly didn't make it clear where to send problem reports, at least not on any of the threads that people were actually reading.

The fact is, this is a huge problem, and has totally fucked a lot of legitimate site rankings. I honestly believe Google was doing everything in their power to ignore the problem up until now, hoping that it was just a figment of people's imagination, or worse, that it would help increase advertising revenue. And now that it's turning out to be a PR disaster for you, you're in damage control mode.

I run one of the sites that were affected by the 302 bug. I sent a message to Google about it, and got a canned response essentially telling me there was nothing wrong. I read through no less than 10 threads on WebmasterWorld about this, many with hundreds or even thousands of posts. I saw maybe, maybe, two or three from GoogleGuy. Where were you? Did you somehow miss those threads that spanned 80+ pages??? Why weren't you posting on those threads about this "canonicalurl" thing?

Luckily there was only one site 302-ing me, and they were doing it by accident and were happy to remove me from their directory. Now I'm back up at the top of the rankings. But I know it's going to be nowhere near as easy for many of the thousands of people who are still affected by this.

Seriously, that you would come on here and try to discredit someone for bringing attention to a very big problem with Google is pretty distasteful. To me it indicates either a cover-up or having your head buried firmly in the sand. Either way, it doesn't bode well for the future of Google. Instead of flaming people now that the problem is getting mainstream press, why not try and actually fix things.
Re:Ugh. This is so not true. by tobyvoss · 2005-03-23 05:01 · Score: 1

mod parent (placeholder troll) up! funniest thing ever read on /.
Re:Ugh. This is so not true. by dfjghsk · 2005-03-23 05:03 · Score: 1

I wish I had mod points for you. If this was MS, everyone here would be screaming bloody murder. Instead GoogleGuy gets modded +5 Informative.
What happened to Google's 'don't be evil' policy? Guess that only applies when it's convenient. Personally.. I would have given GoogleGuy -1 Troll.

--
Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
Re:Ugh. This is so not true. by salvorHardin · 2005-03-23 05:11 · Score: 1, Insightful

What happened to Google's 'don't be evil' policy? Guess that only applies when it's convenient. Personally.. I would have given GoogleGuy -1 Troll.
Google/GoogleGuy isn't being evil, just seemingly suffering from ignorance and/or apathy.
That said, I'm reminded of a quote I heard once: "The only thing necessary for the triumph of evil is for good men to do nothing". Please stop doing nothing, Google.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 05:14 · Score: 0

This isn't evil, merely clueless. I don't mean the exploit itself: Google knows search, and anyone could be sucker-punched by a loophole like the 302 exploit, as Google was. The response to it, however, has been less than stellar. "canonicalpage" is a solution? Jeepers.
Re:Ugh. This is so not true. by xconfig · 2005-03-23 05:16 · Score: 1

That's right, the hallowed tradition of mod'ing up comments you agree with, no matter how inane they are, and of mod'ing those you dislike down. How is GoogleGuy's post a troll exactly?
Re:Ugh. This is so not true. by NanoGator · 2005-03-23 05:16 · Score: 3, Funny

This is a placeholder karma whore. I'll post about how this is really part of Microsoft's grand evil plan. Best part is, I'll get a higher score than any of you!

--
"Derp de derp."
Re:Ugh. This is so not true. by metamatic · 2005-03-23 05:21 · Score: 4, Interesting

Frankly, I'd like to see Google start blocking content-free traffic-boosting sites from the page results entirely.

Google has login accounts, so let logged-in users have a link saying "report spam site". Track who files the most reliable reports, and if a few of those people all agree that a site is spam, nuke its pagerank.

See how OpenRatings does reliability calculations for more info. Or buy them :-)
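One way to sketch this reliability-weighted reporting, assuming each reporter already has a reliability score between 0 and 1; how those scores are earned, the `flag_spam_sites` name and the 2.0 threshold are all hypothetical. Summing reliability rather than counting reports means a single low-reliability user cannot nuke a site however many times they click.

```python
from collections import defaultdict


def flag_spam_sites(reports, reliability, threshold=2.0):
    """Return domains whose reliability-weighted spam reports reach `threshold`.

    reports: iterable of (user, domain) pairs
    reliability: dict mapping user -> score in [0, 1]; unknown users count as 0
    """
    weight = defaultdict(float)
    seen = set()
    for user, domain in reports:
        if (user, domain) in seen:  # one vote per user per domain
            continue
        seen.add((user, domain))
        weight[domain] += reliability.get(user, 0.0)
    return {domain for domain, total in weight.items() if total >= threshold}


reliability = {"alice": 0.9, "bob": 0.8, "carol": 0.7, "mallory": 0.1}
reports = [
    ("alice", "spam.example"),
    ("bob", "spam.example"),
    ("carol", "spam.example"),
    ("mallory", "rival.example"),
    ("mallory", "rival.example"),  # repeated report is ignored
]
print(flag_spam_sites(reports, reliability))  # prints {'spam.example'}
```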

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Ugh. This is so not true. by salvorHardin · 2005-03-23 05:27 · Score: 1

Agreed. It may not be what we want to hear, but it is at least +n, Informative
Google's loss is the Yahoo PR department's gain.
Re:Ugh. This is so not true. by 1u3hr · 2005-03-23 05:27 · Score: 2, Insightful

I wish I had mod points for you. If this was MS, everyone here would be screaming bloody murder. Instead GoogleGuy gets modded +5 Informative.
It's EXTREMELY informative, because it tells you what Google's official position is. Whether you like it or not, you need to know that. "Informative" doesn't mean "good".
If Bill Gates posted here in defence of some MS policy, it would hopefully similarly be modded "informative".
Re:Ugh. This is so not true. by bigbloggingbuggar · 2005-03-23 05:32 · Score: 1

Sorry Googleguy, I am new here, can you tell me what RTFA means please?
Re:Ugh. This is so not true. by cloudmaster · 2005-03-23 05:39 · Score: 1

So, for every page with a 302 redirect, send a second response with a random user agent (à la current spam's Bayesian filter crashing) and see if the resulting page is similar. Or only apply the "is this page identical to another page, if so, only remember one URL" test when the domains of the pages match.
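A rough sketch of the second suggestion: only merge two URLs as duplicates when they sit on the same host and their bodies are nearly identical. Fetching the pages (with or without a randomized user agent) is left out; `should_merge`, the 0.9 threshold and the use of `difflib` as the similarity measure are assumptions, not anything a search engine is known to use.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def should_merge(url_a: str, body_a: str, url_b: str, body_b: str,
                 threshold: float = 0.9) -> bool:
    """Treat two URLs as one document only if same host and near-identical content."""
    same_host = urlsplit(url_a).hostname == urlsplit(url_b).hostname
    if not same_host:
        return False  # cross-site copies are never merged, so no site can absorb another
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold
```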

You're welcome. I accept paypal. :)
Re:Ugh. This is so not true. by metamatic · 2005-03-23 05:46 · Score: 1

Because the new URL is not the canonical one; that would be a 301 redirect. For a 302, the old URL is the canonical one, and the content is merely temporarily at the new URL.
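The rule stated here fits in a small function. This encodes metamatic's reading only (other posters in the thread argue the opposite for 302s); the function name is hypothetical, and 307 is included as HTTP/1.1's other temporary redirect.

```python
PERMANENT = {301}       # Moved Permanently
TEMPORARY = {302, 307}  # Found, Temporary Redirect


def canonical_for_redirect(status: int, source_url: str, target_url: str) -> str:
    """Which URL should represent the content under this reading of the status codes."""
    if status in PERMANENT:
        return target_url  # content has moved for good
    if status in TEMPORARY:
        return source_url  # content is only parked at the target for now
    raise ValueError(f"{status} is not a redirect status handled here")
```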

Re:Ugh. This is so not true. by cloudmaster · 2005-03-23 05:46 · Score: 0, Offtopic

I meant "send a second request" rather than "response". Doh.
Re:Ugh. This is so not true. by dfjghsk · 2005-03-23 05:47 · Score: 1

tell me how he isn't a troll.. he comes on here and tells us the 302 exploit isn't a problem because he put a link on some page no one can easily find and he only got 30 responses.

Re:Ugh. This is so not true. by tfountain · 2005-03-23 05:47 · Score: 1
Okay, so say:
- URL A is an outclick redirect script on a search-engine-type site. If you click the link, it redirects with a 302 to URL B.
- URL B is a normal page with content on it.

From your explanation above, Google sees the two URLs as having the same content and therefore has to pick one to list (for a given search). PageRank is a contributing factor here, so if URL A has a higher PageRank, URL A would appear *instead* of URL B in the listings? Would it not make sense to take into account the fact that URL A is a redirect to URL B, and just list URL B?

If the above is true, then the owner of URL A could remove a competitor from the listings (albeit temporarily) by replacing their entry, then remove the redirect and just put their own stuff there instead. I take your point that there is a lot of scaremongering going on here and that this isn't as big a problem as some people are making out. But doesn't it still mean sites can 'fool' the search engines into listing someone else's content on their own domain?
Re:Ugh. This is so not true. by dfjghsk · 2005-03-23 05:52 · Score: 1

ah.. but googleguy says he doesn't represent the official opinion of Google, Inc. And if Billy came on here and said the security of Windows isn't a problem because he put a link to report problems on MSN on some obscure page that no one can easily find... I have no doubt he would be modded a troll.

Re:Ugh. This is so not true. by _xeno_ · 2005-03-23 05:53 · Score: 1

Why not just weight down cross-domain 302s? I can't think of any time when a cross-domain 302 would ever really be valid. In fact, most of the time when I see a cross-domain 302, it's from a "click-through" tracker script that wants to count how many people click through to a given site. (Like, say, the video games the Slashdot editors are playing in the right column...)

It seems that this entire problem could be solved simply by either disallowing cross-domain 302s or heavily weighting them down. Maybe even do something a little "smarter" and disallowing domains outside the /24 IP block.

I'm sure, somewhere, there's a valid use of a cross-domain 302, but for the most part, I can't really think of any. And the ones I can (temporary mirror on another site) should be compensated by the fact that the link is supposed to be temporary. If you control both sites, you can just bounce people with a 301 once the temporary mirror is no longer needed.
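A minimal sketch of the "weight them down" option, assuming a link-graph builder that multiplies each redirect edge by a weight. `redirect_weight` and the 0.1 penalty are hypothetical, and "domain" is simplified to the exact hostname, so www.example.com and example.com count as different sites; a real version would compare registered domains.

```python
from urllib.parse import urlsplit


def redirect_weight(status: int, source_url: str, target_url: str,
                    cross_domain_penalty: float = 0.1) -> float:
    """Weight for a redirect edge: cross-host 302s count for much less."""
    source_host = urlsplit(source_url).hostname  # hostname is already lower-cased
    target_host = urlsplit(target_url).hostname
    if status == 302 and source_host != target_host:
        return cross_domain_penalty
    return 1.0
```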

--
You are in a maze of twisty little relative jumps, all alike.
Re:Ugh. This is so not true. by HogynCymraeg · 2005-03-23 06:10 · Score: 0, Offtopic

This is the obligatory "All your pages are belong to us" quote.
Re:Ugh. This is so not true. by pastepotpete · 2005-03-23 06:24 · Score: 1

There's an awful lot of cross-domain 302's out there from very important authority sites. I believe that if Google did this, they would see a huge upheaval in the pagerank distribution across the whole of their SERPs. For example, it is no secret at all how meaningful Yahoo Directory listings are to the *average* website in terms of getting ranked on their primary keywords. Every link in the Yahoo! Directory uses a 302 redirect. If google suddenly threw out those links from their PageRank accounting, they would probably shuffle up their index a great deal as a result. Right now they have results that make them a good deal of money and it would be ill-advised, from a shareholder's perspective, to just fix this problem and see what happens. I've said all along that google knows about the problem but that there are larger issues preventing its repair that it can't disclose.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 06:32 · Score: 0

I hope that you can explain that to Matt Drudge when he finds out that there are entries in his site that don't have one of his urls attached.
Love it
Do a find in page for:
Say www.charlesmentor.com
What the heck would I know?
www.charlesmenter.com/modules/links/redir.asp?link=3 - 28k - Supplemental Result - Cached - Similar pages
Explain why this is there and whether it trips the dup content filter.
Then explain why parked domains cause massive content duplication if the site uses relative hrefs.
I also hope you have an explanation when the exploit trackers that issue CAN numbers find out about script injection.
How does Google look at Polly Pure Heart's Site when it starts linking to the SodomyRUs site because the url was associated with her site?
Re:Ugh. This is so not true. by S3D · 2005-03-23 06:32 · Score: 1

Google has login accounts, so let logged-in users have a link saying "report spam site". Track who files the most reliable reports, and if a few of those people all agree that a site is spam, nuke its pagerank.
This method will not work. First thing microsoft.com will be nuked. Google probably wouldn't want it.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 06:38 · Score: 0

This placeholder reaffirms the needs to chlorinate the gene pool, as I welcome our page hijacking overlords pages are belong to us while in South Korea only old people page hijack pictures of Natalie Portman covered in hot grits while page hijacking. All anonymously.
Re:Ugh. This is so not true. by xconfig · 2005-03-23 06:42 · Score: 1

Sigh, at the risk of responding to a troll..
Here's the definition of troll. The relevant one in this case is this:
"Posting derogatory messages about sensitive subjects on newsgroups and chat rooms to bait users into responding."
Notice how the definition says nothing about whether the message is 'right' or 'wrong'. GoogleGuy was definitely not trying to bait others, and his post had a lot about the inner workings of google that I didn't know about. Perhaps you already knew all that (though I kinda doubt it), perhaps 'informative' is subjective. But 'troll' is not.
As pointed out earlier in this post, when Linus says something about Linux, or Bill Gates about MS, or GoogleGuy or Sergey Brin about Google, it is rational to consider it more informative than if just anybody said it. Even if it contains speculation about the future it is more likely to be fact in future, just because they're saying it. Even if you disagree with them, it behooves you to listen to them because they likely have greater leverage than you do. And mod'ing what they say -1 does everyone a disservice because then nobody will read it, and others who agree with you will not be able to 'know the enemy'.
I know all this is off-topic. But one spreads the word..
Re:Ugh. This is so not true. by shlashdot · 2005-03-23 06:42 · Score: 1

Yes we all know a $300 check guarantees a quality site.

--
Additional plugins are required to display all the media on this page.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 06:57 · Score: 0

For a 302, the old URL is the canonical one, and the content is merely temporarily at the new URL.

No, it isn't. While the content is at the new URL, the new URL is the canonical one (because it's telling you to go there.)

Once the redirect has ceased, the old URL would be the canonical one again.
Re:Ugh. This is so not true. by oni · 2005-03-23 07:16 · Score: 1

I like the idea of weighting them down.

Another idea is to treat them internally (and for the purposes of pagerank weights) as regular links. In other words, forget the fact that 302's are supposed to mean "temporary location" and just think of them as links. That's the way they're being used anyway.

Right now, google is treating 302's the way that the HTTP RFC's authors intended them to be treated. Unfortunately, in the real world 302's aren't used this way. Google just needs to adapt itself to the real world. They already do this with other things, for example meta tags. Meta tags are a great idea: put a few keywords in a meta tag so that a search engine knows what the page means, rather than just what it says. But so many people use them to intentionally try and screw up search engines that google has to ignore them. This is the same sort of issue.
Re:Ugh. This is so not true. by GoogleGuy · 2005-03-23 07:52 · Score: 4, Informative

One example is http://www.doi.org/, because people want to have a persistent url like dx.doi.org/10.1226/1588290972, but then be able to have that url do a 302 redirect to a destination page like http://doi.contentdirections.com/mr/humana.jsp?doi=10.1226/1588290972. The destination urls might change, so it's handy to have a persistent digital object identifier on doi.org.
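To illustrate that persistent-identifier pattern, here is a toy resolver. The lookup table and the `resolve` function are hypothetical (a real resolver consults a maintained registry, not a dictionary); only the DOI and the destination URL come from the post above.

```python
from http import HTTPStatus

# Hypothetical registry; a real resolver would consult a maintained database.
DESTINATIONS = {
    "10.1226/1588290972":
        "http://doi.contentdirections.com/mr/humana.jsp?doi=10.1226/1588290972",
}


def resolve(doi: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to the persistent dx.doi.org/<doi> URL."""
    target = DESTINATIONS.get(doi)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # 302, not 301: the persistent URL is the one to cite, while the
    # destination may move and only this table needs updating.
    return HTTPStatus.FOUND, {"Location": target}
```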
Re:Ugh. This is so not true. by GoogleGuy · 2005-03-23 08:00 · Score: 1, Offtopic

Hi bigbloggingbuggar, I believe RTFA is a related term to RTFM. I prefer to think of it as "read the fine article". ;) I try to be polite at most of the places that I post (WebmasterWorld, Danny Sullivan's Search Engine Watch forums, etc.), but RTFA is a well-established piece of lingo at Slashdot, used to encourage people to review the basic facts presented in the submitted story. As part of blending in, also look for me to slip in references to Cowboy Neal, Soviet Russia, Natalie Portman, grits, etc. etc. I've been reading Slashdot for years, but I've only had the GoogleGuy handle on Slashdot since Jan 19th.
Re:Ugh. This is so not true. by Crobb305 · 2005-03-23 08:29 · Score: 1

Googleguy, you are familiar with the report I sent you in early February to your Google Groups location. I am concerned that my efforts to remove the 302s/tracker2s have caused the evidence to be deleted, so to speak. As I stated before, when I searched site:mysite.com, numerous 302s and tracker2s that were not mine were listed as though they were part of my site. Over the past 6 months, I have managed to get those removed using the removal tool and by contacting webmasters. Obviously, my intentions were good, but do you think those deletions posed a problem to engineers? Did I delete the evidence? Still, when I search site:mysite.com, several urls that are not mine show. But at least I am down to 3 from about 20. Incidentally, my pagerank never decreased as you say in your post and those redirects still outranked me as if they were canonical.
Re:Ugh. This is so not true. by metamatic · 2005-03-23 09:19 · Score: 1

The majority of people hate Microsoft that much? I don't think so. Sure, I hate them, and you probably hate them, but the average Google user?

Plus, so what anyway? It's not like anyone using Microsoft products has such a shortage of ways to find Microsoft's web site that they have to use Google.

Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 09:43 · Score: 1, Interesting

The only problem, GoogleGuy, is that you folks are the ones creating the duplicate content. You could store the url of the site doing the 302 as a url-only entry without content, and keep the redirected-to page's content only under the redirected-to site's own url.

You are currently hurting a totally innocent party.

Googlebot is not a browser.

The purpose of the redirects was to deliver content to a person.

I ask site1 please get me this.

It gets it or tells my browser where it is.

If my browser is told here is where it is then my browser says site2 please get me this .... and so it continues until a limit is reached, an error occurs, or it shows up on my screen.

In all cases the page is on the referred to site and that is where it belongs under its name on that site.

You can still keep a place holder on the original referring site in your database but not in the index so the next time you spider that link you'll have the proper starting point.
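This proposal amounts to attributing fetched content to the last URL in the redirect chain and keeping earlier hops only as crawl starting points. A sketch, assuming a toy in-memory index; the `Index` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    documents: dict[str, str] = field(default_factory=dict)  # URL -> indexed content
    placeholders: set[str] = field(default_factory=set)      # URL-only, never shown in results

    def record_fetch(self, chain: list[str], content: str) -> None:
        """chain: URLs from the one requested to the final one that served `content`."""
        if not chain:
            raise ValueError("redirect chain is empty")
        final_url = chain[-1]
        self.documents[final_url] = content  # the page lives where it is actually served
        for hop in chain[:-1]:
            if hop != final_url:
                self.documents.pop(hop, None)  # a redirecting URL never owns the content
                self.placeholders.add(hop)     # but stays a starting point for the next crawl
```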
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 09:43 · Score: 0

Why not allow the destination page to assign a status to itself?

Create a new header tag on your page that tells robots that 'This is the permanent location of this page. Assign all PR to this page and do not assign the content of this page to any other pages.' OR 'This is a temporary location for this page. Assign content and PR to the permanent page located at http:...'

Right now the referring page is trusted to assign the temporary or permanent status through the type of redirect. Having a page assign its own status would give webmasters the final word on the page status and location... essentially it's a checks and balances system for redirects.
Re:Ugh. This is so not true. by swimmar132 · 2005-03-23 09:45 · Score: 1

Do you have any idea what "troll" means?

Rhetorical question, I admit.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 10:07 · Score: 0

It's not like the moderators know either.

Posting AC because I'm a big C.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 10:32 · Score: 0

That's the biggest pile of nonsensical double speak I've ever seen spew forth from Google Guy.

I was skeptical until I actually saw a 302 hijacked site and it's obvious that Google chose the source of the 302 redirect as the authority and owner of the page content and not the actual domain that the page resides upon.

You have a SERIOUS BUG so get your lazy millionaire IPO'd asses busy and FIX IT!
Re:Ugh. This is so not true. by Morlark · 2005-03-23 10:39 · Score: 1

I believe there may be some confusion here. By 'new URL' you are referring to the URL on which the content actually is, i.e. the canonical one. But in these hijacking cases this 'new URL' actually predates the 'old' one, so the term 'new URL' is rather ambiguous.

--
Santa's suicide mission go!
Re:Ugh. This is so not true. by glesga_kiss · 2005-03-23 12:15 · Score: 3, Insightful

Google has login accounts, so let logged-in users have a link saying "report spam site".
As an alternative, I'd love a cookie-based version of this where you could click "ignore all results from this domain". After a couple of weeks you'd get rid of most of them on your personal browser. Make the lists sharable even. All the pagerank wannabes can do is start from scratch with new URLs.
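The filtering half of this idea is simple; storing the list in a cookie and sharing it are left out. A sketch with a hypothetical `filter_results` function, where ignoring a domain also hides its subdomains:

```python
from urllib.parse import urlsplit


def filter_results(result_urls: list[str], ignored_domains: list[str]) -> list[str]:
    """Drop results whose host is an ignored domain or one of its subdomains."""
    ignored = {d.lower().strip(".") for d in ignored_domains}

    def is_ignored(url: str) -> bool:
        host = urlsplit(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in ignored)

    return [url for url in result_urls if not is_ignored(url)]
```

Matching on a leading dot keeps notspam.example visible when spam.example is ignored.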
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 14:40 · Score: 0

What GoogleGuy says about PageRank is not fully correct. PageRank can be easily "manufactured" using Dmoz dumps (see e.g. www.freeaddurl.org/freeaddurl/) and 1000's of pages of junk publication (e.g. www.pavan.org/Candidate/Steve.htm and www.emsindia.com/Two/2-46.htm).

A lot of high PageRank sites are nothing but large pyramid cross-linked junk publication sites which have no value to any net user other than boosting the PageRank of a set of sites. Many of these high PageRank sites then sell links to other sites.

PageRank has meaning only when honest website owners link to other good websites. Since pyramid junk page publication and Link Trading started, PageRank's relevance has vastly decreased.
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-23 23:12 · Score: 0

When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation.

You actually don't have to choose anything.
If I say that I SUBMIT myself to another place, it means that all my (positive) credibility should be passed to that place and not the other way around.

The logic some of you G guys apply to understanding RFCs is sometimes unnecessarily weird.
KISS.
Re:Ugh. This is so not true. by Debstips · 2005-03-24 01:07 · Score: 1

I guess the biggest question I have is why are temporary redirects showing in Google at all?
Re:Ugh. This is so not true. by Anonymous Coward · 2005-03-25 04:44 · Score: 0

For something that does not exist there is a VERY LARGE amount of discussion about it:

See the long list at: http://www.webmasterworld.com/forum30/28742.htm for starters.

Robot.txt by superpulpsicle · 2005-03-23 03:40 · Score: 2, Insightful

I am really extremely entirely confused about the article altogether. Is the hijacking more or less about Google digging into your site even when your robots.txt is refusing Google's crawler entrance?

Re:Robot.txt by wizbit · 2005-03-23 03:44 · Score: 5, Informative

No, it means Google has indexed a page that appears (to googlebot) to contain something legitimate, and visiting the actual page by clicking the link silently redirects you to an illegitimate site (usually phish/scam copy of same, etc).
Re:Robot.txt by pluggo · 2005-03-23 03:46 · Score: 5, Informative

There was an article a little while back on /. that talked about this exploit.

Site A can return a 302 HTTP redirect to site B when Googlebot crawls their site. The googlebot will then index site B as site A. Site A could have no affiliation whatsoever with Site B; people could be clicking on SesameStreet.com and get AsianHookers.com, etc.

I do think the figure of millions of pages being hijacked is a little steep, though.

--
Pulling together is the aim of despotism and tyranny. Free men pull in all kinds of directions. It's the only way to mak
Re:Robot.txt by PornMaster · 2005-03-23 03:49 · Score: 4, Insightful

I do think the figure of millions of pages being hijacked is a little steep, though.

Why? It can be completely automated. A million is no harder than four.

--
500GB of disk, 5TB of transfer, $5.95/mo
Re:Robot.txt by fafaforza · 2005-03-23 04:07 · Score: 0

So if some phisher has access to put a redirect on sesamestreet.com, he could simply upload the content of asianhookers.com. I do not see how this is a great threat, as a 419'er would need access to paypal.com to fool anyone into going to his server in Nigeria.
Re:Robot.txt by Anonymous Coward · 2005-03-23 04:08 · Score: 0

people could be clicking on SesameStreet.com and get AsianHookers.com

Are you sure it's not just a typo?
Re:Robot.txt by mopslik · 2005-03-23 04:11 · Score: 1

So if some phisher has access to put a redirect on sesamestreet.com, he could simply upload the content of asianhookers.com

My understanding is that it doesn't work this way at all. I believe what happens is that the hijacker sets up a page/site that redirects to your own site. Google then crawls the link, and erroneously indexes the content from your page with the URL of the redirecting page. From there, it's trivial to change the redirect on the fake page to someplace else, and maintain the appearance of containing your original content.

I might be slightly off in this simplification, though. Someone correct me if necessary.
Re:Robot.txt by AssHatAnonymous · 2005-03-23 04:13 · Score: 3, Informative

No, it's the other way around. Someone has access to asianhookers.com and redirects to sesamestreet.com. Googlebot then correlates asianhookers.com with sesamestreet.com and, depending on some unknown formula, decides which domain is the actual owner of the page. So if the formula decides that asianhookers.com "owns" the pages on sesamestreet.com (because of the redirect), then when Google builds its result links it prints sesamestreet.com in the text of the page, but in the HTML it actually links to asianhookers.com.
It's fucked up.
Re:Robot.txt by catalina · 2005-03-23 04:13 · Score: 5, Funny

.....and get AsianHookers.com, etc.

couldn't you have made that a link so I can just click on it?
Re:Robot.txt by bill_mcgonigle · 2005-03-23 04:17 · Score: 1

Site A can return a 302 HTTP redirect to site B when Googlebot crawls their site. The googlebot will then index site B as site A. Site A could have no affiliation whatsoever with Site B; people could be clicking on SesameStreet.com and get AsianHookers.com, etc.

Isn't the fix then to provide preference to the real URL over 'copies' when culling duplicate data and/or pageranking the results? This seems easy, so the problem must be that Google isn't storing HTTP response codes with their page indexes such that it can make that decision. If they had it, 302's would lose to 200's and the problem would go away.

I suspect there's already a new version of googlebot crawling the web and storing result codes. Trouble is they have to recrawl the whole web before it's fixed.
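The fix suggested above (store response codes and let 200s beat 302s when culling duplicates) can be sketched roughly as follows. This is a minimal illustration assuming a hypothetical `FetchedPage` record; nobody outside Google knows how its duplicate filter actually stores pages, and the `.example` domains and scores are made up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FetchedPage:
    url: str           # URL the crawler was given
    status: int        # first HTTP status seen for that URL (200, 301, 302, ...)
    content_hash: str  # hash of the body finally retrieved
    rank: float        # whatever reputation score the index assigns

def pick_canonical(duplicates: List[FetchedPage]) -> FetchedPage:
    """Pick one URL to keep among pages whose content is identical.

    A page served directly (200) beats any page reached through a redirect;
    the reputation score only breaks ties within the same group.
    """
    return max(duplicates, key=lambda p: (p.status == 200, p.rank))

pages = [
    FetchedPage("http://irule.example/go", 302, "abc123", 7.0),
    FetchedPage("http://www.ferrari.example/", 200, "abc123", 5.0),
]
print(pick_canonical(pages).url)  # http://www.ferrari.example/
```

Putting the status check ahead of the score in the sort key is the whole point: a higher-ranked redirecting URL can no longer displace the page that actually serves the content.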

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:Robot.txt by Anonymous Coward · 2005-03-23 04:19 · Score: 0

I don't get how this is functionally different from any other redirect the user didn't intend, à la meta refresh, JavaScript mouseover, etc.?
Re:Robot.txt by Anonymous Coward · 2005-03-23 04:22 · Score: 0

Then sesamestreet would have to be configured to 302 to asianhookers?
Re:Robot.txt by nametaken · 2005-03-23 04:25 · Score: 1

It was a bit confusing, but apparently this problem has existed in the search engines for years. As I understand it, Site A has a page that redirects to other sites (usually for legitimate purposes). If you're one of the sites it redirects to (Site B), Google will list your content from the target URL with Site A's redirecting page as the URL. Then when someone clicks the link on the listing for your site, they get the redirect script of Site A. At this point that script COULD contain any kind of craziness.

It seems the best anyone can do about this is to contact google and complain about "302 redirect hijacking".

http://www.google.com/intl/en/contact/security.html

Like I said, most articles I'm finding say the problem has existed for years. There are some suggestions for trying to prevent this from happening to you, or trying to deal with a hijack scenario.

http://clsc.net/research/google-302-page-hijack.htm#002
Re:Robot.txt by arkanes · 2005-03-23 04:25 · Score: 5, Informative

One problem is that people use 302s when they should be using 301s, like directory sites. No doubt this is because they want to get referral counts up.
A 302 is a "temporary redirect". Basically, it says that the content normally lives at the URL you requested but that, just this once, you should look at this other URL for the content. Google's response to a 302 is actually very reasonable. I suppose the best thing they could do is just not follow 302s.
A 301 is a permanent redirect, indicating that the page isn't at the original URL and that all future requests should be made to the new one. I don't know what Googlebot does in this case but I assume it discards the original URL, which is what the standard recommends.
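The 301/302 distinction described above boils down to which URL the retrieved content gets filed under. The sketch below follows the wording of RFC 2616 (301: future references should use the new URI; 302: the client should keep using the request URI). It is an illustration of the spec's semantics, not a claim about what Googlebot actually does.

```python
from typing import Optional

def url_to_index(requested_url: str, status: int, location: Optional[str]) -> str:
    """URL under which a crawler following the RFC 2616 wording would file
    the content it finally retrieved.

    301 Moved Permanently: future requests should go to Location, so the
    new URL replaces the old one in the index.
    302 Found: the client should keep using the requested URL, so the
    content stays filed under it. This is the case the hijack abuses.
    Anything else (e.g. 200) is simply filed under the requested URL.
    """
    if status == 301:
        if not location:
            raise ValueError("301 response without a Location header")
        return location
    return requested_url

print(url_to_index("http://old.example/page", 301, "http://new.example/page"))
# http://new.example/page
print(url_to_index("http://irule.example/go", 302, "http://www.ferrari.example/"))
# http://irule.example/go
```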
Re:Robot.txt by lowrydr310 · 2005-03-23 04:27 · Score: 1

What exactly is the problem here? As long as they're linking to Asian pr0n, all is good.
Re:Robot.txt by KillerDeathRobot · 2005-03-23 04:31 · Score: 1

A million may be no harder than four to hijack, but a million dummy sites that would actually fool people is much harder than four.

--
Thinkin' Lincoln - a web comic of presidential proportions
Re:Robot.txt by PornMaster · 2005-03-23 04:43 · Score: 4, Informative

A million may be no harder than four to hijack, but a million dummy sites that would actually fool people is much harder than four.

This isn't about fooling people, it's about fooling a flawed technology to get false listings in the search engine results pages. It's about getting a lot of traffic. Yes, some people will be really pissed off when they get redirected to an affiliate program or something of the sort, but some small percentage of people will buy. If the cost to bring in a million visitors is minuscule because you're stealing search engine placement, and you get 50 people to sign up to something that pays you $50 a person, then you're up $2500 minus your hosting costs.

$2500 to someone in Malaysia is a lot of dough for a little coding... they could work for $200/mo in some kind of outsourcing plan or make a year's wages in their spare time. What do you think they're going to do?

--
500GB of disk, 5TB of transfer, $5.95/mo
Re:Robot.txt by northcat · 2005-03-23 05:48 · Score: 4, Informative

This is more like one site hijacking the ranking of another site. Suppose you're Ferrari and I'm the hijacker. You have ferrari.com and I have irule.com. Since you're ferrari.com, you get very high rankings when people search for "ferrari" on Google. You're probably the first site displayed, and on Google's results page the summary probably reads something like "the official home page of ferrari cars".

On my website I set up a 302 redirect to your website. That means when someone visits my irule.com, they get redirected to ferrari.com. I don't do anything to your website; I don't have access to your website. I hope you know that Google indexes web pages by visiting them with the user agent string "googlebot" and, of course, from Google's IP addresses, which are publicly known. When Google sees that my page is 302 redirecting to ferrari.com, for certain reasons, it replaces ferrari.com in its index with irule.com. So when someone searches for "ferrari" they get irule.com as the first result instead of ferrari.com, and the summary still says "the official home page of ferrari cars".

Now, I only 302 redirect irule.com to ferrari.com when googlebot visits my page. When anyone else visits irule.com, I give them something else, probably lots of ads, or I redirect them to some other site like LotsOfSmut.com. So I'm "hijacking" any references to ferrari.com on Google and its ranking. And when someone searches for "cars", irule.com is displayed as the ninth result instead of ferrari.com. So... I profit (you do the math).

(Sorry for dumbing down my post so much; too much experience explaining things to my grandmother.)
Re:Robot.txt by ReverendLoki · 2005-03-23 05:54 · Score: 4, Informative

The key is that they are using a 302 redirect, which is used to signify that the redirect is temporary only. In a completely honest and trustworthy Internet, this is used to indicate that for whatever reason (HW failure, slashdotting, etc.), the requested pages were temporarily unavailable on the main site and were being hosted elsewhere until the issue can be resolved. This is telling Google et al. that the content being redirected to (Sesame Street, for example) is normally hosted on the redirecting site (Asianhookers). From then on, whenever Google returns the result of the Sesame Street page, it is listed with the URL pointing to the Asianhookers page. It does this under the assumption that once the issue requiring the redirect is resolved, people will want to go to the "original" page, and will still be redirected to the content in the meantime.
Aside from a filter on Google's end to resolve this, it would be nice if the practice of using 302 redirects also included a means of confirming the setup on the site being redirected to: if the site actually hosting the data did not in some way confirm the redirection, either through a tag in the header of the HTML or perhaps in a separate, predictably placed file (much like a robots.txt file), the redirect would not be honored. Of course, this would first require the standard to be rewritten, and then would require people to actually abide by it.
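A rough sketch of the robots.txt-style confirmation idea above. Everything here is hypothetical: no such standard exists, and the file name `/redirect-sources.txt` and its one-URL-per-line format are assumptions made up for illustration. The crawler would fetch the file from the *target* site and credit the redirect only if the source URL is listed.

```python
from typing import Set

# Hypothetical convention: the *target* site publishes a plain-text file
# (say /redirect-sources.txt) listing, one per line, the URLs it allows
# to 302-redirect to it. Blank lines and '#' comments are ignored.

def parse_allowed_sources(text: str) -> Set[str]:
    allowed = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            allowed.add(entry)
    return allowed

def redirect_confirmed(source_url: str, target_file_text: str) -> bool:
    """True only if the target site vouches for this redirect."""
    return source_url in parse_allowed_sources(target_file_text)

example_file = """
# mirrors we use during outages
http://mirror.example/site/
"""
print(redirect_confirmed("http://mirror.example/site/", example_file))  # True
print(redirect_confirmed("http://irule.example/go", example_file))      # False
```

The design choice is that the party being pointed at, not the party doing the pointing, decides whether the 302 counts, which is exactly the trust the current behavior gets backwards.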

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Re:Robot.txt by Pxtl · 2005-03-23 06:15 · Score: 1

Actually, the ones I hate and keep running into are those god-freaking-awful "search" sites. They're not even a scam/phish/copy - they're just a really shitty search page with a lot of ads.
Re:Robot.txt by t0ny747 · 2005-03-23 06:47 · Score: 1

I suppose the best thing they could do is just not follow 302s.

They should only follow 302s in the same domain.

--
Taco?
Re:Robot.txt by TiggertheMad · 2005-03-23 06:56 · Score: 1

$2500 to someone in Malaysia is a lot of dough for a little coding... they could work for $200/mo in some kind of outsourcing plan or make a year's wages in their spare time. What do you think they're going to do?

You're probably right, but what happens to their domain when Google fixes the problem and then permanently blacklists it? No problem if you have some cruddy name like 'MalaysianWombatpr0n.org', but it would be pretty stupid if you redirected to a domain name like 'Sex.com'.

One other thought: I'm sure there are spammers all over the world racing to swipe as many top spots as possible right now, but I wouldn't do this in Malaysia. If I recall correctly, they have DRACONIAN laws when it comes to criminal activity. Personally, I think getting canned for Google page-ranking scams would suck. I would make 100% sure that this isn't even remotely illegal before I tried it from Malaysian jurisdiction.

--

HA! I just wasted some of your bandwidth with a frivolous sig!
Re:Robot.txt by cinnamon+colbert · 2005-03-23 07:22 · Score: 1

No reason at all to apologize for dumbing down; in fact, it's the first post on the matter that made sense, and the first poster with enough intelligence to explain it properly.

Simple is NOT stupid; simple is smart. Or, as Einstein said, any scientist who could not explain himself to a milkmaid did not understand what he was working on.
OR
Re:Robot.txt by HEXAN · 2005-03-23 07:44 · Score: 1, Insightful

The change is simple and very marginal. Don't follow a 302 to a different TLD.

1. Scammer at iblow.com 302s to ferrari.com.
2. Googlebot indexes iblow.com and receives the 302.
3. Googlebot refuses to follow the 302 outside iblow.com.
4. Googlebot continues to the next page on iblow.com.
5. The iblow.com 302 is placed in /dev/null.
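The steps above can be sketched as a simple filter, taking the follow-up correction into account that the comparison should be on the domain rather than the TLD. This is an assumption-laden sketch: "same site" is approximated crudely by the host's last two labels, which is wrong for suffixes like .co.uk (real code would use a public-suffix list), and a legitimate cross-domain move like the freehost example elsewhere in this thread would also be ignored unless it used a 301.

```python
from urllib.parse import urljoin, urlsplit

def rough_site(url: str) -> str:
    """Crude stand-in for a URL's registrable domain: the host's last two labels.
    This is an assumption; it gets suffixes like .co.uk wrong, and real code
    would consult a public-suffix list instead."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])

def should_follow_302(source_url: str, location: str) -> bool:
    """Follow a temporary redirect only if it stays on the same site."""
    target = urljoin(source_url, location)  # Location may be relative
    return rough_site(source_url) == rough_site(target)

print(should_follow_302("http://www.iblow.example/a", "/b"))                        # True
print(should_follow_302("http://iblow.example/go", "http://www.ferrari.example/"))  # False
```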
Re:Robot.txt by SpecBear · 2005-03-23 07:58 · Score: 1

I did some poking around when this 302 stuff was first mentioned on Slashdot, and I was quite chagrined to realize that I'd accidentally done this to one of my own sites. Here's what happened:

I have a personal site that used to be on a free web host, but I later moved to my own machine under my own domain. I replaced the index page on the old site with a redirect. So the site used to be at http://freehost.tld/mysite, and now redirects to http://mysite.tld. It's been set up like this for a while.

If you google for the site, http://mysite.tld is nowhere to be seen. The top hit is for http://freehost.tld/mysite, but the preview that Google displays contains the content from http://mysite.tld. By using the redirect, my old site has effectively hijacked the page rank of the new site, and bumped the proper URL out of Google's index. And I did this purely by accident.

This by itself isn't very useful for a scammer. The above situation isn't that big a deal, since anyone who clicks still winds up at the proper site. But if a scammer were to set up such a redirect and then vary it based on the user-agent (GoogleBot gets the hijacked site, everyone else gets pr0n), you'd have the large-scale resurrection of search engine spoofing spam.
Re:Robot.txt by ToddBox · 2005-03-23 08:01 · Score: 2, Funny

You must be talking about Yahoo.
Re:Robot.txt by budgenator · 2005-03-23 08:01 · Score: 1

So far only one person has posted an example of this being done, and it wasn't an example when I checked: the page ranked 1st the first time, 2nd the second time and 1st the third time, so apparently Google's result page rank changes by the minute.
My best guess is that the "severity" of the problem is mainly an urban myth amongst search engine spammers, sort of like the good old "Bic butane lighter + welder spark = 15KT explosion". The other funny thing is that all of the supposed 302er sites on Google have an actual clickable link to the complainer's site.
It appears that he is basically complaining that his paid advertisers are getting a higher page rank on Google than the page he's paid them to advertise.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Re:Robot.txt by Anonymous Coward · 2005-03-23 08:07 · Score: 0

Hmm... Good to know asianhookers.com are on our side!
Re:Robot.txt by Anonymous Coward · 2005-03-23 08:11 · Score: 0

"(Sorry for dumbing down my post so much, too much experience explaining things to my grandmother)"

You must have one very knowledgeable grandmother; I barely understood the explanation. Either that, or she just nods her head politely when you talk, and then goes back to knitting.
Re:Robot.txt by northcat · 2005-03-23 08:59 · Score: 1

(I think you mean domain, not TLD) Have a look at this thread.
Re:Robot.txt by geminidomino · 2005-03-26 03:48 · Score: 1

If you use firefox, you can select the domain name and middle-click (at least on the Linux version) and it will open for you.

Too bad I can't make it open in a new tab tho.

Good by Anonymous Coward · 2005-03-23 03:41 · Score: 0, Flamebait

The Google index has been a mess the last 6 months to a year. If something like this forces them to get their act together and stop ranking garbage at the top, good.

I've had it with Google! by Trolling4Columbine · 2005-03-23 03:41 · Score: 5, Funny

This is the last straw! I'm going back to MSN, where I know that my data and privacy are being protected!!

*duck*

--
Socialism: A feeling of discontent and resentment caused by a desire for the possessions or qualities of another.

Re:I've had it with Google! by 33degrees · 2005-03-23 04:02 · Score: 2, Informative

I know you were trying to make a joke, but if you'd RTFA you would know that MSN is as susceptible to this as Google is. Only Yahoo has addressed the issue.
Re:I've had it with Google! by vidarlo · 2005-03-23 04:16 · Score: 1

This is the last straw! I'm going back to MSN, where I know that my data and privacy are being protected!!
Bad luck! The only one of the major players not struck by this is Yahoo, according to http://clsc.net/research/google-302-page-hijack.htm: Search engines vulnerable to this exploit have been reported to include Google and MSN Search, probably others as well.

It seems pretty difficult to protect against, but one technique would be to ignore redirects, i.e. only index 200 OK pages, not 302 redirects and such. This would not be a big problem, but sites on the move could fall out of Google for a month while Google was crawling. Another option would be to fake the UA string for the search engine bots, but still abide by robots.txt. If Google cross-checked 302 pages with a UA string of, e.g., Firefox, they could test whether it was a true 302 or one faked for the crawlers.
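The UA cross-check suggested above might look roughly like this: request the same URL once as a crawler and once as a browser, without following redirects, and flag it if the two answers differ. The User-Agent strings are illustrative only, and the check is injectable so it can be tried without network access. As the reply below points out, a cloaker that keys on IP address rather than User-Agent would not be caught by this from a non-crawler IP.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

# (status, Location header) as seen for one request
Response = Tuple[int, Optional[str]]
Fetcher = Callable[[str, str], Response]

# Illustrative User-Agent strings; any crawler/browser pair would do.
CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"

def http_fetch(url: str, user_agent: str) -> Response:
    """One GET request; http.client never follows redirects on its own."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def looks_cloaked(url: str, fetch: Fetcher = http_fetch) -> bool:
    """True if the URL answers a crawler UA and a browser UA differently."""
    return fetch(url, CRAWLER_UA) != fetch(url, BROWSER_UA)
```

Calling `looks_cloaked("http://irule.example/")` would hit the network; passing a stub `fetch` lets the comparison logic be exercised offline.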

The article says nothing about why yahoo is immune...:(

--
Assembling etherkillers for fun an profit
Re:I've had it with Google! by Jon+Peterson · 2005-03-23 04:27 · Score: 1

Messing with the user agent string won't work, as googlebots use a well-known IP range, and you can ID Google like this. It's how we do it, and of course it protects you from people _pretending_ to be googlebot by changing their UA string...
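Identifying the crawler by address rather than by User-Agent, as described above, takes only a few lines with Python's `ipaddress` module. The network below is an example assumption, not an authoritative or current list of crawler addresses; anyone doing this for real should check what the search engine itself publishes.

```python
import ipaddress

# Example only: an assumed crawler block, not an authoritative list.
CRAWLER_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def is_known_crawler_ip(remote_addr: str) -> bool:
    """True if the connecting address falls inside one of the listed blocks.
    A spoofed User-Agent string does not change the result."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in CRAWLER_NETWORKS)

print(is_known_crawler_ip("66.249.66.1"))  # True
print(is_known_crawler_ip("10.0.0.1"))     # False
```

This cuts both ways: it lets a site reject fake "googlebots", and it is also why a search engine faking its UA string alone would not expose a cloaker.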

The other problem is that many legit sites, including my own, use 302 redirects a lot. They are used a great deal for maintaining backward compatibility with old links when URL schemes change. They are also very useful in authentication and personalisation situations.

--
----- .sig: file not found
Re:I've had it with Google! by recursiv · 2005-03-23 04:47 · Score: 1

That's why only cross-domain 302s should be considered bogus.

--
I used to bulls-eye womp-rats in my pants
Re:I've had it with Google! by boy_of_the_hash · 2005-03-23 04:49 · Score: 2, Interesting

Except that you should be using 301 when your URI scheme changes.
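For a moved site, a permanent redirect is easy to issue. A minimal sketch using Python's standard `http.server`, assuming a hypothetical new home at `http://mysite.example`: every path on the old host is answered with 301 to the same path on the new one, so a crawler following the spec files the content under the new URL.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_BASE = "http://mysite.example"  # hypothetical new home of the site

class MovedPermanently(BaseHTTPRequestHandler):
    """Answer every request on the old host with 301 to the same path on the new host."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", NEW_BASE + self.path)  # path includes any query
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET  # same headers, no body either way

def serve(port: int = 8080) -> None:
    HTTPServer(("", port), MovedPermanently).serve_forever()
```

Calling `serve()` on the old host makes a request for `/about.html` return `301` with `Location: http://mysite.example/about.html`. In a real deployment the same thing is usually done in the web server's own configuration rather than a Python process.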
Re:I've had it with Google! by Waffle+Iron · 2005-03-23 04:57 · Score: 2, Informative

MSN is as susceptible to this as Google is.
That's only because Microsoft dropped the original vision of MSN, which was a closed centrally controlled service like a glorified BBS. When it was introduced, they planned to leverage their desktop dominance to get the entire world to subscribe to their proprietary network.
The original MSN user interface that was bundled with Windows 95 looked more like the Windows file manager than a browser. I imagine that if MSN had continued down that path, then searching for information would today look more like some versions of the MSDN library help browser (based on a manually controlled central index) than like Google.
As it turned out, people preferred the freedom offered by the real Internet, and their plans never panned out.
Re:I've had it with Google! by ralphdaugherty · 2005-03-23 04:57 · Score: 1

The other problem is that many legit sites, including my own, use 302 redirects a lot. They are used a great deal for maintaining backward compatibility with old links when URL schemes change. They are also very useful in authentication and personalisation situations.

I asked the question in another post and now see your explanation of legitimate use of 302's. Do the 302's redirect to pages that contain content that you would want in Google and won't be indexed otherwise?

For example, if old pages moved to a changed URL, their content has probably already been indexed. And personalization doesn't sound like Googleable content. Any thoughts on that?

rd
Re:I've had it with Google! by mike5904 · 2005-03-23 05:52 · Score: 1

Couldn't Google just follow the redirect, index the content on the redirected page, and index the URL of the redirected page too? That way a scammer trying to exploit this would find that the page they listed ends up getting ignored entirely (which might make sense for someone legitimately using a 302 as well, where they want people to use the redirected URL in the future and not the original one).
Re:I've had it with Google! by northcat · 2005-03-23 05:54 · Score: 1

Sorry if I'm missing your joke, but MSN search is also "vulnerable" to this.
Re:I've had it with Google! by ralphdaugherty · 2005-03-23 06:43 · Score: 1

Couldn't Google just follow the redirect, index the content on the redirected page, and index the URL of the redirected page too? That way a scammer trying to exploit this would find that the page they listed ends up getting ignored entirely (which might make sense for someone legitimately using a 302 as well, where they want people to use the redirected URL in the future and not the original one).

I agree. Who cares if the specs say the intent is for it to be temporary? You're indexing what exists. Googlebot will keep visiting that url and either keep getting a redirect or the temporary redirect is gone.

After a while, if it isn't temporary, they'll stop visiting the original URL. End of hijacker being in Google altogether.
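The "keep visiting, and stop believing it's temporary after a while" idea above could be sketched as a per-URL counter. The threshold of three consecutive crawls is an arbitrary assumption (a real engine would more likely use elapsed time), and the class is hypothetical, not anything a search engine is known to run.

```python
from typing import Dict, Optional

class RedirectTracker:
    """Treat a 302 as temporary only for a while.

    If a URL keeps answering with a 302 to the same target on `threshold`
    consecutive crawls, stop believing it is temporary and file the content
    under the target URL instead (so the redirecting URL drops out).
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._streak: Dict[str, int] = {}
        self._target: Dict[str, str] = {}

    def observe(self, source: str, status: int, location: Optional[str]) -> str:
        """Record one crawl of `source`; return the URL to index its content under."""
        if status == 302 and location:
            if self._target.get(source) == location:
                self._streak[source] += 1
            else:
                self._target[source] = location
                self._streak[source] = 1
            return location if self._streak[source] >= self.threshold else source
        # Any other answer breaks the streak.
        self._streak.pop(source, None)
        self._target.pop(source, None)
        return location if status == 301 and location else source

tracker = RedirectTracker(threshold=3)
for _ in range(3):
    print(tracker.observe("http://irule.example/go", 302, "http://www.ferrari.example/"))
# http://irule.example/go
# http://irule.example/go
# http://www.ferrari.example/
```

A hijacker could keep resetting the count by rotating targets, so this is a heuristic that shrinks the window rather than a cure.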

rd
Re:I've had it with Google! by theridersofrohan · 2005-03-23 07:32 · Score: 1

This is the last straw! I'm going back to MSN, where I know that my data and privacy are being protected!!

*duck*

duck?

*fish!*
Re:I've had it with Google! by Anonymous Coward · 2005-03-23 09:56 · Score: 0

The article says nothing about why yahoo is immune...:(

Have you seen the mascot of their OS (FreeBSD) lately? He may look cute, but that's just to lull you into a false sense of security; that lil daemon chuck doesn't mess around. So there you go, immunity via daemonic powers.
Re:I've had it with Google! by cokemaster · 2005-03-23 11:58 · Score: 1

"I know you were trying to make a joke, but if you'd RTFA you would know that MSN is as susceptible to this as Google is. Only Yahoo has addressed the issue." I find your lack of faith disturbing, it's a feature not a bug!
Re:I've had it with Google! by inKubus · 2005-03-23 14:20 · Score: 1

Yeah, thank god Google is a private company run by PhD's and not a publicly traded company just trying to make a profit.

--
Cool! Amazing Toys.
Re:I've had it with Google! by WNight · 2005-03-23 18:41 · Score: 1

I don't see how the word "protect" fits here. Just about anything you could do differently between me and google would be like the 302 "bug".

Easy to prosecute, hmmm? by r00t · 2005-03-23 03:41 · Score: 4, Interesting

Google has the records, and probably the original site exists with behavior dependent on browser name being GoogleBot or not. The replacement site will generally have some way of making money, which can be tracked via financial transactions.

Re:Easy to prosecute, hmmm? by fourtyfive · 2005-03-23 03:55 · Score: 1

Actually, the behavior can be changed depending on whether the name is GoogleBot or something else, i.e. you can make it redirect for ONLY GoogleBot, and nothing else.
Re:Easy to prosecute, hmmm? by jridley · 2005-03-23 03:59 · Score: 4, Insightful

Prosecute for what? Is there a law against redirecting web pages? I think this would be a pretty difficult prosecution. Google's going to have to take technical steps on this one.
Re:Easy to prosecute, hmmm? by Donny+Smith · 2005-03-23 05:41 · Score: 1

> Google has the records,

Indeed, that should make it easy for someone to prosecute Google (for unauthorized caching like the French did earlier this week) :-)
Re:Easy to prosecute, hmmm? by thogard · 2005-03-23 12:16 · Score: 1

If your browser allows it, browse the web as the googlebot. It gets rid of lots of pesky problems involving logins and useless signups.

Law of the Internet by Cytlid · 2005-03-23 03:43 · Score: 5, Insightful

For every Good Thing, there are at least 100 different ways to abuse it.

--
FLR

Re:Law of the Internet by roror · 2005-03-23 06:45 · Score: 1

Your statement has more validity than you realize. It's not only the Internet; one sees this everywhere, again and again. E.g., nukes were supposed to be a good thing too.

302 by auralrothko · 2005-03-23 03:43 · Score: 5, Informative

I wasn't sure what a 302 hijack was, so here's the obligatory lowdown for those who didn't RTFA (from the page linked in the article): This exploit allows any webmaster to have his own "virtual pages" rank for terms that pages belonging to another webmaster used to rank for. Successfully employed, this technique will allow the offending webmaster ("the hijacker") to displace the pages of the "target" in the Search Engine Results Pages ("SERPS"), and hence (a) cause search engine traffic to the target website to vanish, and/or (b) further redirect traffic to any other page of choice.

--
arg

Re:302 by SassyDave · 2005-03-23 03:52 · Score: 5, Informative

For the full details of the exploit, TFA gives a pretty decent recipe:

The technical part: How it is done

Here is the full recipe with every step outlined. It's extremely simplified to benefit non-tech readers, and hence not 100% accurate in the finer details, but even though I really have tried to keep it simple you may want to read it twice:

1. Googlebot (the "web spider" that Google uses to harvest pages) visits a page with a redirect script. In this example it is a link that redirects to another page using a click tracker script, but it need not be so. That page is the "hijacking" page, or "offending" page.
2. This click tracker script issues a server response code "302 Found" when the link is clicked. This response code is the important part; it does not need to be caused by a click tracker script. Most webmaster tools use this response code per default, as it is standard in both ASP and PHP.
3. Googlebot indexes the content and makes a list of the links on the hijacker page (including one or more links that are really a redirect script).
4. All the links on the hijacker page are sent to a database for storage until another Googlebot is ready to spider them. At this point the connection breaks between your site and the hijacker page, so you (as webmaster) can do nothing about the following:
5. Some other Googlebot tries one of these links - this one happens to be the redirect script (Google has thousands of spiders, all are called "Googlebot").
6. It receives a "302 Found" status code and goes "yummy, here's a nice new page for me".
7. It then receives a "Location: www.your-domain.tld" header and hurries to your page to get the content.
8. It heads straight to your page without telling your server on what page it found the link it used to get there (as, obviously, it doesn't know - another Googlebot fetched it).
9. It has the URL of the redirect script (which is the link it was given, not the page that link was on), so now it indexes your content as belonging to that URL.
10. It deliberately chooses to keep the redirect URL, as the redirect script has just told it that the new location (that is: the target URL, or your web page) is just a temporary location for the content. That's what 302 means: temporary location for content.
11. Bingo, a brand new page is created (never mind that it does not exist IRL, to Googlebot it does).
12. Some other Googlebot finds your page at your right URL and indexes it.
13. When both pages arrive at the reception of the "index" they are spotted by the "duplicate filter" as it is discovered that they are identical.
14. The "duplicate filter" doesn't know that one of these pages is not a page but just a link (to a script). It has two URLs and identical content, so this is a piece of cake: Let the best page win. The other disappears.
15. Optional: For mischievous webmasters only: For any other visitor than "Googlebot", make the redirect script point to any other page free of choice.
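Steps 9 to 14 of the recipe can be modeled in a few lines: the index holds the same body under two URLs, and a filter that groups by content and keeps the best-scored URL has no idea which of the two is "real". The URLs and scores below are hypothetical, and the grouping-by-hash filter is an illustrative stand-in for whatever Google actually runs.

```python
import hashlib
from collections import defaultdict
from typing import Dict

def duplicate_filter(index: Dict[str, str], score: Dict[str, float]) -> Dict[str, str]:
    """Keep one URL per distinct page body ("let the best page win").

    `index` maps URL -> body as stored; `score` is whatever ranking signal
    the engine uses. Nothing here records *how* a body was fetched, so a
    redirect URL and the real URL compete as equals.
    """
    groups = defaultdict(list)
    for url, body in index.items():
        groups[hashlib.sha1(body.encode("utf-8")).hexdigest()].append(url)
    survivors = {}
    for urls in groups.values():
        best = max(urls, key=lambda u: score.get(u, 0.0))
        survivors[best] = index[best]
    return survivors

body = "Welcome to Your Domain, the official home page"
index = {
    "http://hijacker.example/go.php?id=42": body,  # steps 9-11: filed under the redirect URL
    "http://www.your-domain.example/": body,        # step 12: the real page
}
score = {"http://hijacker.example/go.php?id=42": 0.8, "http://www.your-domain.example/": 0.5}
print(list(duplicate_filter(index, score)))  # ['http://hijacker.example/go.php?id=42']
```

If the hijacking URL happens to score higher, the real page is the one that "disappears", which is exactly step 14.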
Re:302 by ari_j · 2005-03-23 03:53 · Score: 2, Interesting

I'm still not seeing any explanation of how it works, only what happens when it does work.
Re:302 by windowpain · 2005-03-23 03:55 · Score: 1, Informative

Thanks. Both the /. article and the linked story were utterly uninformative. Sometimes it seems that a lot of techies disdain even the merest explanation as baby talk. Even when you're addressing a largely technical audience, a little explanation helps, because not everybody knows every technical detail about an entire field.

--
Insert witty sig here.
Re:302 by Qzukk · 2005-03-23 04:08 · Score: 1

So let me get this straight... If I have www.crappywebsite.com and I want to pump up its pagerank, all I need to do is have "www.crappywebsite.com" redirect googlebot to www.cnn.com, and suddenly www.crappywebsite.com is a font of highly-ranked information?

The REAL answer would be to have google not index redirects (which is pretty stupid, all things considered. Why link searchers to the "wrong" URL, instead of the destination URL of the redirect?)

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:302 by StrongAxe · 2005-03-23 04:19 · Score: 5, Informative

I'm still not seeing any explanation of how it works, only what happens when it does work.

1. Phisher creates (say) cïtïcorp.com and makes the home page redirect to the real citicorp.com page.
2. Googlebot browses cïtïcorp.com, gets a redirect to the real citicorp.com, and indexes its contents.
3. User does a Google search looking for Citicorp, and finds a cïtïcorp.com page that appears to contain the valid data (and it might be the only such page, if the legitimate page gets removed through the duplicate-removal process).
4. User clicks through to cïtïcorp.com expecting to see the valid web page.
5. Phisher's server sees that the request is not from a Googlebot, so it serves up a fake page rather than redirecting to the legitimate real one.
6. User believes he is at the real citicorp.com web site, when he is in fact at the bogus cïtïcorp.com website, legitimized by Google.
7. Identity theft.
8. Profit. (Ob. Slashdot joke.)
Re:302 by Anonymous Coward · 2005-03-23 04:26 · Score: 2, Informative

This is where the concept of the 302 comes in. 302 means "I'm redirecting you to a temporary home of crappywebsite.com, located at cnn.com. However, this is subject to change, so you should continue to use crappywebsite.com to find the news content you were looking for". This is, of course, horseshit, but the bot believes it. The bot then happily goes on to index all the information on cnn.com, and links it to crappywebsite.com (the "permanent" URL according to the 302).
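A toy model of the indexing behaviour described above may help. It is a sketch under assumptions, not Google's actual crawler: the function name `owning_url` is hypothetical, and it models only the 301-vs-302 rule the comment describes (a 301 moves credit to the destination, a 302 leaves the fetched content filed under the redirecting URL).

```python
from typing import Optional


def owning_url(requested_url: str, status: int, location: Optional[str]) -> str:
    """Return the URL a naive indexer would file the fetched content under.

    Hypothetical model, not any real search engine's code.
    """
    if status == 301 and location:
        # Permanent move: the destination becomes the URL of record.
        return location
    # 302 "temporary" move (or no redirect at all): the requested URL stays
    # the URL of record, even though the content came from `location`.
    return requested_url


# crappywebsite.com answers with a redirect pointing at cnn.com
print(owning_url("http://www.crappywebsite.com/", 302, "http://www.cnn.com/"))
print(owning_url("http://www.crappywebsite.com/", 301, "http://www.cnn.com/"))
```

With the 302, content fetched from cnn.com ends up listed under crappywebsite.com; with a 301 it would be credited to cnn.com.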
Re:302 by xeer · 2005-03-23 04:34 · Score: 1

I'm still not convinced.

If it was a problem, then a search for camera hacks should return the go.php from my recent blog entry about that site.
I use 302s all over the place when linking to external sites you see.
The first link of that search should be http://blogs.linux.ie/xeer/go.php?http://camerahacks.com/ if the 302 redirects really stole content! Even searching for go.php camera hacks only returns one link: a post where I was discussing referer spam.
I don't think it's a problem.

What is a real problem is websites showing googlebot a plain page full of keywords and then, when a real visitor arrives, redirecting them to an ad clicker. I came across that this morning!
Re:302 by arkanes · 2005-03-23 04:38 · Score: 1

Because a 302 is a temporary redirect - it's supposed to indicate that you have the correct URL, but the content is temporarily hosted elsewhere.
Re:302 by Anonymous Coward · 2005-03-23 04:42 · Score: 0

This is an obvious case of non-trustworthy information. A site which has low credibility says it's the real source for information on a different site. Why should Google believe this? They don't trust meta-keywords because they likewise are just not trustworthy. Google needs to fix this, and fast. Just treat 302 like 301 or ignore 302s altogether. I've seen this trick in action and it's unbelievable that Google falls for something this stupid.
Re:302 by Anonymous Coward · 2005-03-23 04:45 · Score: 0

I use 302s all over the place when linking to external sites you see.
Shouldn't you be using 301s?
Re:302 by arkanes · 2005-03-23 04:46 · Score: 1

You're using 302s incorrectly. Use a 301. Or better yet, just use a link instead of a redirect. All these referral/go/whatever pages are a pain in the ass.
Re:302 by ari_j · 2005-03-23 04:49 · Score: 3, Insightful

Thanks. And remember, identity theft is not a joke, unless you steal the identity of a clown.
Re:302 by Ryan+Stortz · 2005-03-23 06:33 · Score: 4, Interesting

I think a reasonable solution to this would be for Google to send a second spider to the site for every 302 redirect they find, with a user-agent indicating it's IE or any other browser. Then compare the data.

Although, they could probably still figure out it's google by their IP, but it's a step in the right direction.
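The comparison step of that proposal can be sketched in a few lines. Assumptions: the two responses have already been fetched (once with a crawler User-Agent, once with a browser one), and the names `Fetch` and `looks_cloaked` are hypothetical; this is an illustration of the idea, not anything Google is known to run.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fetch:
    """One HTTP response, as recorded by a (hypothetical) spider."""
    status: int
    location: Optional[str]
    body: bytes


def fingerprint(f: Fetch) -> tuple:
    # Status code, redirect target and a hash of the body are compared.
    return (f.status, f.location, hashlib.sha256(f.body).hexdigest())


def looks_cloaked(as_bot: Fetch, as_browser: Fetch) -> bool:
    """True if the crawler and the browser-like spider got different answers."""
    return fingerprint(as_bot) != fingerprint(as_browser)


# The phishing scenario: the bot gets a 302, a "browser" gets a fake page.
bot = Fetch(302, "http://citicorp.com/", b"")
browser = Fetch(200, None, b"<html>fake login page</html>")
print(looks_cloaked(bot, browser))  # True
```

An exact comparison like this would also flag honest pages with rotating ads or timestamps, so a real check would need a fuzzier similarity test. As the comment notes, a cloaker keyed on IP address rather than User-Agent would still get past it.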

--
Bugs are just features that have been fixed.
Re:302 by yulek · 2005-03-23 08:48 · Score: 1

The REAL answer would be to have google not index redirects (which is pretty stupid, all things considered. Why link searchers to the "wrong" URL, instead of the destination URL of the redirect?)

not that simple. 302's are commonly used when you want to establish a session id necessary for navigating some websites, like amazon, ebay, etc. there are also technologies like aspx, php, various wiki implementations (just a few examples) that rely on 302 in their architecture.

--
in this age of communication i'm just not getting through
Re:302 by Anonymous Coward · 2005-03-23 09:33 · Score: 0

Right, but that session id (cookie, anyway) is only valid on a single domain, so we can say "302's pointing to another domain should be ignored".
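That rule is straightforward to express. A minimal sketch with hypothetical function names: a same-host 302 (the session-id case) keeps the source URL, while a 302 to another host is not allowed to file the target's content under the redirecting URL.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    # urlsplit lower-cases the hostname; "" if the URL has no host part
    return urlsplit(url).hostname or ""


def keep_source_url(src: str, status: int, dst: str) -> bool:
    """Hypothetical policy: only a same-host 302 keeps the source URL."""
    return status == 302 and host_of(src) == host_of(dst)


print(keep_source_url("http://www.amazon.com/", 302, "http://www.amazon.com/?session=123"))  # True
print(keep_source_url("http://www.crappywebsite.com/", 302, "http://www.cnn.com/"))  # False
```

Comparing exact hostnames treats www.example.com and example.com as different sites; a real policy would more likely compare registrable domains (for example via the Public Suffix List).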
Re:302 by thogard · 2005-03-23 12:26 · Score: 1

If it's temporary, google has no idea how short or long term that will be, so it should purge the temporary site out of its indexes and then add the other site to its list of sites to index if it's not already there. Google trashes its complete index quite frequently, so trashing temporary redirects isn't a problem at all. The best thing for google to do would be to take the 302's, assign that page a negative page rank, and let the google spammers find a new way to abuse the system.
Re:302 by yulek · 2005-03-23 13:02 · Score: 1

true. i can't think of too many areas where 302 to a different domain is useful except as a tombstone.

--
in this age of communication i'm just not getting through
Re:302 by yulek · 2005-03-23 13:05 · Score: 1

true. i can't think of too many areas where 302 to a different domain is useful except as a tombstone.

err... in which case it should be a 301 anyway...

--
in this age of communication i'm just not getting through

Thank you. by 2names · 2005-03-23 03:44 · Score: 1

see subject.

--
"I'm just here to regulate funkiness."

301 redirects by Anonymous Coward · 2005-03-23 03:45 · Score: 3, Interesting

A few months ago, I rearranged my website. To make sure people could still find things, I put 301 redirects on all the old pages that I moved.

I noticed in my logs that search engines have repeatedly requested the 301 pages, but often don't follow the links to the new pages. And when searched with google, the pages still show up with the old urls. Should I be using 302 redirects instead?

Re:301 redirects by JaseOne · 2005-03-23 03:55 · Score: 0

301 is a temporary redirect so the search engines probably ignore it, but a 302 is a permanent redirect, so yes, you should use 302's.

--
In the drops - An Aussie's musings on all things cycling
Re:301 redirects by Anonymous Coward · 2005-03-23 12:12 · Score: 0

In Reply to

>> JaseOne (579683) on Wednesday March 23, @10:55AM (#12024308)

Your information is wrong. It is the other way around.

301 is permanent
302 is temporary
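Python's standard library records the same distinction. HTTP/1.1 names 302 "Found", while HTTP/1.0 called it "Moved Temporarily". A quick check:

```python
from http import HTTPStatus

# Print the standard reason phrase for each redirect code discussed above.
for code in (301, 302):
    print(code, HTTPStatus(code).phrase)
# 301 Moved Permanently
# 302 Found
```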
Re:301 redirects by Anonymous Coward · 2005-03-23 12:52 · Score: 0

>> I noticed in my logs that search engines have repeatedly requested the 301 pages, but often don't follow the links to the new pages.

No. They add the URLs to a database and crawl them later. The crawler does not leave any referrer information in your log.
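The deferred crawling described here can be pictured as a simple queue. This is a minimal sketch with hypothetical names (`Frontier`, `add`, `next_url`), not any search engine's real design. When a fetch returns a redirect, the crawler records the target and moves on. The target is requested later as an independent fetch, which fits the observation that no referrer shows up in the log.

```python
from collections import deque
from typing import Optional


class Frontier:
    """Toy crawl queue: URLs are remembered now and fetched later."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._seen: set = set()

    def add(self, url: str) -> bool:
        """Queue url unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self) -> Optional[str]:
        return self._queue.popleft() if self._queue else None


frontier = Frontier()
frontier.add("http://example.com/old-page.html")
# Fetching old-page.html returned "301, Location: /new-page.html":
# record the target instead of fetching it right away.
frontier.add("http://example.com/new-page.html")
print(frontier.next_url())  # http://example.com/old-page.html
```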

Why? by dep01 · 2005-03-23 03:46 · Score: 2, Insightful

Why is it seemingly man's mission to "bring down" something that seems to provide such a great service for everyone?

"Oh! Look! Something beautiful! Something impressive! I must destroy it!"

pah. feeling jaded today, i guess.

--
"hey, could you pass me a paper towel? er.. I mean... DEPLOY ABSORBTION PANEL!"

Re:Why? by bratboy · 2005-03-23 04:17 · Score: 1

The standard response to constructive criticism - attack the messenger. The effective response - realize that indifference, obsolescence, and hubris are your enemies, not the people who are trying to improve your service.
Closed source works on the first principle ("nope, no bugs or security holes here..."), open source works on the second ("you found a bug? great! why haven't you fixed it yet?").
Re:Why? by Anonymous Coward · 2005-03-23 04:17 · Score: 0

that's been your mo since roman days and before! i think it has something to do with still being stuck, mentally, on the other side of the caucasus and trying to make everything look like home.
Re:Why? by a16 · 2005-03-23 04:18 · Score: 2, Insightful

In this case, it's more a case of "I must make money from it".

The people using this exploit to get fake listings (just like all of the spam pages we see in search engines) aren't doing it for the fun of it.
Re:Why? by xsbellx · 2005-03-23 04:20 · Score: 1

Well, the obvious answer can be paraphrased from Dune: "The ultimate control of something is the ability to destroy it". The more subtle answer deals with our species' desire for "more".

In a far-off time, the Internet was a wonderful place devoid of such mundane things as commerce. Now, fast-forwarding a few years to the present, people are making significant sums of money off the Internet selling "products". One of the best ways to get somebody to buy something is to make them aware of a "need" they have for your product. This is normally accomplished through methods such as a sales force and marketing. A basic fundamental of marketing is "get them in the door and somebody will see something they need, or at least think they need". With the Internet the goal is the same: "get as many people as possible to view the site and at least a few will buy". So what better way to get somebody to visit your store than to have a "trusted" third party tell them that what they are looking for can be found at this place?

--
If VISTA is the answer, you didn't understand the question
Re:Why? by dep01 · 2005-03-23 04:54 · Score: 1

Ah, I guess it is the American way to find something popular or successful and latch onto it like those little fish that latch on to sharks.

--
"hey, could you pass me a paper towel? er.. I mean... DEPLOY ABSORBTION PANEL!"
Re:Why? by northcat · 2005-03-23 06:08 · Score: 1

They're making money. Immorally. Legally (probably). Are you sure this is the only immoral way people are making money right now? Are you sure you've never benefited, directly or indirectly, but voluntarily and knowingly, from an immoral way of making money, however small the immorality was and however small your benefit was?
Re:Why? by Anonymous Coward · 2005-03-23 07:00 · Score: 0

They're making money. Immorally. Legally (probably).
Legally? This technique probably infringes trademarks, since (a) deception is used to (b) make profit from the fame of other people's marks. Good luck fighting the jurisdictional issues though.
What I don't get is why Google has not fixed this. They used to be competent, agile, cool folks. Doing the wrong thing, and keeping it up for months and months for no reason even while it hurts good people is Not Nice.
Re:Why? by Anonymous Coward · 2005-03-23 07:21 · Score: 0

"Oh! Look! Something beautiful! Something impressive! I must destroy it!"

Similar story with picking wild flowers. Or the ones in your neighbor's garden. :)
Re:Why? by a_random_geek · 2005-03-23 08:13 · Score: 1

He's probably a deer hunter.
Re:Why? by Anonymous Coward · 2005-03-23 10:44 · Score: 0

Something beautiful like Windows... And everybody making lots of viruses just to destroy it...

Do what I'm going to do... by Not_Wiggins · 2005-03-23 03:46 · Score: 4, Insightful

buy GOOG on the dip as many non-techie investors panic sell. 8)

--
Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.

Re:Do what I'm going to do... by ceejayoz · 2005-03-23 03:54 · Score: 2, Funny

Yeah, 'cause the non-techie investors read Slashdot...
Re:Do what I'm going to do... by Anonymous Coward · 2005-03-23 04:01 · Score: 0

Yeah, because GOOG with a PE of 124 is such a bargain.
Re:Do what I'm going to do... by 2short · 2005-03-23 04:10 · Score: 1

Right, as long as Google is priced right now, and not insanely overblown. I don't have any idea what their stock (or more to the point their price/earnings) is at; but I know which way I'd bet.

Free investment tip: Avoid buying stock in any company if an unsophisticated investor, for reasons unrelated to profitability, would think that company is Way Cool.

It appears Google has a sound business plan and competent management. Which probably justifies some particular, perfectly healthy stock price. But I'm guessing their stock price has to be at least twice that, because they are also, quite legitimately, Way Cool.
Re:Do what I'm going to do... by SA+Stevens · 2005-03-23 04:11 · Score: 1

There sure as heck aren't many tech-literate people here these days.

I mean, come on. The old days are gone. It's about colored lights in a tricked-out see-through case, in today's 'Phillips Screwdriver Pilot' techno world.
Re:Do what I'm going to do... by BasilBrush · 2005-03-23 04:30 · Score: 1

Free investment tip: Avoid buying stock in any company if an unsophisticated investor, for reasons unrelated to profitability, would think that company is Way Cool.
Free investment tip (advanced): Buy stock in any company that the unsophisticated investor *will* find way cool when they catch up with what's actually happening. I didn't go far wrong buying AAPL when iPod was just starting to take off.
Re:Do what I'm going to do... by greg1104 · 2005-03-23 04:39 · Score: 2, Interesting

P/E ratios are a poor way to compare stocks in new growth companies, as they don't account for the rate at which earnings growth is accelerating which is far more important than the current earnings amount. P/E looks back, not forward.

If you look at the earnings rating that Investor's Business Daily computes, GOOG gets a score of 99, the maximum possible. I quote from IBD founder Bill O'Neil's book "How To Make Money in Stocks" to point out how short-sighted P/E thinking is with companies like this, picking one example most here are familiar with.

"America Online sold for over 100 times earnings in November 1994 before increasing 14,900% from 1994 to its top in December 1999."
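For scale, here is the arithmetic behind those figures, using only the numbers in the quote. A P/E above 100 means the share price exceeded 100 times annual earnings per share, and a 14,900% increase multiplies the starting price by 150:

```latex
\frac{P}{E} = \frac{\text{price per share}}{\text{earnings per share}} > 100,
\qquad
P_{\text{Dec 1999}} = P_{1994}\left(1 + \frac{14900}{100}\right) = 150\,P_{1994}
```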

I've made plenty of money buying companies with a P/E of near 100 and selling as it hit 500+. That said, I still wouldn't buy GOOG, but it's more because the market cap extrapolated from the relatively small number of public shares seems insane.
Re:Do what I'm going to do... by 2short · 2005-03-23 04:44 · Score: 1

Sure, but that doesn't really apply to Google; everyone thinks they are way cool already. As for Apple, the success of the iPod provides perfectly good reasons for sophisticated investors to think they are cool.
In any case, the really basic investment tip to keep in mind is: If you think you can predict what stocks will do better than professional institutional investment managers with lots of experience and an extensive research staff, ask yourself why you aren't one.
Re:Do what I'm going to do... by BasilBrush · 2005-03-23 04:51 · Score: 1

In any case, the really basic investment tip to keep in mind is: If you think you can predict what stocks will do better than professional institutional investment managers with lots of experience and an extensive research staff, ask yourself why you aren't one.
Defeatist talk. Investment managers on average perform only about as well as the index. Investment managers don't actually have much concept of the technology that tech stock companies are hawking. And their vision of the tech future *may* be less than yours is. Only if you are the visionary type of geek of course.
Re:Do what I'm going to do... by BasilBrush · 2005-03-23 04:54 · Score: 1

P.S. Yes of course, way too late for GOOG. In fact the IPO was too late for GOOG from a "getting in before the average dude" perspective. Mind you, people that did invest in GOOG at the IPO have done well. 80% up in 7 months.
Re:Do what I'm going to do... by MegaFur · 2005-03-23 05:42 · Score: 1

Investment managers don't actually have much concept of the technology that tech stock companies are hawking.

But the investment bankers don't necessarily have to understand the technology to adequately predict what the stock will do because what the technology actually does, and how advanced and cool it is, is only one factor in determining how the company will do.

Surely you're familiar with examples where a company had a great technology but died anyway? Yes of course the technology matters, but it's very often not the deciding factor in determining how well a company's going to do. Example: Microsoft.

--
Furry cows moo and decompress.
Re:Do what I'm going to do... by BasilBrush · 2005-03-23 05:57 · Score: 1

This is why the tip was marked "advanced". You need to understand the market as well as the technology.
Re:Do what I'm going to do... by 2short · 2005-03-23 06:39 · Score: 1

Realist talk. Everyone, on average, performs only as well as the index (obviously). If you can beat the index with any reliability, you should have no problem making a killing as an investment manager. If investment managers don't have much concept of the products the companies they invest in are hawking, they have the budget to hire advisers who do, and if they don't do that, they won't keep their jobs long.
Re:Do what I'm going to do... by BasilBrush · 2005-03-23 07:15 · Score: 1

You stick to your point of view. I'll stick to mine thanks. It's served me well.
Re:Do what I'm going to do... by HopeOS · 2005-03-23 08:02 · Score: 1

Actually, the grandparent post is valid. It takes a couple of days for technical news discussed on Slashdot and reported by the Register to trickle up to the less technically-savvy investors. On a whim, I took $1000 to just over $5000 in January and February of 2000 using this exact strategy. Unfortunately, with the bubble now popped, the tech industry is already depressed so it's not obvious what kind of response you'll see today. Still, panic is panic.

For what it's worth, I went on vacation using that money and when I came back all my other investments were tanked -- net loss all around. Long-term diversification strategies are a better bet. Day trading is still gambling after all.

-Hope

Web presence pressure by gitana · 2005-03-23 03:47 · Score: 5, Insightful

As web presence (defined as appearing within about the first 10-20 results of a search) becomes more and more important to "success", black hat techniques such as this, used to eliminate competitors, will become more and more common. Google, or any other search tool, needs to be able to stay above the fray and not be subject to hacks such as this.

Re:Web presence pressure by filmmaker · 2005-03-23 05:48 · Score: 2, Insightful

Exactly. And if they'd just stop giving PageRank credit to the redirect destination, it'd all be over. In fact, the algorithm should check what the link density is between two disparate domains if it's going to even cache 302'ed content. In these scam cases the perpetrator never has an inbound link from the victim domain, and Google could "grade" this relationship as being very one-sided and not generally very trustworthy. The more interlinkages, the more trust. But assigning PageRank on 302's is just nuts.
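The "grade the relationship" idea can be sketched as a reciprocity score. This is only an illustration: the function name and the min/max formula are assumptions of this sketch, not a known ranking signal. It assumes a list of observed (from_domain, to_domain) hyperlinks.

```python
from collections import Counter
from typing import Iterable, Tuple


def redirect_trust(links: Iterable[Tuple[str, str]], src: str, dst: str) -> float:
    """Hypothetical reciprocity score in [0, 1] for a redirect src -> dst.

    `links` holds one (from_domain, to_domain) pair per hyperlink seen.
    1.0 means the two domains link to each other equally often; 0.0 means the
    relationship is entirely one-sided, e.g. dst never links back to src.
    """
    counts = Counter(links)
    forward = counts[(src, dst)]
    backward = counts[(dst, src)]
    if forward == 0 or backward == 0:
        return 0.0
    return min(forward, backward) / max(forward, backward)


scam = [("hijacker.com", "widgets.com")] * 5
print(redirect_trust(scam, "hijacker.com", "widgets.com"))  # 0.0
partners = [("shop.example", "maker.example")] * 4 + [("maker.example", "shop.example")] * 2
print(redirect_trust(partners, "shop.example", "maker.example"))  # 0.5
```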

--
I Want To Believe
Re:Web presence pressure by northcat · 2005-03-23 06:11 · Score: 1

Yeah, these are malicious cyber-terrorists destroying the free world by "hacking" something they don't even have direct access to.

Gopher by one_i_blind · 2005-03-23 03:48 · Score: 5, Funny

This is why Gopher will always be better than your feable world wide web junk.

Re:Gopher by ari_j · 2005-03-23 03:51 · Score: 5, Funny

Dude - the single biggest difference between Gopher and the web is that Gopher contains far fewer spelling errors. I hear that there are differences regarding interactivity, graphics, layout, and so forth; but those are all immaterial.
Re:Gopher by stupidfoo · 2005-03-23 03:53 · Score: 0

University of Minnesota owns you!
Re:Gopher by one_i_blind · 2005-03-23 04:03 · Score: 0

OK, you win. I wasn't paying attention. I meant feeble.
Re:Gopher by Patrick+Mannion · 2005-03-23 04:04 · Score: 1

BACK TO THE HOLES! BACK TO THE HOLES!! I say we start a Return to Gopher campaign and promote Gopher awareness!

--
In America, you spam computers In Soviet Russia, computers spam you!
Re:Gopher by ari_j · 2005-03-23 04:08 · Score: 2, Interesting

IE doesn't support gopher:// URLs any longer, so assume that demand for Gopher would drive market share of Firefox et al. The problem is driving the demand for Gopher when IE doesn't support it.
Re:Gopher by John+Hasler · 2005-03-23 04:14 · Score: 1

> IE doesn't support gopher:// URLs any longer...

An excellent reason to use Gopher.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Gopher by ajs · 2005-03-23 04:20 · Score: 1

Gopher is part of the World Wide Web, as are several other protocols that pre-date the Web. You meant to say, "This is why Gopher will always be better than an HTTP server."

The World Wide Web is the meta-index of (mostly) Internet-accessible content which can be addressed by URI (almost always more specifically by URL).

Since Gopher can be addressed via the URI scheme, "gopher", it's part of the Web.
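A generic URL parser makes the point that gopher is just another URI scheme. The Gopher address below is only an example:

```python
from urllib.parse import urlsplit

# A generic URI parser handles gopher:// the same way it handles http://
parts = urlsplit("gopher://gopher.floodgap.com/1/world")
print(parts.scheme, parts.hostname, parts.path)  # gopher gopher.floodgap.com /1/world
```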
Re:Gopher by Storlek · 2005-03-23 04:47 · Score: 1

Gopher will never come back. Doesn't have a catchy enough name.

Something like "Gooooooooopher", on the other hand...

--
Bears don't normally eat things that talk and move backwards.
Re:Gopher by Therlin · 2005-03-23 05:39 · Score: 1

When someone showed me a web page for the first time using a highly secure government computer, I said "oh, like Gopher with pictures, not a big deal."
Re:Gopher by conan776 · 2005-03-23 06:47 · Score: 1

Said the same dang thing, passing by the kiosk set up in the Undergrad library, the forms in hand to change my major from CS to English, as I figured nothing new was coming down the pike in programming. Doh!

--
"Reality is that which, when you stop believing in it, doesn't go away." -- Philip K. Dick

Wait... by dark-br · 2005-03-23 03:51 · Score: 5, Funny

Damn Google!!! Do you mean this is not www.kuro5hin.org ??

The super-slashdotting by kunkie · 2005-03-23 03:54 · Score: 5, Funny

I can imagine it now... The slashdotting to end all slashdots. If every site in Google was 302-redirected to RIAA.com, how amazing would that be...

Re:The super-slashdotting by Nessak · 2005-03-23 04:34 · Score: 2, Insightful

I think that is the RIAA wet dream -- to have every web page point to it. Don't they believe the only way to save music is to kill the web?
Re:The super-slashdotting by theTerribleRobbo · 2005-03-23 18:25 · Score: 1

The 'attack' would mean that the sites doing the redirecting would end up stealing the RIAA's ranking, rather than the other way around like you seem to think.

ObSimpsons by Skater · 2005-03-23 03:55 · Score: 0

I told you, I have too much time on my hands!

exploits by Sv-Manowar · 2005-03-23 03:58 · Score: 1

People are already using 302 redirects on services such as no-ip and dyndns so they can gain multiple listings (even whole pages of them) for the terms they want.

--
Business Voyeur

How to check if your site is being hijacked... by ites · 2005-03-23 03:59 · Score: 4, Informative

1. search Google for 'allinurl:', e.g. 'allinurl:slashdot.org'.

2. copy and paste any dubious URLS into this tool and check whether they're using 302 redirects or not.

3. Panic! /me notices that my company's web site has been thusly hijacked... and yes! Doing a Google search on the main text on my company's web site shows dozens of unrelated sites high in the ranking. None of these actually have the text on their pages.

One example: http://www.tradedoubler.it.

Luckily, the phrase in question is complete gibberish and no-one ever finds our site through Google, only by reputation and word of mouth.

Still, I think it's clear Google have a serious problem here...
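If the checking tool mentioned above is unavailable, the same check can be scripted. This is a minimal sketch using only Python's standard library. `http.client` never follows redirects, so it exposes the raw status and `Location` header of a suspect URL. The function names and example hosts are hypothetical. A cloaking server that only redirects crawlers may answer this plain client differently.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request url once with HEAD and return (status, Location header).

    http.client does not follow redirects, so a 302 is reported as a 302.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_to_me(status: int, location: Optional[str], my_host: str) -> bool:
    """True if a third-party URL answered with a 302 pointing at my_host.

    Relative Location values have no host and are treated as not matching.
    """
    if status != 302 or not location:
        return False
    return (urlsplit(location).hostname or "") == my_host.lower()


# Usage (needs network access; the URL is a placeholder):
# status, location = fetch_status("http://suspect.example/go.php?url=http://www.mycompany.example/")
# print(redirects_to_me(status, location, "www.mycompany.example"))
```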

--
Sig for sale or rent. One previous user. Inquire within.

Re:How to check if your site is being hijacked... by Anonymous Coward · 2005-03-23 04:17 · Score: 0

yep, checked one of my more topical sites,
the 2nd URL listed on Google was a 302...
Re:How to check if your site is being hijacked... by boinger · 2005-03-23 05:04 · Score: 1

Mine, too. Why would anyone want to hijack a page full of "postcards" of people giving the finger?

--
Send your friends messages of love at fuck-you.org
Re:How to check if your site is being hijacked... by AGTiny · 2005-03-23 05:33 · Score: 1

Hmm, one of my websites reports an invalid domain when I do this:
www.mysite.com.this

No description and the link obviously doesn't work. What the hell?
Re:How to check if your site is being hijacked... by Jugalator · 2005-03-23 07:59 · Score: 1

1. search Google for 'allinurl:', e.g. 'allinurl:slashdot.org'.

Jeez, now even Google is down! I got:

404 File Not Found
The requested URL (google.com) was not found.

If you feel like it, mail the url, and where ya came from to pater@slashdot.org.

I just can't figure out what Slashdot, or CowboyNeal to be more exact, has to do with this!

Tinfoil hat on!!!11

--
Beware: In C++, your friends can see your privates!

Google Cookie last until 2038! by NoSuchGuy · 2005-03-23 04:00 · Score: 1

You are right, MSN only sets cookies that last the lifetime of their current OS.

--
Grundgesetz * 23. Mai 1949 - 30. November 2007 - http://www.vorratsdatenspeicherung.de/

Re:Google Cookie last until 2038! by Oxy+the+moron · 2005-03-23 04:11 · Score: 5, Funny

Considering the timespan between Windows re-formats/re-installations, that isn't really all that unreasonable...

--
Proudly supporting the Libertarian Party.

Google is borked by stratjakt · 2005-03-23 04:00 · Score: 0, Offtopic

It just is.

It keeps getting harder and harder to find what I'm looking for. I'd say about 3/4 of the links it returns just redirect me to eBay or some asshole's Amazon referrer link. If I wanted to deal with that type of sleazy spamming bullshit I'd just read a book review on /.

Are they doing anything to stop the googlebombing, scamming, and bullshit?

Because I see this bringing Google down eventually if they don't. I mean, if their search doesn't work, what have they got? A cheesy freeware photo sorting app? Webmail?

--
I don't need no instructions to know how to rock!!!!

Re:Google is borked by Anonymous Coward · 2005-03-23 04:15 · Score: 1, Informative

So try Teoma instead. They're not as well known as Google but I find they return much more relevant results in many cases.
Re:Google is borked by SA+Stevens · 2005-03-23 04:17 · Score: 1

I say that it would be an appropriate end for the company that bought DejaNews and is continuously screwing with the useful Usenet archive tool that it once represented.

Then again, Deja.com 'the place for consumers to search for product info' was an abomination in the middle years before the Google takeover.

Why do all the good Internet resources gradually turn to shit?
Re:Google is borked by bananasfalklands · 2005-03-23 04:46 · Score: 1

Google seems to me to have got itself into a corner. There are lots of pages of crap, but my non-302 site is not in its index. Google is playing the we're-smarter-than-the-search-engine-optimizers game. The 'genuine' websites that have real information lose out. Google is losing the plot; I wish to pay nobody, or advertise on Google. I do have some Yahoo pages indexed on Google. I don't consider them important, and to me they prove how dumb their 'bot' is. When Google stops playing the 'game', perhaps things might improve? Google is not a definitive search in my eyes.

--
Send Peter Clifford Francis Macrae comdoms to 23 Bedford St, St.Neots, PE19 1AX, England
Re:Google is borked by BurntNickel · 2005-03-23 04:49 · Score: 1

Why do all the good Internet resources gradually turn to shit?

Because someone wants to make money from it.

--
And the knowledge that they fear is a weapon to be used against them...
Re:Google is borked by XeRo_X4i · 2005-03-23 09:42 · Score: 1

You're an idiot. That's a part of Ask Jeeves, which we all know sucks, unless you're too dumb to realize that you don't need to put a web search in question format.

--
XeRo

Good explanation about 302 hijacking by angio · 2005-03-23 04:02 · Score: 4, Informative

Someone posted a nice explanation of the phenomenon at webmasterworld.com.

302 hijacks work because Google goes to http://bad.site/ and gets redirected to http://good.site/. It then treats the content of bad.site as identical to that of good.site. The effect seems similar to somebody simply copying an entire page off your site (I'm not sure if it's actually more serious than this), but it's easier to do because you're just keeping a small table of redirections.

How serious is it? Don't know. It's pretty easy for a webmaster to check for hijacking and have her pages de-hijacked (see aforementioned article). It's probably not as screamingly awful as the threadwatch.org article suggests, but the redirector sites are rather annoying. Several of the comments in the webmaster article suggest that Google has already started moving on the problem.

Re:Good explanation about 302 hijacking by ralphdaugherty · 2005-03-23 04:33 · Score: 1

Is there anything here that is so useful Google should follow 302 redirects? After all, the explanations here (thanks) mentioned that googlebots will find the original good.site pages as well, of course if they are already an indexed site.

Why follow redirects at all?

rd
Re:Good explanation about 302 hijacking by Col.+Klink+(retired) · 2005-03-23 05:05 · Score: 2, Informative

> The effect seems similar to if somebody simply copied an entire page off of your site (I'm not sure if it's actually more serious than this), but it's easier to do because you're just keeping a small table of redirections.

The key here is that only googlebot is redirected. If you simply copied someone else's site, everyone would still get the info they were looking for. However, if you only redirect googlebot, you can redirect others to whatever you want.

--
-- Don't Tase me, bro!
Re:Good explanation about 302 hijacking by Anonymous Coward · 2005-03-23 05:10 · Score: 1, Informative

You can also serve different content to googlebot if you copy the whole page, so there's really not much of a difference. It's called cloaking: people have been doing that for ages.
Re:Good explanation about 302 hijacking by squiggleslash · 2005-03-23 05:23 · Score: 4, Informative

It's a little more than that. It's not just that bad.site is treated as identical to good.site, it's that good.site is potentially removed from Google. "302" means "temporary redirect", which gives Google the false idea that good.site isn't a permanent website.
Whether it actually removes good.site from the index has to do with, apparently, the PageRank of both sites.
It really wasn't until I read a full explanation and they covered that bit that the whole thing clicked for me.

--
You are not alone. This is not normal. None of this is normal.
Re:Good explanation about 302 hijacking by anthony_dipierro · 2005-03-23 13:44 · Score: 1

Good explanation. I see how this would be useful. After all, if I have a permanent url and point to various different temporary urls, you definitely can't display them all in a search query; there just isn't enough room in the first few results for such redundancy. If the pagerank of the "permanent url" is higher than that of the temporary one, then chances are no one is gaming the system. But what happens when someone does manage to get a higher pagerank than the targeted site?

Seems like something that's difficult to solve other than to make the pagerank algorithm as hard to fool as possible.
Re:Good explanation about 302 hijacking by Anonymous Coward · 2005-03-23 20:56 · Score: 0

The key here is that only googlebot is redirected.
Not necessarily. Suppose that:

1) www.widgets.com has a PageRank of 7

2) www.hijacker.com has a PageRank of 1

The webmaster of www.hijacker.com installs an index.php file with the following line of code:
header('Location: http://www.widgets.com'); exit; // PHP sends "302 Found" by default when a Location header is set
Google comes around and spiders www.hijacker.com. They follow the redirect to http://www.widgets.com.

Here's where things go bad, part 1. Google saves www.hijacker.com with www.widgets.com's content and PageRank. As the Google index updates, people searching for, say, Widgets see a result that says, "Welcome to WidgetCo!" but if they click, it actually goes to http://www.hijacker.com.

Here's where things go bad, part 2. Google runs a duplicate content filter from time to time. This filter runs and notices that www.widgets.com and www.hijacker.com have identical content. One of them is a duplicate entry, but which one?

Google's patented filter says the one with the lower PageRank is the impostor... Except now, hijacker.com has a PageRank of 7, and widgets.com, having been hijacked, has a lower PageRank. widgets.com loses, and is erased completely from Google search results, even though it is the valid site!

This involves no cloaking, and nobody has to serve anything to Google that they don't serve to anyone else.
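A toy model reproduces the second failure the comment describes. This is a sketch only. It follows the commenter's description of the duplicate filter (keep the copy with the higher PageRank), which is not a documented Google algorithm. The function name and the PageRank of 5 for the victim are illustrative assumptions.

```python
from typing import Dict, Tuple


def dedupe_by_pagerank(pages: Dict[str, Tuple[str, int]]) -> Dict[str, str]:
    """Keep one URL per distinct content: the one with the highest PageRank.

    pages maps url -> (content, pagerank); returns content -> surviving url.
    On a tie the alphabetically first URL wins, to keep the result stable.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for url, (content, rank) in sorted(pages.items()):
        if content not in best or rank > best[content][0]:
            best[content] = (rank, url)
    return {content: url for content, (rank, url) in best.items()}


# After the hijack: hijacker.com carries widgets.com's content and PageRank 7.
# The 5 for widgets.com is an illustrative "lower" value, not from the post.
pages = {
    "http://www.widgets.com/": ("Welcome to WidgetCo!", 5),
    "http://www.hijacker.com/": ("Welcome to WidgetCo!", 7),
}
print(dedupe_by_pagerank(pages))  # {'Welcome to WidgetCo!': 'http://www.hijacker.com/'}
```

The genuine site is the one filtered out, because in this model the filter only looks at PageRank, not at which URL actually serves the content.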

Not a surprise by faust2097 · 2005-03-23 04:04 · Score: 4, Interesting

For at least the last 18-24 months it's been increasingly difficult to find non-spam/redirect/affiliate program links for a search on any popular consumer product on Google. Maybe they have too much faith in their current PageRank and think it needs to be tweaked instead of overhauled. Maybe they think they have enough momentum and don't care. They certainly should have the talent and resources to do something about this and it's kind of sad that they haven't. I predict we'll see another whizzy side project in a few months instead.

The thing is that all they have to do is keep it just good enough that people won't leave. Remember, AdWords is Google's product; everything else they've got [Gmail, Orkut, etc.] is just a way to show you those ads. Google's success is entirely because they had clearly better search results than anyone else. If another company can clearly best them, then Google may be in trouble.

Re:Not a surprise by GoogleGuy · 2005-03-23 04:36 · Score: 5, Insightful

Hey, if you've run across spammy sites, have you filled out a spam report and used the keyword slashdot? I mentioned in an earlier comment from a different story that you can do this. We got eight reports last time, and the responses are on their way. We do check that data to look for new tricks that spammers are trying.
Re:Not a surprise by Xenna · 2005-03-23 06:18 · Score: 1

All right, I'll take you up on this. All my Dutch-language queries have been polluted for the last year or so by a Dutch search engine (Kelkoo) that manages to be the first hit on almost every search.

I just filed the report w/keyword...
Re:Not a surprise by Anonymous Coward · 2005-03-23 07:03 · Score: 0

Here in Germany it's dialer sites :( Actually finding useful info (especially for school) has gotten hard.

Wrong by PornMaster · 2005-03-23 04:09 · Score: 4, Informative

301 is a permanent redirect, 302 temporary.

This is why the "302 hack" works. If the redirect is only supposed to be temporary, the search engine keeps the URL of the 302 as the URL for the document, but indexes the content of the page to which the redirect is directed.

301 is what you should be using to point the SEs to your new pages if you've moved them. The behavior is supposed to be for the SEs to replace the old URL in their index with the new one, and furthermore count all links to the 301ed URL as being towards the new one. I don't know why it's not working for the grandparent poster, but it's the way that the functionality is "advertised" for Google and Yahoo, and it should work.
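A minimal sketch of the 301/302 distinction described above, assuming a deliberately simplified indexing policy (this is not Google's or Yahoo's actual code, and `indexed_url` is a hypothetical name). It only models which URL the fetched content gets filed under; the "count links to the old URL toward the new one" part of 301 handling is left out.

```python
from typing import Optional


def indexed_url(requested: str, status: int, location: Optional[str]) -> str:
    """Return the URL a crawler files fetched content under (simplified, illustrative policy)."""
    if status == 301 and location:
        # Permanent move: replace the old URL with the new one in the index.
        return location
    # 302 (temporary) and everything else: keep the URL that was requested,
    # even though the content actually came from `location`. That is the "302 hack".
    return requested


print(indexed_url("http://old.example/page", 301, "http://new.example/page"))
# -> http://new.example/page
print(indexed_url("http://hijacker.example/", 302, "http://www.widgets.example/"))
# -> http://hijacker.example/
```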

--
500GB of disk, 5TB of transfer, $5.95/mo

Re:Wrong by JaseOne · 2005-03-23 04:48 · Score: 1

Oh man... I'm not having a good track record today, I knew I should have stayed in bed. Serves me right for not looking that up first and making an assumption...

--
In the drops - An Aussie's musings on all things cycling

Bleh... by Patrick+Mannion · 2005-03-23 04:10 · Score: 4, Funny

I was thinking that some major crisis had broken out and a million pages had been hijacked at once, creating something bigger than any other Internet event ever, and that it caused Google's stock to tank and forced them to go private again, lay off workers and go bankrupt. But that's crazy. But still, word it right. Damn it.

--
In America, you spam computers. In Soviet Russia, computers spam you!

Stop putting links next to each other! by Anonymous Coward · 2005-03-23 04:11 · Score: 0

10.5 Until user agents (including assistive technologies) render adjacent links distinctly, include non-link, printable characters (surrounded by spaces) between adjacent links. [Priority 3]

Source: WCAG 1.0

It's even worse when you don't even put any spaces between the links. So stop it.
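For illustration, a small Python helper that follows WCAG 1.0 checkpoint 10.5 as quoted above: adjacent links are separated by a printable, non-link character surrounded by spaces. The function name `link_list` and the default `|` separator are assumptions chosen for the example.

```python
from html import escape
from typing import List, Tuple


def link_list(links: List[Tuple[str, str]], separator: str = "|") -> str:
    """Render (text, href) pairs as anchors with a space-surrounded separator between them."""
    anchors = [f'<a href="{escape(href)}">{escape(text)}</a>' for text, href in links]
    return f" {escape(separator)} ".join(anchors)


print(link_list([("Home", "/"), ("About", "/about")]))
# -> <a href="/">Home</a> | <a href="/about">About</a>
```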

Oblig response by Anonymous Coward · 2005-03-23 04:11 · Score: 0

The TFA forgot

16. Profit!

Re:Oblig response by Anonymous Coward · 2005-03-23 04:52 · Score: 0

The "The fucking article" forgot???? This joke would have been almost funny if not for that.

My site is affected by barcodez · 2005-03-23 04:11 · Score: 4, Interesting

My site, The Humor Archives, has been affected by this. I can tell because if you do the following search you can see a bunch of sites that are/were 302ing to my domain. I'm pretty pissed off and I seriously hope Google act soon to rectify the matter.


Re:My site is affected by Kyrka · 2005-03-23 05:26 · Score: 1

Your site, following the search link you provided, is the TOP FREAKIN LINK dude!
You haven't been affected as I see it.
Re:My site is affected by Anonymous Coward · 2005-03-23 05:36 · Score: 0

Relying on search engines for traffic to your site isn't really advisable. I'm still amazed at how many people put up a website and expect people to magically flock to it without any sort of advertising. Have stellar content and people will go there by word of mouth; those are the kind of people you want anyway.

When was the last time you found out about a site from a search engine and then thought, oh wow! I'm going to visit this site every day now! Usually you go there to find what you're looking for, then close the window, and that's it.
Re:My site is affected by northcat · 2005-03-23 06:18 · Score: 1

Nice going, Sherlock. Those pages show up because those pages have thehumorarchives.com in their URL. They're not 302 hijacking you. They are probably even just linking to you or showing info about your site (through some in-site redirection mechanism) because they like your site.
Re:My site is affected by GoogleGuy · 2005-03-23 06:46 · Score: 4, Informative

Yeah, this is a common misconception. allinurl: and its sister operator inurl: look for terms matching in the URL. For a search like [allinurl:thehumorarchives.com], a result like www.stumbleupon.com/url/www.thehumorarchives.com/forums/ is a fine result, and doesn't have anything to do with this.
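A toy approximation of the operator behaviour GoogleGuy describes: a URL matches when every query term appears somewhere in it. Real query parsing and URL tokenization are certainly more involved; `allinurl_matches` is a hypothetical helper meant only to show why a StumbleUpon URL that embeds the domain is a legitimate hit.

```python
def allinurl_matches(url: str, query: str) -> bool:
    """Toy allinurl: True when every whitespace-separated query term occurs in the URL (case-insensitive)."""
    haystack = url.lower()
    return all(term.lower() in haystack for term in query.split())


print(allinurl_matches("www.stumbleupon.com/url/www.thehumorarchives.com/forums/", "thehumorarchives.com"))  # True
print(allinurl_matches("www.example.com/", "thehumorarchives.com"))  # False
```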
Re:My site is affected by Anonymous Coward · 2005-03-23 14:03 · Score: 0

Re:Not a surprise (Score:5, Insightful)
by GoogleGuy (754053) * on Wednesday March 23, @11:36AM (#12024895)
(http://www.google.com/webmasters/)
Hey, if you've run across spammy sites, have you filled out a spam report and used the keyword slashdot? I mentioned in an earlier comment from a different story that you can do this. We got eight reports last time, and the responses are on their way. We do check that data to look for new tricks that spammers are trying.

From the Google "Information for Webmasters" by YouMakeMeSoANGRY · 2005-03-23 04:12 · Score: 5, Informative

Google claim...

Fiction: A competitor can ruin a site's ranking somehow or have another site removed from Google's index.
Fact: There is almost nothing a competitor can do to harm your ranking or have your site removed from our index. Your rank and your inclusion are dependent on factors under your control as a webmaster, including content choices and site design.

How about adding "Fiction: Google information for webmasters contains any facts"?

Re:From the Google "Information for Webmasters" by Anonymous Coward · 2005-03-23 05:53 · Score: 0

that would create a logical paradox
Re:From the Google "Information for Webmasters" by Anonymous Coward · 2005-03-23 08:24 · Score: 0

Almost as bad as the classic "This statement is false."
Re:From the Google "Information for Webmasters" by dascandy · 2005-03-23 08:39 · Score: 1

Google's still right. They say "almost nothing".
Re:From the Google "Information for Webmasters" by YouMakeMeSoANGRY · 2005-03-23 09:56 · Score: 1

Before "almost nothing" they say "Fiction", as in "Not True". Either people can't do anything (almost nothing == something) or it isn't fiction.
Re:From the Google "Information for Webmasters" by starman97 · 2005-03-23 13:29 · Score: 1

Especially if part of "almost nothing" is a 302 redirect. Well, that's something, but it's really "almost nothing", but it's not ABSOLUTELY nothing.

--
Starman97@Gmail.com (bring it on spammers)

RTFA by Shaper_pmp · 2005-03-23 04:12 · Score: 0, Flamebait

Read the fucking article - you don't have to have any access to the victim site to do this - you only need to have a higher pagerank than them.

--
Everything in moderation, including moderation itself

Re:RTFA by Zeinfeld · 2005-03-23 04:27 · Score: 4, Insightful

Read the fucking article - you don't have to have any access to the victim site to do this - you only need to have a higher pagerank than them.
The article is confused and badly written. It never explains the exploit being used. So stop dumping on people. It is not at all surprising that people don't get what is going on when the description is crud.
What is really going on has nothing to do with 302, or at least very little. What these people are doing is setting up fake web sites using content filched from genuine Web sites. This allows (or is believed to allow) them to climb the Google rankings.
I don't see why someone would use a 302 response when they can just copy the entire content unless there is some sort of bug in Google's pagerank that is not being explained. Copying the entire content is much simpler.
So what the attacker does is to set up their site so that when the googlebot comes round it publishes some legitimate content, then when other folk follow the site from a google search they get pages infested with spyware or the like.
This would certainly explain the number of times I have done a Google search and ended up at an idiotic 'search site' that does nothing for me.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Re:RTFA by mla_anderson · 2005-03-23 05:09 · Score: 5, Informative
No, the way it works is with the 302, but only for the googlebot.
1. Googlebot goes to scammer's site
2. Googlebot is given a 302 (redirect) to the victim's site
3. Googlebot indexes the victim's site as belonging to the original URL
4. Googlebot goes to the victim's site
5. Googlebot realizes this URL is already indexed and "belongs" (according to the Google code) to the scammer.
6. The victim's site gets lower rankings as the page is not even indexed, and the scammer's site gets a higher ranking.
For this to work the scammer has to give the 302 only to the googlebot; all other browsers need to get the content of the scammer's page. If you google for "cheapest car insurance" (IIRC) you can find an example of this. Change your User Agent accordingly and click on the top Google link: you'll end up at another site. Change back to Mozilla and you'll get the scammer's site.
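The six steps above can be sketched as a toy crawler. This assumes the flawed logic exactly as described in the post (redirected content is credited to the redirecting URL, and whichever URL claims a piece of content first keeps it); it is not Google's code, the URLs are placeholders, and `crawl` and `RESPONSES` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Toy responses as Googlebot sees them: url -> (status, body or redirect target).
# Per the post, the scammer serves the 302 only to the crawler; browsers would get his own page.
RESPONSES: Dict[str, Tuple[int, str]] = {
    "http://scammer.example/": (302, "http://victim.example/"),
    "http://victim.example/": (200, "Victim's original content"),
}


def crawl(urls: List[str], responses: Dict[str, Tuple[int, str]]) -> Dict[str, str]:
    """Return content -> URL it is attributed to, using the flawed logic from steps 1-6."""
    owner: Dict[str, str] = {}
    for url in urls:                       # steps 1 and 4: visit each URL in crawl order
        status, payload = responses[url]
        if status == 302:                  # step 2: follow the temporary redirect (one hop only)
            payload = responses[payload][1]
        # steps 3 and 5: the first URL to claim a piece of content keeps it,
        # so the victim's own URL ends up treated as the duplicate (step 6)
        owner.setdefault(payload, url)
    return owner


print(crawl(["http://scammer.example/", "http://victim.example/"], RESPONSES))
# -> {"Victim's original content": 'http://scammer.example/'}
```

Note that crawl order matters in this toy: if the victim's page were visited first, it would keep its own content.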
--
Sig is on vacation
Re:RTFA by Shaper_pmp · 2005-03-23 05:18 · Score: 1

My apologies, but the details of this exploit were linked to in a previous article as well as this one, and you can't move for explanations of how it works. I also tend to get irritable with people who, when explicitly presented with information on a subject, can't be bothered to even attempt reading it (as the GP obviously hasn't, given that they don't understand the first thing about how it works), and instead just want everyone to explain it for them (again).

What is really going on has nothing to do with 302, or at least very little. What these people are doing is to set up fake web sites using content filched from genuine Web sites. This allows (or is believed to allow) them to climb the Google rankings.

Nope. They're using a combination of 302 HTTP response headers and a bug/misfeature in Google's spidering system - they don't have to have any kind of access to the site being hijacked, and they aren't copying anything off the site. They set up a 302 redirect to the hijackee, and Google itself gets confused and attributes the hijackee's content to the hijacker.

This is all explained in the article, although since you apparently haven't understood it either I accept it might be badly-written and/or overly technical ;-)

I don't see why someone would use a 302 response when they can just copy the entire content unless there is some sort of bug in Google's pagerank that is not being explained. Copying the entire content is much simpler.

This way they aren't just hosting the same content as another site (competing for rankings and leaving themselves open to accusations of copyright violation), they're actually knocking the original site out of the Google rankings altogether, in a pretty subtle way (so it might even go unnoticed by the site owner), with very little work (esp. compared to replicating a whole page/site), and without explicitly violating any laws (that I can see).

So what the attacker does is to set up their site so that when the googlebot comes round it publishes some legitimate content, then when other folk follow the site from a google search they get pages infested with spyware or the like.

Not quite - this is common-or-garden (and long-known-about) page cloaking, which is a pain in the arse, but you can live with it. What the article is talking about is entirely different (see above).

--
Everything in moderation, including moderation itself
Re:RTFA by IMarvinTPA · 2005-03-23 05:29 · Score: 2, Interesting

This explains to me what's going on.

Although it seems backwards to me from what they should do.
What Google needs to do is not index 302s and instead index the final page. Alternatively/additionally, make sure the domain remains the same when accepting a 302 and indexing it.

As it is, it sounds like they're indexing my change of address card and ignoring my current residence.
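A minimal sketch of IMarv's second suggestion, assuming the simplest reading: accept a 302 under the original URL only when both URLs are on the same host, and otherwise index the final page under its own address. `url_for_302` is a hypothetical name, and this is a proposal from the thread, not how Google actually handles redirects.

```python
from urllib.parse import urlsplit


def url_for_302(requested: str, location: str) -> str:
    """Pick the URL under which to index the content a 302 points at (IMarv's proposal)."""
    same_host = urlsplit(requested).hostname == urlsplit(location).hostname
    # Same host: honour the temporary redirect and keep the original URL.
    # Different host: index the final page under its own address instead.
    return requested if same_host else location


print(url_for_302("http://hijacker.example/", "http://www.widgets.example/"))  # http://www.widgets.example/
print(url_for_302("http://shop.example/old", "http://shop.example/new"))       # http://shop.example/old
```

Comparing exact hostnames is stricter than "same domain": it would treat example.com and www.example.com as different sites. A real check would more likely compare registrable domains (for example using the Public Suffix List), which this sketch does not attempt.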

IMarv

--
Trusting software vendors is no smarter than trus
Re:RTFA by Zeinfeld · 2005-03-23 06:00 · Score: 2, Interesting

My apologies, but the details of this exploit were linked-to in a previous article as well as this one, and you can't move for explanations of how it works.
If I find both articles confused and confusing, then it is a bit much to expect other people to follow them; I am listed as an original contributor to the design of HTTP.
The real problem here is not the 302, it's a bug in the googlebot. Fortunately, a relatively easy one to fix. When googlebot sees a 302 redirect to a page, it treats the actual page and the redirect to the page as if they are one and the same. It should not; instead it should give the 302 linking URL a lower score than the URL linked to. I think this is pretty obvious from the specs. It should be a pretty quick fix.
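A minimal sketch of Zeinfeld's proposed fix, assuming the redirecting URL and its target are kept as separate index entries and the redirect's score is capped at a fraction of the target's. The factor `REDIRECT_DISCOUNT = 0.5` and the names `Entry`, `rescore` and `ranked` are hypothetical, and only single-hop redirects are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REDIRECT_DISCOUNT = 0.5  # hypothetical; any factor below 1 keeps the target ahead


@dataclass
class Entry:
    url: str
    score: float
    redirects_to: Optional[str] = None  # set when this URL answered with a 302


def rescore(entries: Dict[str, Entry]) -> None:
    """Cap every single-hop 302 source below the score of the page it points to."""
    for entry in entries.values():
        target = entries.get(entry.redirects_to) if entry.redirects_to else None
        if target is not None:
            entry.score = min(entry.score, target.score * REDIRECT_DISCOUNT)


def ranked(entries: Dict[str, Entry]) -> List[str]:
    """URLs ordered by descending score."""
    return [e.url for e in sorted(entries.values(), key=lambda e: e.score, reverse=True)]


entries = {
    "http://hijacker.example/": Entry("http://hijacker.example/", 7.0, "http://www.widgets.example/"),
    "http://www.widgets.example/": Entry("http://www.widgets.example/", 7.0),
}
rescore(entries)
print(ranked(entries))  # ['http://www.widgets.example/', 'http://hijacker.example/']
```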
This is one of the problems I have every week, when someone comes along with a 'new' attack that is simply a slight twist on something that has been around for years. I recently got called by a journalist researching IM 'viruses'. Unfortunately, it was only afterwards that I realized that all this 'new' attack was telling us is that once a machine is infected by spyware, there is very little that can be done to protect the user.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Re:RTFA by kevjava · 2005-03-23 06:19 · Score: 1

So what happens if you change your browser to identify as Googlebot?

Problem solved, right? The scammer redirects you via 302, and you see the original content of the page that the scammer wants to index.
Re:RTFA by Anonymous Coward · 2005-03-23 16:45 · Score: 0

So, all Google would have to do to fix this problem is have their User Agent look like Firefox or some other regular browser to the scammer's web server.

Is that right?
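A sketch of the check implied by the last two posts: fetch the same URL once as Googlebot and once as an ordinary browser, and flag it if the answers differ. The network layer is left out on purpose; `fetch` is any callable taking `(url, user_agent)`, and the two fake servers stand in for real sites. The user-agent strings and all names here are illustrative assumptions.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]                # (status code, body or Location header)
Fetcher = Callable[[str, str], Response]  # (url, user_agent) -> Response

# Illustrative user-agent strings only.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.0"


def looks_cloaked(fetch: Fetcher, url: str) -> bool:
    """Flag a URL that answers the crawler differently from an ordinary browser."""
    return fetch(url, GOOGLEBOT_UA) != fetch(url, BROWSER_UA)


def fake_scammer(url: str, user_agent: str) -> Response:
    """Stand-in for the scammer's server: a 302 for Googlebot, his own page for everyone else."""
    if "Googlebot" in user_agent:
        return (302, "http://victim.example/")
    return (200, "<html>scammer's page</html>")


def fake_honest(url: str, user_agent: str) -> Response:
    """Stand-in for a server that treats every client the same."""
    return (200, "<html>same page for everyone</html>")


print(looks_cloaked(fake_scammer, "http://scammer.example/"))  # True
print(looks_cloaked(fake_honest, "http://honest.example/"))    # False
```

Two caveats: pages with rotating ads or timestamps differ between any two fetches, so a real comparison would need to be fuzzier than `!=`; and, as a later comment points out, a scammer can key on the crawler's IP addresses rather than its User Agent, which a UA swap alone would not catch.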

Alarmism? by flajann · 2005-03-23 04:12 · Score: 0, Redundant

Is this really as serious as it sounds?

--
Ruby Neural Evolution of Augmenting Topologies

Could be a useful tool against domain-vultures by waynemr · 2005-03-23 04:14 · Score: 1

Domain-vultures - you know, the pr0n companies out there that harvest recently expired domains and point them to their adult content. There are lots and lots of sites like this, where some admin forgot to re-register the site, etc. And then, the domain is held hostage by the pr0n site until a ransom of a couple thousand is paid to them. Just a thought.

More information on this bug by Anonymous Coward · 2005-03-23 04:17 · Score: 0

Here is more information on this.
http://www.royans.net/blog/2004/12/googles-secert-301302-bug.html

pure FUD the submitter is a spammer by Anonymous Coward · 2005-03-23 04:20 · Score: 4, Informative

what major headlines? millions of pages!! the world is coming to an end!!!!

a quick whois on threadwatch.org (the submitter's site) reveals it's hosted by search engine spammers
platinax.co.uk, which is registered to a UK "company" called BriteCorp
http://www.britecorp.co.uk/

who offer all the usual SE spamming methods
coincidence?
a whois on britecorp's platinex site reveals they have removed their address from the whois db, and their website's contact details are a mobile phone number (07963 808470)
further investigation on britecorp reveals they are not a "real" company but are trading as "Brian Turner" (pic), and Companies House don't seem to have any records of any of these companies, though I'm sure further investigation could find out more

so why would a supposedly reputable marketing company have a cell phone as a primary contact point?
something to hide, eh?
or perhaps local trading standards would like to hear about them and their "services"?

northern scum by any other name

Re:pure FUD the submitter is a spammer by Elwood+P+Dowd · 2005-03-23 05:16 · Score: 1

northern scum
Southern fairies.

--

There are no trails. There are no trees out here.
Re:pure FUD the submitter is a spammer by Anonymous Coward · 2005-03-23 05:43 · Score: 0

northern scum
Southern fairies.

English pansies
Re:pure FUD the submitter is a spammer by Afty0r · 2005-03-23 05:58 · Score: 1

so why would a supposedly reputable marketing company have a cell phone as a primary contact point ?
Perhaps the company in question is in fact one person, and he uses his mobile number as primary contact because he spends most of his time out of the office (home office perhaps) with clients.

I am in the UK, a one-man business, involved in web development (not SEO), and I use my mobile number as my main point of contact. It doesn't mean I'm not reputable; it means I want my clients to be able to talk to me directly, instead of through a receptionist who knows nothing except how to look after her nails.
Re:pure FUD the submitter is a spammer by stor · 2005-03-23 15:54 · Score: 1

Whinging Pom

Sorry, everyone else was doing it ;)

Cheers
Stor

--
"Yeah well there's a lot of stuff that should be, but isn't"

Talk about redirect. by pg110404 · 2005-03-23 04:22 · Score: 1, Funny

I was on slashdot reading all this stuff when my browser redirected to porn sites......

Oh wait. I got bored and did a search for porn...

I guess that's different.

But what's the point? by hawk · 2005-03-23 04:23 · Score: 1

To continue having the victim's hits redirected, the redirect needs to stay in place, doesn't it?

What in the world does the hijacker gain by having Google point to him, only to then load the victim's page?

hawk

Re:But what's the point? by micromoog · 2005-03-23 04:28 · Score: 3, Informative

The hijacker's script watches to see who's coming. If it's googlebot, redirect. If it's an actual user, do [insidious thing].
Re:But what's the point? by northcat · 2005-03-23 06:01 · Score: 0, Troll

I think this shows that although slashdotters act like they know everything, they are in reality complete idiots. When they want to win an argument, they look up something on the internet and act like they know it all. But when something that needs thinking and is as simple as a kindergarten maths problem rears its head, /.ers show their stupidity.

Search engines should devalue redirects by Animats · 2005-03-23 04:24 · Score: 4, Insightful

Redirects to a page should be treated as having far less PageRank value than the page itself. That will fix the problem.

It will also break many "click trackers", "portals", "directory sites", "search engine optimizers", and other annoyances, which is probably a plus for Google users. You know, those sites where you click on some phrase in Google and, three redirects later, you're at some irrelevant porno site.
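One way to read this proposal, sketched in Python under stated assumptions: resolve a redirect chain (the "three redirects later" case) to its final page, and credit the starting URL with only a shrinking fraction of that page's value per hop. `HOP_DISCOUNT`, `MAX_HOPS`, `resolve` and `redirect_value` are hypothetical; nothing in the thread says how much a redirect should be devalued.

```python
from typing import Dict, Tuple

HOP_DISCOUNT = 0.25  # hypothetical: each redirect hop passes on only a quarter of the value
MAX_HOPS = 5         # hypothetical cap on chain length


def resolve(url: str, redirects: Dict[str, str]) -> Tuple[str, int]:
    """Follow a redirect chain to its final URL; return (final_url, number_of_hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > MAX_HOPS:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url, hops


def redirect_value(url: str, redirects: Dict[str, str], pagerank: Dict[str, float]) -> float:
    """Value credited to a URL: the final page's PageRank, shrunk once per redirect hop."""
    final_url, hops = resolve(url, redirects)
    return pagerank[final_url] * HOP_DISCOUNT ** hops


REDIRECTS = {
    "http://tracker.example/click": "http://portal.example/go",
    "http://portal.example/go": "http://directory.example/out",
    "http://directory.example/out": "http://site.example/",
}
PAGERANK = {"http://site.example/": 8.0}

print(redirect_value("http://site.example/", REDIRECTS, PAGERANK))          # 8.0
print(redirect_value("http://tracker.example/click", REDIRECTS, PAGERANK))  # 0.125
```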

Re:Search engines should devalue redirects by h4ter · 2005-03-23 05:24 · Score: 1

I'm finding that with all the fetishes out there, those porno sites I'm getting redirected to are getting less and less irrelevant to my searches.
Re:Search engines should devalue redirects by Anonymous Coward · 2005-03-23 06:57 · Score: 0

That undermines the point of having a 302 in the first place. The redirect target page is only supposed to be a temporary content holder.
Re:Search engines should devalue redirects by Enrico+Pulatzo · 2005-03-23 08:26 · Score: 1

I must disagree. There are tons of legitimate uses of 302 status codes that shouldn't pay a price for someone else's abuse.

The best solution would probably involve some extra layer that would verify that the page being redirected to and the page doing the redirecting are owned by the same entity. However, that isn't very reasonable right now (maybe if it were somehow integrated into a future version of HTTP?), as the only people with an incentive to adopt that scheme would be those who've been burned by the abuse.
Re:Search engines should devalue redirects by mikecarrmikecarr · 2005-03-23 08:45 · Score: 1

It will also break many "click trackers", "portals", "directory sites", "search engine optimizers", and other annoyances, which is probably a plus for Google users. You know, those sites where you click on some phrase in Google and, three redirects later, you're at some irrelevant porno site.
It's a feature, not a bug! It's a porno random link feature... rather than the *relevant* porn that you were searching for, you get new irrelevant porn! Hurrah!

--
ID-10-T is a way of life

fraud, copyright, phishing, decency laws by r00t · 2005-03-23 04:24 · Score: 1

If the site is a clone, such as a fake bank or auction site, use the laws against fraud, phishing, copyright violation, etc.

If the site is porn and the correct site was something that might attract kids, get the site on that.

There are also some nice laws involving computer misuse. One could argue that Google had been "hacked".

An imaginative prosecutor will have many more ideas.

Re:fraud, copyright, phishing, decency laws by gl4ss · 2005-03-23 04:48 · Score: 2, Insightful

and that prosecutor has to get pretty imaginative to get jurisdiction over the people in some countries.

prosecution can't fix this problem.

--
world was created 5 seconds before this post as it is.

Doesn't seem like the end of the world by Hornsby · 2005-03-23 04:26 · Score: 2, Insightful

Why not just fix the bug and then recreate the rankings index? Googlebot hits my sites all the time, so I know that it covers the rest of the internet quite often as well. With their amount of hardware, it probably wouldn't take long.

--
A musician without the RIAA, is like a fish without a bicycle.

Not true I think by Anonymous Coward · 2005-03-23 04:27 · Score: 1, Insightful

allinurl: returns results where the search terms appear in the URL. So they *should* be returned.

I'm not convinced by this whole 302 nonsense. I haven't seen a single search where a 302 scraper site is ranking above the site it 302s for the scammed text.

To me it sounds like people's sites drop for whatever reason, then they look for a reason and they grasp at this 302 story.

I do an allinurl on my various sites (8 of them) and 6 have scrapers attached, only 1 has disappeared recently and that seems to have been caused by a change of IP address or maybe the loss of the yahoo directory link or perhaps because I have lots of pages with 20-30% similar content.

But if I only had 1 site I could easily blame a 302 problem.

Re:Not true I think by northcat · 2005-03-23 06:21 · Score: 1

The 'attack' is possible. And just because it hasn't happened to you or you haven't seen it doesn't mean it's not happening. How many murders have you seen? And how many times have you been murdered? (Sorry for comparing this to murder)

treat redirects as one-link pages by wotevah · 2005-03-23 04:28 · Score: 2, Insightful

It seems that when page A redirects to B, Google not only considers that a hit for A, but also assigns B's content to A (I just skimmed through all the posts here so maybe that's not what happens).

In that case, it seems to make more sense to just ignore A altogether since the hit and content rightfully belong to B.

This could be done by treating redirects as empty one-link pages, thus unifying the handlers and defeating this practice.
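A sketch of the "redirect as an empty one-link page" idea, using textbook power-iteration PageRank rather than anything Google has published. The graph, the damping factor of 0.85 and the function names are illustrative assumptions; the sketch also assumes every page has at least one outgoing link and every link target is itself in the graph. Because the redirecting URL only passes rank along, the page it points to ends up ranked above it, which is also what the reply below about the victim gaining rank predicts.

```python
from typing import Dict, List


def add_redirects_as_links(links: Dict[str, List[str]], redirects: Dict[str, str]) -> Dict[str, List[str]]:
    """Model each redirecting URL as a page whose only content is one link to its target."""
    merged = {page: list(outlinks) for page, outlinks in links.items()}
    for source, target in redirects.items():
        merged[source] = [target]
    return merged


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Textbook power-iteration PageRank; assumes every page has at least one outgoing link."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in links}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


LINKS = {
    "fan1": ["hijacker"],  # someone links to the hijacker's URL
    "fan2": ["victim"],
    "victim": ["about"],
    "about": ["victim"],
}
REDIRECTS = {"hijacker": "victim"}  # hijacker answers with a 302 to the victim

ranks = pagerank(add_redirects_as_links(LINKS, REDIRECTS))
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```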

Re:treat redirects as one-link pages by mla_anderson · 2005-03-23 05:20 · Score: 1

My understanding was that it's not the content, but the URL. A 302 is a temporary redirect, so when A redirects to B, Google keeps the B URL and remembers that it's just the same as the A URL. Then when Googlebot visits B, it realizes it already has the page indexed... it's the temporary redirect.

I think the Googlebot should refuse to follow 302s across domains.

--
Sig is on vacation
Re:treat redirects as one-link pages by Anonymous Coward · 2005-03-23 08:08 · Score: 0

Hear, hear! Seems like the most elegant way to fix this once and for all.
Re:treat redirects as one-link pages by Anonymous Coward · 2005-03-23 12:37 · Score: 0

Counting it as a page with a single link would be more effective: the victim will get an _increased_ Google ranking because of all the links pointing to it. That will likely make the 302s go away quicker...

Kindly extract your head from wherever it is by ites · 2005-03-23 04:38 · Score: 5, Informative

This story does not need "debunking".

What it needs is a rapid and satisfactory answer or Google will find themselves at the receiving end of more angst than they even know is possible.

A concrete example. My company's web site has been in existence since 1995. So we have pretty good page ranking. Our main page has one phrase, very distinct, unique.

When I search for this phrase (in quotes), Google reports hundreds of matches. These sites (except our own) do not contain the phrase but are sites that sell traffic boosting.

The 302 problem is real.

Incidentally, I just spent 15 minutes at Google.com looking for a way to report the problem. Where is that mention of "canonicalpage"? In the bottom shelf of a filing cabinet, behind a locked door that says "beware of the tiger"?

I'm not surprised you got only 30 reports. What I am surprised at is that you appear to speak for Google yet have such an inane response to what is a real (and for many people, a terrifying) problem.

--
Sig for sale or rent. One previous user. Inquire within.

Re:Kindly extract your head from wherever it is by Anonymous Coward · 2005-03-23 04:55 · Score: 0

mod parent up
Re:Kindly extract your head from wherever it is by Anonymous Coward · 2005-03-23 05:08 · Score: 0, Informative

Incidentally, I just spent 15 minutes at Google.com looking for a way to report the problem. Where is that mention of "canonicalpage"? In the bottom shelf of a filing cabinet, behind a locked door that says "beware of the tiger"?
Obviously you did not google for it. Try this
Re:Kindly extract your head from wherever it is by Anonymous Coward · 2005-03-23 05:26 · Score: 1, Insightful

Obviously you did not google for it.

This is an idiotic response. Why on earth do people mod stuff like this up? Who in the hell is going to google for "canonicalpage"??? That is the solution you moron. Let me see you search for and find the solution without entering the term for the solution itself.

You are a moron, and whoever modded you up is even stupider than you.
Re:Kindly extract your head from wherever it is by Anonymous Coward · 2005-03-23 05:29 · Score: 0

ROTFL, how is someone supposed to search for a keyword they never heard of?
Re:Kindly extract your head from wherever it is by alphakappa · 2005-03-23 05:30 · Score: 2, Informative

Here's where you can file a report.

--
"When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
Re:Kindly extract your head from wherever it is by Sven+Tuerpe · 2005-03-23 05:41 · Score: 1
Our main page has one phrase, very distinct, unique.
When I search for this phrase (in quotes), Google reports hundreds of matches. These sites (except our own) do not contain the phrase but are sites that sell traffic boosting.
The 302 problem is real.

Is it? What percentage of your Web site's users will
- Search for that particular phrase, and
- Be interested just in your main page?
This is an important point many seem to be missing: It does not matter where your main page is in any generic brand name search. What does matter are the actual searches your actual visitors do in order to find actual content.
You should repeat that test with a more reasonable approach. Do you come to the same conclusion if you search for specific pieces of information elsewhere on your site?
--
http://erichsieht.wordpress.com/category/english/
Re:Kindly extract your head from wherever it is by ites · 2005-03-23 05:45 · Score: 1

I believe the point is this:

- the web site has a very unique signature phrase
- it has a very high page ranking
- it has been successfully hijacked by other sites

Now...

- take a site with more generic search terms
- with modest page ranking

And... what happens to their traffic? Basically it dies.

--
Sig for sale or rent. One previous user. Inquire within.
Re:Kindly extract your head from wherever it is by Anonymous Coward · 2005-03-23 07:04 · Score: 0

the web site has a very unique signature phrase

No, it's not "very unique" - it might be unique, but it's only a little bit unique.

Dumbass. Maybe you should learn what a word means before using it.
Re:Kindly extract your head from wherever it is by Sven+Tuerpe · 2005-03-23 07:58 · Score: 1

the web site has a very unique signature phrase ... it has been successfully hijacked by other sites

Has it? Has the site been hijacked, or has just a specific page been hijacked for specific searches, which may or may not be used by a relevant fraction of potential users? I'm serious about this question. What are the searches that matter, and how are these searches affected? I couldn't care less about the mental framework of so-called SEOs and their customers, who are obsessed with high positions in some arbitrary result page. It does not matter where the main page of a cooking site is positioned in a search for "cooking recipe", if users search for "chili con carne". The former one is what SEOs try to sell their clients. Is this hijacking still an issue if we stop listening to them?

--
http://erichsieht.wordpress.com/category/english/
Re:Kindly extract your head from wherever it is by infernalproteus · 2005-03-23 19:40 · Score: 2, Informative

"beware of the tiger"

You mean "beware of the leopard"...
Re:Kindly extract your head from wherever it is by chefren · 2005-03-24 01:06 · Score: 1

It's "beware of the leopard". Get your facts right. :)

Go Phish by MacFanMR · 2005-03-23 04:40 · Score: 2, Insightful

This has very real potential to be taken advantage of for phishing scams.

Imagine someone searching for their bank's website on Google (because some think that [searching] is how the web works!) and clicking the wrong link. That link takes them to a site that looks just like their bank's website, and maybe there is a security alert on the front page asking them to verify their information. After doing so, they could be redirected to their real bank's site, never having realized their error.

Experience has shown me that most non-techies know they type an address into their browser, but after that, they pay no attention to it which makes this a real possibility.

Re:Go Phish by northcat · 2005-03-23 05:52 · Score: 1

Yeah, this is basically what we're all saying. This and when you search for "bank" you get UnknownNon-famous(probably-scam)Bank.com as the fifth result instead of FamousBank.com.

Mod parent up. by MyLongNickName · 2005-03-23 04:48 · Score: 2, Insightful

This is hilarious! Someone please mod up! Hope I get the above mods in M2.

--
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year

No, it's not about redirecting the user... by ites · 2005-03-23 04:49 · Score: 5, Informative

It's about pushing unrelated sites up in the rankings.

For instance: I have a site with excellent page ranking. Now a new site is set up and does a 302 to my site. Google now gives this new site my page ranking. Once the new site is indexed, it removes the 302 redirection.

When you search for my site, you now find these new sites instead. There is no redirection when you click on a link; the "cached text" that Google shows is wrong.

Basically this technique allows people to get high page rankings without earning them. It's very widespread - I counted over 60 such parasites for my company's web site (which has excellent page ranking).

--
Sig for sale or rent. One previous user. Inquire within.

Re:No, it's not about redirecting the user... by Anonymous Coward · 2005-03-23 05:27 · Score: 0

Doesn't the bot already use anonymized IPs to detect googlespamming? Can't it extend the same to a 302? If the spider can't tell the difference between a redirected page and the actual one, then it needs a little work...
Re:No, it's not about redirecting the user... by Anonymous Coward · 2005-03-23 06:20 · Score: 1, Insightful

The same thing can be done with a CNAME record. Give your domain a CNAME to www.google.com. It will eventually obtain a PageRank of 10. But this PageRank is useless for obtaining better search positions, as it will go away once the real PageRank is calculated.

This may not be a problem because the PageRank shown in the toolbar is generally not the real PageRank Google uses to determine its positions.

These techniques may be nothing more than a placebo, and there are probably a few Google employees who get a good laugh out of webmasters using such techniques.
Re:No, it's not about redirecting the user... by budgenator · 2005-03-23 06:50 · Score: 1

An example, please? Your last example was ranked number one on Google, so I still fail to see how this has harmed your site.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Re:No, it's not about redirecting the user... by Anonymous Coward · 2005-03-23 14:41 · Score: 0

Maybe you should just buy an ad instead of trying to exploit pagerank. You shouldn't be able to "own" a pagerank. It's like walking into a classroom, sitting in a chair, and expecting that chair to be free anytime you want it.

You got an email from me! by pastepotpete · 2005-03-23 04:53 · Score: 3, Informative

And I know two other people who sent one. Maybe you should check again? I doubt my mates and I account for 10% of your responses. If you believe that the people affected by this are all "spammers", then perhaps the problem is false positives in your spam detection filters. In fact, you should probably take a look at your spam detection filters anyway. The last time I checked (probably much more recently than you checked for canonicalpage emails), there was a bunch of scraper sites running AdSense where good, relevant results used to be.

Re:You got an email from me! by dfjghsk · 2005-03-23 05:06 · Score: 1

yeah.. they're in damage-control mode now... apparently lying isn't against their 'no-evil' policy.

--
Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?

And how to report this to Google... by ites · 2005-03-23 04:54 · Score: 2, Interesting

Email to webmaster@google.com with the keyword "canonicalpage".

Google are not taking this problem seriously.

I'd suggest that if your website is affected, you send an email as above.

--
Sig for sale or rent. One previous user. Inquire within.

Exactly. by Anonymous Coward · 2005-03-23 04:56 · Score: 0

Too bad the parent was rated "flamebait", he/she (She? Yeah right!) has a good point. Results from Google searches have gone to the dumps because of spam, much of it sanctioned by Google. The only real difference here is that these people are not paying Google to spam you. The Google index sucks right now.

Re:Exactly. by loraksus · 2005-03-23 05:26 · Score: 3, Insightful

I sort of agree; it was really bad about a month or two ago, but it has been getting better for most of the "commonly searched" terms. Some fairly obscure searches still turn up a bit of crap, but you can't do it for everyone.
A "Don't show me any results from this subnet + domain from now on" feature would be nice, as would google banning some of the worst offenders (which it seems to have done).

--
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcfv gbhnjmk,l.;/

Why This is Such a Big Deal (A Summary) by Anonymous Coward · 2005-03-23 04:57 · Score: 5, Informative

This was originally posted the first time a story about this ran, but since a lot of people are still confused, here it is again...

There seems to be a lot of confusion as to why exactly this is such a big deal. A lot of people saying there's no problem or that this is nothing new... basically just not understanding the issue. Let me explain:

Suppose you have a small business under the domain http://xyz.com/, and search engines bring you a lot of traffic because you rank high for keywords in your market. You have a lot of people out there linking to you, a lot of satisfied customers, good content on your site. You're always in the top 10 somewhere when people search for "xyz widgets".

Well, this issue with Google makes it very easy -- incredibly easy -- for someone to knock your site out of the rankings entirely. And I mean for *everything*, to where searching for your own company name in quotes literally buries you hundreds of pages deep in the results. We're talking sites going from getting 1000 unique hits to 10 overnight.

And here's the kicker: It requires absolutely no technical knowledge, no time investment, and is perfectly legal...

All I have to do is have another domain handy that is roughly as popular as yours. And I make a "links" page, like one of those directory services, that lists your website. But instead of being a normal hyperlink, it's a CGI (or PHP or ASP or whatever) script that generates a 302 redirect to your domain... Now, these are very simple, common scripts. One-liners that you can download from cgiscripts.com and stick on your server. The original intent of these scripts is to track which links are being clicked on your site. But now they've found a new use, because when Google gets that 302, all hell breaks loose.
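A minimal sketch of such an exit-link tracker, written here as a Python WSGI app rather than a CGI/PHP one-liner. The `url` query parameter and the in-memory `log` list are illustrative assumptions, not taken from any particular script; the part that matters is the `302 Found` status with a `Location` header pointing at someone else's site.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory click log; a real tracker would write to a file or database.
log = []


def link_tracker(environ, start_response):
    """WSGI app: record an outbound click, then answer with a temporary (302) redirect."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    target = params.get("url", [""])[0]
    if not target:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing url parameter\n"]
    log.append(target)  # the "tracking" part: remember which link was clicked
    # "302 Found" says the content is only temporarily elsewhere. A crawler that
    # takes that literally may file the target's content under *this* URL.
    start_response("302 Found", [("Location", target)])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, link_tracker).serve_forever()` followed by a visit to `/?url=http://xyz.com/` should do. Note that it redirects to any URL it is given, which is exactly why a third party can aim one at your site without asking.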

See, according to the HTTP spec, 302 is a *temporary* redirect, which means Google is supposed to interpret whatever content it finds at the 302 target (your site) as really belonging to the URL of the source (my site). Google is just obeying the spec strictly here, and with devastating results. Why? BECAUSE THE DUPE FILTER NOW KICKS IN! You see, Google has a "dupe filter" that says if the same exact content is found for two unique URLs, then one of the URLs is obliterated in the rankings. Because after all, searchers don't want to be finding the same content over and over. If that happens, they'll start using a different search engine. But Google, sticking strictly to the HTTP spec, doesn't know who the content really belongs to when it gets a 302.

So Google essentially flips a coin. And if it comes up tails, say bye-bye to your domain in the rankings. Your *entire* domain. Because the dupe filter isn't limited to just the page that the 302 is pointing to -- it applies across your entire domain.

These 302 "exit-link-trackers" are all over the web. They've been used by webmasters for years. But it's just recently that Google has started treating 302 this way, so it didn't have any bad effect before. But now it kills you.

The funny thing is, the solution seems pretty simple: Just stop treating 302s this way if they point to a different domain. But for whatever reason Google isn't listening. Hopefully the press that's being generated now will give them the kick in the ass that they need.
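One way to express that rule, as a hypothetical crawler helper (`url_to_index` is a made-up name, and this is a sketch of the suggestion, not Google's code). It compares exact hostnames, which is a simplifying assumption: `www.xyz.com` and `xyz.com` count as different hosts here, and a real implementation would probably compare registered domains instead.

```python
from urllib.parse import urljoin, urlsplit


def url_to_index(source_url: str, status: int, location: str) -> str:
    """Pick the URL a crawler should credit with the content behind a redirect.

    Sketch of the proposed fix, not any search engine's real code:
      * 301 -> credit the target (permanent move).
      * 302 within the same host -> credit the source, as the HTTP spec suggests.
      * 302 to a different host -> credit the target, so a third party's
        redirect script cannot claim someone else's page.
    """
    target = urljoin(source_url, location)  # Location headers may be relative
    if status == 301:
        return target
    if status == 302:
        same_host = urlsplit(source_url).hostname == urlsplit(target).hostname
        return source_url if same_host else target
    raise ValueError(f"unhandled status code: {status}")
```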

Re:Why This is Such a Big Deal (A Summary) by ralphdaugherty · 2005-03-23 06:16 · Score: 1

The funny thing is, the solution seems pretty simple: Just stop treating 302s this way if they point to a different domain. But for whatever reason Google isn't listening. Hopefully the press that's being generated now will give them the kick in the ass that they need.

Great kick in the ass post to get them started.

rd
Re:Why This is Such a Big Deal (A Summary) by mabu · 2005-03-23 10:46 · Score: 1

Excellent post btw.

It seems to me the thing to do is to ignore the 302 version of the site in favor of the original host. This would allow Google to continue to honor the 302 spec and keep stuff from being hijacked. Then they could run some automated scripts to refresh any 302 links, giving the original hosts priority, so that even sites that weren't previously indexed by Google end up legitimately indexed instead of their 302-exploiting copies.

Why Google would give any attention to 302s in the first place is questionable. They don't have to follow that spec; they determine what content they want to index and why, so RFC guidelines are meaningless here.

Re-re-explained by fizbin · 2005-03-23 05:03 · Score: 5, Informative

Okay, so basically this is the problem: when Google encounters a status 302 redirection (as opposed to the status 301 redirection) it then indexes the content as belonging to the initial URL, not the URL at the end result of the 302 redirection. Other things happen later because of google's design.

302 redirections are temporary redirections - the idea is that a 302 is supposed to be used when someone needs to be redirected to a new page, but should still use the original URL if they want to come back later. As an example, the page http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS performs a 302 redirect to http://purl.oclc.org/docs/contributors.html. This means that although your web browser needs to go to some other URL for the content at the moment, it really should remember the first URL as the permanent one.

Contrast this with what happens when your browser visits http://snowplow.org/martin - you get sent a 301 redirect to http://snowplow.org/martin/. (Note the extra slash) In this case, the server is saying "the url with the slash on the end is the real location, and you should not try to come back here without the final slash in the future."

Ideally, if every web browser behaved according to spec., bookmarks (remember bookmarks?) would get automatically updated to the new URL when you selected them and the redirect was a 301 redirect. However, for a 302 redirect, the bookmark would stay as is.

302 redirects can be very useful when you want to set up a hierarchy of "logical" URLs that will permanently point to the correct location. 301 redirects are useful when you're obsoleting an old URL and wish people to go and use the new URL from now on.
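Using the two URLs from this comment, the rule a spec-following client would apply can be sketched as follows (the function name is made up, and the sketch does not check whether those servers still respond this way).

```python
from urllib.parse import urljoin


def url_to_remember(requested_url: str, status: int, location: str) -> str:
    """Return the URL a spec-following client should keep (bookmark or index)."""
    target = urljoin(requested_url, location)  # resolve relative Location headers
    if status == 301:  # Moved Permanently: forget the old URL from now on
        return target
    if status == 302:  # Found (temporary): keep coming back to the old URL
        return requested_url
    return requested_url  # not a redirect: nothing to update
```

For the PURL example, a 302 keeps `http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS`; for the snowplow example, a 301 with `Location: /martin/` replaces the bookmark with `http://snowplow.org/martin/`.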

Okay, so how does this relate to Google? Well, let's suppose that you have a great site on fruitbats. I can set up http://www.example.com/topics/fruitbats to be a 302-style redirect to your site, essentially saying "The information at http://www.example.com/topics/fruitbats is temporarily being hosted by http://www.yoursite.com/". Now, when Google spiders pages it will see that, go retrieve the text from your page, and then index it under http://www.example.com/topics/fruitbats, since after all I just gave a temporary (302) redirect.

But it gets worse, because a final part of google's indexing process is to compare pages for identical text, and throw out all but one of the URLs. Apparently this stage has nothing to go on other than the text and the recorded URLs, and so your URL stands a fifty-fifty chance of being thrown out.

Except that I've not just redirected http://www.example.com/topics/fruitbats to your site, but also http://www.example.com/topics/fruitbat, http://www.example.com/topics/fruit_bat, and http://www.example.com/topics/fruit_bats. Now your lone URL doesn't stand much of a chance of being the one kept by the "throw out duplicates" processor, does it?
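A small simulation of that arithmetic, assuming the duplicate filter keeps one URL per identical text and picks it uniformly at random (an assumption for illustration; the comment only says all but one URL is thrown out). Under that assumption, with n redirecting URLs the original survives with probability 1/(n+1): about one in two with a single redirect, about one in five with the four URLs listed above.

```python
import random
from collections import defaultdict

# The victim's URL plus the four redirecting URLs from the example above.
ORIGINAL = "http://www.yoursite.com/"
REDIRECTS = [
    "http://www.example.com/topics/fruitbats",
    "http://www.example.com/topics/fruitbat",
    "http://www.example.com/topics/fruit_bat",
    "http://www.example.com/topics/fruit_bats",
]


def dedupe(index, rng):
    """Keep one URL per distinct page text, chosen uniformly at random.

    Assumes the filter sees only URLs and text, so it has no reason
    to prefer the original over the copies.
    """
    by_text = defaultdict(list)
    for url, text in index.items():
        by_text[text].append(url)
    return {rng.choice(sorted(urls)) for urls in by_text.values()}


def survival_rate(redirect_urls, trials=10_000, seed=0):
    """Fraction of trials in which the original URL survives deduplication."""
    rng = random.Random(seed)
    page = "Everything about fruitbats..."
    index = {ORIGINAL: page, **{url: page for url in redirect_urls}}
    kept = sum(ORIGINAL in dedupe(index, rng) for _ in range(trials))
    return kept / trials
```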

In a sense, of course, there's little Google can do to prevent this, because even if they weighted 302 redirects lower in their "throw out duplicates" stage, I could always just go snag a copy of your website each time Googlebot visits, in essence doing the redirection myself. (How? Just search the Apache mod_rewrite guide for "Dynamic Mirror".) However, doing it through 302 redirects means that Google pays for the bandwidth to go get your page, not me. (Not that this is necessarily a significant amount of bandwidth, since we're only talking about basic Google here and not images. Depending on the revenue you get by misdirecting Google queries, it might be economical.)

Of course, for this to really work, I'd need a list of websites sorted by category to build up my redirect db. But wait! The ODP feed provides exactly that.

I am a little bit wary of doi

Re:Re-re-explained by Hard_Code · 2005-03-23 05:54 · Score: 1

It seems to me that some important site still has to reference YOUR site to have it indexed in the first place, and furthermore, the reference has to have something to do with the topic you are trying to hijack.

For instance, if you were trying to hijack a fruitbat site, you might need to convince some fruitbat webring, or a zoology encyclopedia, to reference your site, at which point you can start the hijacking. Otherwise your doppelganger site is going to get low pageranks (afferent links will be low value).

So... who cares if I "hijack" CNN.com, unless somebody really important links to me as CNN.

--

It's 10 PM. Do you know if you're un-American?
Re:Re-re-explained by cgreuter · 2005-03-23 06:05 · Score: 1

That was an excellent, well-written explanation of the problem, the best I've seen in this whole fracas.
...but I wonder if it would be possible to construe an unauthorized 302 redirect to a site as some sort of tort or trespass.
Even if that were the case, there are always going to be jurisdictional issues. Besides, it's pretty easy to fix with technology.
All Google (and the other search engines) need to do is have their bots honour some special meta tag (e.g. "redirects-from") which lists the URLs from which you've legitimately redirected. When a spider follows a redirect that isn't listed in the target page's redirects-from tag, it simply doesn't index the page.
Then, all the victim of a hijacking needs to do is put in the meta tag and their problem's solved.
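A sketch of how a spider could check the proposed tag. The tag name `redirects-from` comes from the comment; it is a proposal, not an existing standard, and the space-separated list of URLs in its `content` attribute is an additional assumption made here for illustration.

```python
from html.parser import HTMLParser


class RedirectsFromParser(HTMLParser):
    """Collect URLs listed in <meta name="redirects-from" content="...">."""

    def __init__(self):
        super().__init__()
        self.allowed = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lowercased; values may be None
        if (attrs.get("name") or "").lower() == "redirects-from":
            self.allowed.update((attrs.get("content") or "").split())


def may_index(redirect_source: str, target_html: str) -> bool:
    """Index the target under the redirecting URL only if the target lists it."""
    parser = RedirectsFromParser()
    parser.feed(target_html)
    parser.close()
    return redirect_source in parser.allowed
```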
Re:Re-re-explained by Anonymous Coward · 2005-03-23 06:07 · Score: 0

I think the point is you get your pagerank from being associated with CNN. You simply need a seed to get you into the googlebot database, but once they see the 302 linking to CNN you are suddenly lifted to CNN pagerank status. So the original reference can come from anywhere, but once the index is done you are suddenly "Commander free viagra working under the supervision of CNN and should be treated as so." (note the high rank).

This can hit hard in specific keyword rank battles. Pick your competitors, anonymously hijack and cloak malicious info in order to knock them off the rankings. Protect your legitimate site from the same attack and who is to say what really happened? ...pauses for a minute... ...clicks Anon button... ...goes back to work...
Re:Re-re-explained by Anonymous Coward · 2005-03-23 06:15 · Score: 0

Thick plot. If the 302 has no control over what content is produced, then it can't very well put malicious data on there.

So if you set up a page on "evil-domain.com" that 302s to "cnn.com" and then trash evil-domain with other obviously naughty SEO pages, would that lead to bringing down cnn.com?
Re:Re-re-explained by umdenken · 2005-03-23 07:10 · Score: 1

That was a freaking awesome explanation. Thanks.
Re:Re-re-explained by VoidWraith · 2005-03-23 08:32 · Score: 1

It produces different content for the googlebot than for other visitors. It'll show google the page that looks like CNN and show the user the page that has all the stuff they're selling.
Re:Re-re-explained by yulek · 2005-03-23 08:44 · Score: 1

Then, all the victim of a hijacking needs to do is put in the meta tag and their problem's solved.

oh, is that all? let's have the several billion webpages out there add the extra meta tag. might work for dynamically generated websites, but what about stuff that's static, or stuff where the owners don't have access to the header (like say... a popular blogger site?)

google needs to fix this, not The Web.

--
in this age of communication i'm just not getting through
Re:Re-re-explained by cgreuter · 2005-03-23 11:37 · Score: 1

google needs to fix this, not The Web.
Unfortunately, there's no way for software to distinguish between legitimate redirects and 302 hijackings.
If, for example, I use redirects to distribute traffic between multiple servers on multiple hosts, the GoogleBot's behaviour of treating the redirecting host as the website's canonical host is correct. I want users to use the referring host so that I can change physical hosts with impunity.
But if some scumbag is doing a 302 hijack, it looks like exactly the same thing as far as the bot is concerned. There's no way that a spider can tell whether the referrals are legitimate or not. And the only thing that can be done about it is to let webmasters explicitly tell the bot what they're trying to do. That, at least, gives website operators the means to combat this sort of abuse.
It's not Google that's broken--it's the web. It's just that the two-legged weasels are only now starting to pry open the cracks.
Re:Re-re-explained by yulek · 2005-03-23 13:10 · Score: 2, Insightful

If, for example, I use redirects to distribute traffic between multiple servers on multiple hosts, the GoogleBot's behaviour of treating the redirecting host as the website's canonical host is correct. I want users to use the referring host so that I can change physical hosts with impunity.

well, a bunch of people have suggested that 302s should only be honored by crawlers if the domain is the same. i think that's a pretty good idea.

It's not Google that's broken--it's the web. It's just that the two-legged weasels are only now starting to pry open the cracks.

why do you say that? how is the web broken because of the way google crawls it? the http standard was designed before googlebots were crawling it. long long before. the googlebot needs to be more intelligent, is all.

--
in this age of communication i'm just not getting through
Re:Re-re-explained by anthony_dipierro · 2005-03-23 14:01 · Score: 2, Insightful

In a sense, of course, there's little google can do to prevent this, because even if they weighted 302-redirects lower in their "throw out duplicates" stage, I could always just go snag a copy of your website each time googlebot visits, in essence doing the redirection myself.

However, doing it through 302 redirects means that google pays for the bandwidth to go get your page, not me.

Ah, but doing it through a 302 also means that the target site can't notice you making regular hits to it and block your IP address.

There's also perhaps a legal distinction. Actively copying someone else's site without permission is pretty clearly copyright infringement. Just 302ing to it most likely isn't.
Re:Re-re-explained by WNight · 2005-03-23 18:54 · Score: 1

No, just cloak themselves. Different UAs and different IPs.

Ideally, google could even randomly check some URLs more than once, with different UAs. This would catch some tricks, but more so, it would encourage people not to use browser-detection code. (Some change is one thing, but if the highly ranked content can't be found - tough.)

That solves the webmasters serving something different to Google problem.
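A rough sketch of that check. `fetch` stands in for whatever HTTP client the crawler uses and is passed in as a function (a hypothetical interface), which also makes the check easy to test. This version only varies the User-Agent, not the source IP, and compares exact hashes, so pages with legitimately changing content (timestamps, rotating ads) would need some normalization first.

```python
import hashlib
from typing import Callable, Iterable

# A fetcher takes (url, user_agent) and returns the response body as text.
Fetcher = Callable[[str, str], str]


def looks_cloaked(url: str, fetch: Fetcher, user_agents: Iterable[str]) -> bool:
    """Fetch one URL under several User-Agent strings and flag differing bodies."""
    digests = {
        hashlib.sha256(fetch(url, ua).encode("utf-8")).hexdigest()
        for ua in user_agents
    }
    return len(digests) > 1
```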

After that, they just need to apply all rankings to the page with the content. Treat all redirects like permanent ones. It's okay that you point somewhere, but because it's hard to prove ownership, you can't get credit for what's there.

Otherwise, we'd need a new response code - 25x somewhere - "OK, but temporarily redirected from ..." which means, don't bookmark/index me, bookmark the source. As this requires write access to the page or the server, it couldn't be remotely abused.
Re:Re-re-explained by cgreuter · 2005-03-24 05:36 · Score: 1

well, a bunch of people have suggested that 302s should only be honored by crawlers if the domain is the same.
The problem with that is that there are legitimate cases where 302s need to go across domains. For example, I own a domain name and use a commercial DNS provider to forward requests to my current ISP's webspace. If I change ISPs, that will change, so I want Google to treat my domain name as the canonical name of my website.
Alternately, someone using temporary hosting to handle a sudden surge of bandwidth might use 302s in the short term but would want Google to continue to associate the pages with the forwarding domain.
Of course, if you amend the suggestion to "honour 302s if the domain is the same or if the destination site has metatags naming the originator as part of the site", you end up with my original suggestion.
why do you say that? how is the web broken because of the way google crawls it?
It's not about Google. (Or, it is but only because Google is the big name in searching. When someone else replaces them as the search leader, it'll become about them instead.)
Google's search algorithm isn't so much an invention as a discovery. Long before Google existed, the world-wide web contained within it a way to reliably index most of its contents--the network of links. Google's founders simply discovered it and figured out how to mine that data. It's the Internet equivalent of a natural resource.
(Which is not to say that Google isn't brilliant or a huge achievement.)
And as usual, as soon as someone with a financial interest comes along, they start despoiling it. Email existed and was perfectly usable for over fifteen years before the spammers got into it and only then did people realize that maybe, there should have been a better authentication system built into it. Ditto for USENET and, more recently, blog comments.
Well, now it's the web's turn. The days where search engines "just worked" are over. We, who operate websites, have to start pitching in to protect the structure. It's already started with the "nofollow" attribute and it'll have to continue or the web will become unsearchable.

Re:Why? (now iTMS) by notthepainter · 2005-03-23 05:03 · Score: 1

I'm so thinking of the recent iTMS DRM hack...

it cant stop by grungefade · 2005-03-23 05:09 · Score: 1, Interesting

This has been possible in PHP forever, and it's a lot better hidden than a 302 redirect. You go to one page and, depending on where you came from, it shows different content, but the URL stays exactly the same. Here, go fool Google:

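<?php
// Show different content depending on whether the visitor arrived from Google.
// HTTP_REFERER is not always sent, so fall back to an empty string.
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$findme = 'google';
$from_google = strpos($referrer, $findme);
if ($from_google === FALSE) {
    echo "Original Content";
} else {
    // Content that people arriving from Google see
    $content = file_get_contents("http://www.yahoo.com");
    echo $content;
    die();
}
?>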

Googlebot won't see the yahoo.com content because it doesn't arrive with a Google referrer. Or you could do the same thing to Googlebot itself: get Googlebot's IP and show it different information than what's there.

Re:it cant stop by Anonymous Coward · 2005-03-23 17:34 · Score: 0

$referrer = $_SERVER['HTTP_REFERER']; $findme = 'google'; $from_google = strpos($referrer, $findme); if ($from_google === FALSE){ echo "Original Content"; }

Yikes! Unless you're getting paid per KLOC, try this instead:
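if (strpos($_SERVER['HTTP_REFERER'], 'google') === false) { echo "Original Content"; }
It does the same thing. (Keep the === false comparison: strpos returns 0 when the match is at the very start, and a plain ! would treat that as "not found".)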

This isn't a nitpick about cutting down your lines of code, it's about efficiency. The original code is wasting a lot of clock cycles to create those three variables and assign values to them.

Question - Offtopic by Anonymous Coward · 2005-03-23 05:10 · Score: 0

Hey, when are you guys going to make it possible to buy music, movies, and TV shows from Google? I heard rumors to this effect, that you'll have a "buy this song" feature. Are they true? You know, you search for a song or TV show and then it's made available via a third party or Google or something.

That would be cool. I wouldn't mind an interface to sell my songs via Google.

He's answered this before by Anonymous Coward · 2005-03-23 05:13 · Score: 1, Interesting

He stated that he is an engineer employed by Google. He surfs Slashdot and to some extent speaks for Google, although his membership is paid for out of his own pocket. Someone else, not affiliated with Google, had the user ID before but agreed to transfer it to this guy.

At least, that's what he has said before.

Why Is This Not Fixed? by fire-eyes · 2005-03-23 05:21 · Score: 1

I guess what I haven't seen asked yet is:

Why is this not fixed yet?

C'mon Google.

--
-- Note: If you don't agree with me, don't bother replying. I won't read it.

Re:Why Is This Not Fixed? by lahvak · 2005-03-23 06:42 · Score: 1

And how exactly do you propose to fix it?

--
AccountKiller

Perhaps you can give an example? by Anonymous Coward · 2005-03-23 05:22 · Score: 0

Your story would be a lot more convincing if you listed a real site that had been "hijacked" by this technique. Or are you just a spammer?

How convenient by Anonymous Coward · 2005-03-23 05:23 · Score: 0

So you're a Google employee who debunks "myths" about Google, but when you say something you don't officially represent Google's opinion and stance on issues, right? How convenient.

Doesn't affect Yahoo by X · 2005-03-23 05:30 · Score: 4, Interesting

I'm surprised nobody has mentioned that Yahoo has already closed the 302 hole.

--
sigs are a waste of space

Re:Doesn't affect Yahoo by Anonymous Coward · 2005-03-23 06:20 · Score: 1, Funny

Funny, since I just read it in a post way 'above' yours...
Re:Doesn't affect Yahoo by parcifal · 2005-03-23 06:51 · Score: 1

That is what I was wondering. Aren't the other search engines also affected? Maybe Google should take a page out of the other engines' book and fix this!
Re:Doesn't affect Yahoo by X · 2005-03-23 07:02 · Score: 1

Fair enough then. I'll restate: I'm surprised that no comment about Yahoo being immune was modded up high enough for me to see it. ;-)

--
sigs are a waste of space
Re:Doesn't affect Yahoo by Lord+Omlette · 2005-03-24 07:12 · Score: 1

Did Yahoo fix it? Or did they say why they're not affected?

--
[o]_O

OK, I'll bite ... by isometrick · 2005-03-23 05:33 · Score: 3, Insightful

Look, there *was* circumstantial evidence for the "Greg Duffy" thing ... i.e. just enough to make it a discussion. I agree that fearmongering is not the way to go. I appreciate that you looked into the issue (and my first instinct is to trust your explanation, that it was a DNS issue).

However, if this is Google's PR method, I think you are kind of asking for it! In the absence of information, the internet community will speculate until the cows come home. I'm not saying it's right, I'm just saying that's reality. Even though I said on my site that I thought Google didn't do anything underhanded, I bet a lot of people were still not convinced. Google can do a little better than this, and although you have been fairly nice to me (thanks), this response is a little flamebaity for PR. Please understand that I mean no offense; it's just constructive criticism. Even if everything you say is true, a representative of the company should always at least attempt to sugar-coat something like your last paragraph.

Also, on a more personal note, maybe Google should embrace the people that are involved in researching these problems instead of using this broken communications policy. I know that in my case I contacted you guys 5 *months* ago about the Google Print problem I described and never got any followup except for my t-shirt (which I really like). I have some great ideas about possible solutions to the problem I described, and as far as I can see Google has not fixed the root of the problem. When are you guys going to contact me?

-Greg Duffy

OK, an example by ites · 2005-03-23 05:34 · Score: 2, Informative

My company's web site is imatix.com

You will notice that the site's main page contains very little text. There is one marketroid phrase, "Strategic solutions for a complex world".

Now search Google for this phrase.

Look at the results. A completely irrelevant site has come in at first place. imatix.com is now at second place (this changed today).

imatix.com is an old site, with very high page rank. Now, it does not matter much for us, since no-one is going to search for this phrase, but if this can hit imatix.com, it can hit other sites.

The problem is entirely real, and it is extremely serious. I'd say, if Google don't fix this before it hits the main media, they will suffer irreparable damage to their reputation.

--
Sig for sale or rent. One previous user. Inquire within.

Re:OK, an example by Elwood+P+Dowd · 2005-03-23 05:58 · Score: 1

Yes
Yes
Yes
Yes
I'd say, if Google don't fix this before it hits the main media, they will suffer irreparable damage to their reputation.
Exaggeration.

Also, looking at the results for allinurl:imatix.com and searching google for that phrase, I don't see what the big deal is. Yes, spammers appear, but your page is consistently ranked #1. Maybe GoogleBoy saw your comment and had the engineers immediately fix your problem on every cluster, but somehow I doubt it. Same thing for that other page you said got 302ed. tsomething.it.

--

There are no trails. There are no trees out here.
Re:OK, an example by That's+Unpossible! · 2005-03-23 06:02 · Score: 2, Insightful

The problem you are describing here is not a 302 hijacking. Those sites don't do any redirecting, and they aren't duplicating your site's page and causing you to be bumped out of the loop. They just happen to have a link to your site and your "motto" on their page. The fact that their page comes up before yours does seem stupid, but it is unrelated to the 302 hijacking issue.

--
Ironically, the word ironically is often used incorrectly.
Re:OK, an example by tomhudson · 2005-03-23 06:29 · Score: 1

It's not like the words used aren't mostly candidates for "buzzword bingo", so you're actually lucky that google still puts it on their first page, never mind as #1.
Not to be too pissy, but I don't see what your beef with google is, as long as you get on the first page of results.
Re:OK, an example by GoogleGuy · 2005-03-23 07:09 · Score: 4, Informative

Thanks for the concrete example. As someone else pointed out:
- for the search imatix I see you at number one.
- for the search "Strategic solutions for a complex world" I see you at number one.
- for the search allinurl:imatix.com, that operator (and its sister operator inurl:) only looks for the words in the URL. So it's perfectly fine to show results like "real-imatix.com/" because they contain the word imatix. These results are not hijacking results; this is expected behavior for inurl and allinurl.
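Google doesn't publish how it splits URLs into words, so the following is only a toy approximation, assuming every run of non-alphanumeric characters acts as a separator. Under that assumption it shows why real-imatix.com/ satisfies allinurl:imatix.com without any hijacking involved.

```python
import re


def words(text: str) -> set[str]:
    """Toy tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def allinurl_match(query: str, url: str) -> bool:
    """True if every query word appears somewhere among the URL's words."""
    return words(query) <= words(url)
```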

Hope this helps,
GoogleGuy
Re:OK, an example by GoogleGuy · 2005-03-23 07:31 · Score: 3, Informative

Just to follow-up, I saw your email come through the queue from user support. The engineer who checked it out basically said "They appear at the top of the results when I do a search. Still, just because their website only has that one phrase on it doesn't guarantee that their site will appear at the top of the results." So this isn't a "302 hijacking," but I hope our user support will reply in addition to my post. :)
Re:OK, an example by igny · 2005-03-23 09:29 · Score: 1

But for the search "Strategic solutions for a complex world" *I* see some web hosting site as #1. Could you explain the discrepancy? Is it possible that the Google results differ from time to time or from location to location?

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Re:OK, an example by Anonymous Coward · 2005-03-23 09:39 · Score: 0

http://www.imatix.com/ is what I see as number one.
Re:OK, an example by GoogleGuy · 2005-03-23 10:10 · Score: 1

Different folks often hit different data centers because of load balancing and stuff like that. I'll certainly keep an eye on this search myself too though.
Re:OK, an example by Anonymous Coward · 2005-03-23 10:37 · Score: 0

Here's what I see: Free Ftp Web Server - Live Web Cams People Backgrounds Powerpoint ...
... Strategic Solutions for a Complex World. FTPplanet.com - A community site for
users of FTP. Visit FTPplanet.com for everything FTP related. ...
www.hostelshop.com/description-8-39-Free_Ftp_Web_Server.html - 28k - 21 Mar 2005
Re:OK, an example by glesga_kiss · 2005-03-23 12:10 · Score: 1

I just pulled the following from .co.uk using the phrase in quotes:
Free Ftp Web Server - Live Web Cams People Backgrounds Powerpoint ... ... Strategic Solutions for a Complex World. FTPplanet.com - A community site for users of FTP. Visit FTPplanet.com for everything FTP related. ... www.hostelshop.com/description-8-39-Free_Ftp_Web_Server.html - 28k - Cached - Similar pages

Second link is the iMatix one. Same on .com. First link goes here.
Re:OK, an example by SacredNaCl · 2005-03-23 13:56 · Score: 1

Thanks for the concrete example. As someone else pointed out:
- for the search imatix I see you at number one.

Look again, his site isn't the one with the nice Favicon, his site is the 2nd result, the one without it that actually has "Strategic Solutions For A Complex World!" on the front page. Even you, GoogleGuy, were fooled by the first result. So if you were fooled having intimate knowledge of the problem, how would someone without intimate knowledge of the problem fare?

--
Freedom is merely privilege extended unless enjoyed by one and all.
Re:OK, an example by RedWizzard · 2005-03-23 14:35 · Score: 1

Look again, his site isn't the one with the nice Favicon, his site is the 2nd result, the one without it that actually has "Strategic Solutions For A Complex World!" on the front page. Even you, GoogleGuy, were fooled by the first result.
Following is what I'm seeing at the top of the results. Are you seeing the same?
iMatix Corporation Strategic Solutions for a Complex World. www.imatix.com/ - 5k - Cached - Similar pages Xitami Portable free web server, distributed with source code according to a liberal license agreement. Online documentation on how to install, use, and configure. www.imatix.com/html/xitami/ - Similar pages [ More results from www.imatix.com ] Welcome To Xitami.com ... Xitami.com is run by iMatix Corporation, the company that brought you Xitami. ... the iMatix GSL scripting language, built-in to the web server engine. ... www.xitami.com/ - 6k - 21 Mar 2005 - Cached - Similar pages
When I do the search for "Strategic solutions for a complex world" his site comes up first. And the next 3 or 4 all link to his site with that phrase. I don't see the problem.
Re:OK, an example by emurphy42 · 2005-03-23 17:36 · Score: 1

I tried it just now, also using .co.uk, and imatix.com was first (hostelshop.com was second). Same on .com.

Simple Answer by rabtech · 2005-03-23 05:38 · Score: 4, Insightful

There is a simple solution for Google: only honor 302 redirects when the original and target domains match (or the target is a subdomain of the original domain).

In all other cases treat a 302 (temporary) as a 301 (permanent) redirect, thus giving credit for the content to the actual hoster of the content.

This allows webmasters to continue using 302s to set up logical URLs that mask the organization of underlying content, but eliminates the ability to hijack completely.
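A minimal sketch of the rule as stated, assuming the crawler hands over the source URL, the target URL and the status code. The function name `credited_url` is hypothetical, and the subdomain test is a plain hostname suffix comparison with no public-suffix handling, so `www.example.com` redirecting to `example.com` would count as cross-domain here.

```python
from urllib.parse import urlsplit


def _same_site(source_host: str, target_host: str) -> bool:
    """True if the target is the source host or a subdomain of it (naive suffix check)."""
    return target_host == source_host or target_host.endswith("." + source_host)


def credited_url(source: str, target: str, status: int) -> str:
    """Return the URL that gets credit for the content found at `target`.

    - 301: always credit the target (permanent move).
    - 302 within the same domain (or into a subdomain): honor it as temporary,
      so the source URL keeps the credit.
    - 302 across domains: treat it like a 301 and credit the target.
    """
    if status == 301:
        return target
    if status == 302:
        src = urlsplit(source).hostname or ""  # hostname is already lowercased
        dst = urlsplit(target).hostname or ""
        return source if _same_site(src, dst) else target
    raise ValueError(f"not a redirect status handled here: {status}")
```

For example, `credited_url("http://scammer.example/r?id=1", "http://victim.example/", 302)` returns the victim's URL, so the redirect script gets no credit for the victim's content.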

--
Natural != (nontoxic || beneficial)

Re:Simple Answer by RedWizzard · 2005-03-23 14:19 · Score: 1

But then all the hijackers need to do is to grab the target page dynamically and feed it to GoogleBot directly when it comes calling (essentially a virtual redirect). The effect will be the same.

Can anybody provide a working example? by turnstyle · 2005-03-23 05:44 · Score: 2, Interesting

Is there a specific search that someone can suggest that would demonstrate this problem?

--
Here's what I do: Bitty Browser & Andromeda

google is number one, so yea.. by joeldg · 2005-03-23 05:49 · Score: 1

everyone knows google is #1
being at the top makes you a target and every little gnat is going to chew at you trying to get a piece.

remember altavista and others..
they ended up so spammed you had to go through pages of results to find anything any good.

I just think it has taken a while, but they are catching up with google now.

--
anime+manga together at last.. in real time.

Hrrm, has this page been stolen? by Lars+T. · 2005-03-23 06:04 · Score: 1

Taco posts the same URL.

--

Lars T.

To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

this is a big deal... by Anonymous Coward · 2005-03-23 06:07 · Score: 0

I just did a search for allinurl:mysite and found about.com was stealing some content. Google was indeed pointing readers to about.com.

We're talking more than 30 pages here.

Teoma is also vulnerable to this thing by nbkolchin · 2005-03-23 06:10 · Score: 1

Try http://s.teoma.com/search?q=See%2C+according+to+the+HTTP+spec%2C+302+is+a+*temporary*+redirect%2C+which+means+Google+is+supposed+to+interpret+whatever+content+it+finds+at+the+302+target+%28your+site%29+as+really+belonging+to+the+URL+of+the+source+%28my+site%29.+Google+is+just+obeying+the+spec+strictly+here%2C+and+with+devestating+results.+Why%3F+BECAUSE+THE+DUPE+FILTER+NOW+KICKS+IN%21+You+see%2C+Google+has+a+%22dupe+filter%22+that+says+if+the+same+exact+content+is+found+for+two+unique+URLs%2C+then+one+of+the+URLs+is+obliterated+in+the+rankings.+Because+after+all%2C+searchers+don%27t+want+to+be+finding+the+same+content+over+and+over.+If+that+happens%2C+they%27ll+start+using+a+different+search+engine.+But+Google%2C+sticking+strictly+to+the+HTTP+spec%2C+doesn%27t+know+who+the+content+really+belongs+to+when+it+gets+a+302.%0D%0A&qcat=1&qsrc=0&Search.x=0&Search.y=0

You're whacked by budgenator · 2005-03-23 06:18 · Score: 1

You, sir, are an obvious troll. The first link points to slashdot.org/imatix.com, which of course returns a Slashdot 404 error page, and the google.com search link returns the imatix.com website ranked number one, with a bunch of placeholder sites below it. So how does this demonstrate any harm to imatix.com's page ranking?
Come on, mods, at least read the post and check the links before you mod something up as Informative.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds

jurisdiction by r00t · 2005-03-23 06:23 · Score: 1

When has jurisdiction ever stopped the USA?

We just grabbed a guy out of Australia who'd never set foot on US soil, unless you count Australia as US soil.

Re:jurisdiction by strider44 · 2005-03-23 13:29 · Score: 1

we do...
Re:jurisdiction by gl4ss · 2005-03-23 14:44 · Score: 1

well, for one you need to know the guy's name before you can start asking for that - and after that the nation would have to want to be friendly with the usa.

--
world was created 5 seconds before this post as it is.

Definition: "Almost" is a negative by billstewart · 2005-03-23 06:24 · Score: 1

"Almost Nothing" means "Not Nothing", aka "Ok, yeah, a couple of things". In this case, there's at least one technique that works, and there may be others that nobody's discovered or ranted about yet. But this one's ugly.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

clsc.net seems to be down... by luap2000 · 2005-03-23 06:25 · Score: 4, Interesting

here's my write-up on the problem from early February called Google and the Mysterious Case of the 1969 Pagejackers. the problem has been around for a long, long time.

personally, i'm ready to give up google maps or something else (autolink?) if they would 'fix' this or at least be more transparent about what's going on. ;)

btw, the word on the net is that the googleguy posting here isn't the real one. anybody have details on this?

-kpaul

--
J-Log: Journalism News, Media Views

Re:clsc.net seems to be down... by GoogleGuy · 2005-03-23 09:05 · Score: 2, Informative

It's me. I've had the GoogleGuy handle since Jan 19th, 2005. From the K5 article, the allinurl: stuff isn't true though; allinurl: just looks for terms in the URL. So [allinurl:imatix.com] can show results from any site that has imatix in the URL.
Re:clsc.net seems to be down... by luap2000 · 2005-03-23 09:32 · Score: 1

not sure what part you're referring to exactly. if you do an allinurl:yourdomain.com and someotherdomain.com/whatever?whatever.com comes up with your exact title and description, isn't that what page jacking is?

another place i mentioned allinurl is here, but i think that's right too.

or are you talking about one of the pages i link to from the story?

lemme know (email, whatever...) what you think is wrong and i'll see if i can't get it corrected.

thanks again,
kpaul

--
J-Log: Journalism News, Media Views
Re:clsc.net seems to be down... by GoogleGuy · 2005-03-23 10:18 · Score: 1

allinurl:foo.com says "show me all the results you know of that have foo.com in the url." And since com is a stopword, I wouldn't be surprised if this really just said "show me all the results with foo in the url"--that is, without the .com. You could force the .com to also match by using allinurl:foo-com to make it a phrase match, I believe.

So bar.com/dir1/dir2/foo.com would be a valid result for that search, for example. But that doesn't mean that we've confounded bar.com with foo.com. bar.com may do a 302 to foo.com or it may not, but it's not a hijacking. We're just showing all the results we know of with "foo.com" in the url; the fact that some of those results are not on foo.com isn't really a problem. Now if you did site:foo.com and saw results from bar.com, that's something that I would email to us.
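To make the distinction concrete, here is a toy model, not Google's implementation: `allinurl:` is approximated as "every query term appears among the URL's tokens, with stopwords such as com dropped", and `site:` as "the URL's host is the domain or a subdomain of it". The stopword list and tokenizer are assumptions for illustration, and phrase matching (the foo-com trick) is not modeled.

```python
import re
from urllib.parse import urlsplit

# Assumed stopword list, for illustration only.
STOPWORDS = {"http", "www", "com"}


def terms(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS}


def matches_allinurl(url, query):
    """Toy allinurl: every query term must appear somewhere in the URL."""
    return terms(query) <= terms(url)


def matches_site(url, domain):
    """Toy site: the URL's host must be the domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

Under this model, `http://bar.com/dir1/dir2/foo.com` and `http://real-imatix.com/` both satisfy the allinurl-style test for their respective queries, while neither passes the site-style test, which is the difference described above.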
Re:clsc.net seems to be down... by clsc · 2005-03-23 10:36 · Score: 1

Was about to write this, but i'll gladly repeat it instead:
Now if you did site:foo.com and saw results from bar.com, that's something that I would email to us.
(emphasis mine)
Re:clsc.net seems to be down... by luap2000 · 2005-03-23 10:58 · Score: 1

thanks again.

it was just odd to see another URL with my title and description (and an old cache). allinurl: was the only way i saw it, though.

what explains another URL having my title and description and cache?

this, i thought, was what being pagejacked meant - a different URL with your exact title, description and a cache of your site.

i think what mostly weirds me out is the other URL (not my domain) having a cache of my page.

i'll email again, though.

-kpaul

--
J-Log: Journalism News, Media Views
Re:clsc.net seems to be down... by clsc · 2005-03-23 11:10 · Score: 1

>> what explains another URL having my title and description and cache?

It could be what you think it is, and it could be something else. Try the "site:example.com" search instead of "allinurl:", and if you find the other domain again, run that URL through a server header checker. If it returns a 200, disable JavaScript and see if it has a meta redirect on the page.
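The same check can be sketched in Python. The network fetch is left out so the logic runs on any saved response: the function takes a status code, the Location header (if any) and the HTML body, and reports whether the URL redirects by HTTP status or by a `<meta http-equiv="refresh">` tag. The names are hypothetical, and JavaScript redirects are not detected at all, which is why looking at the page by hand with JavaScript disabled is still worthwhile.

```python
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta" or self.refresh is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "refresh":
            self.refresh = values.get("content", "")


def classify_response(status, location, body):
    """Describe how a suspicious URL sends visitors elsewhere, if it does."""
    if status in (301, 302, 303, 307):
        return f"HTTP {status} redirect to {location}"
    if status == 200:
        finder = _MetaRefreshFinder()
        finder.feed(body)
        finder.close()
        if finder.refresh is not None:
            return f"200 with meta refresh: {finder.refresh}"
        return "200, no meta refresh found"
    return f"status {status}"
```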

Solution. by northcat · 2005-03-23 06:30 · Score: 1

I was about to say that the solution was to not replace pages in the index when the redirection happens across domains. But then public hosting sites like angelfire.com came to my mind. So they just have to stop replacing redirects altogether.

Re:Solution. by northcat · 2005-03-23 08:47 · Score: 1

On second thought, 302 requires configuration of the server, so Angelfire.com users won't be affected. There might be other sites where individual users do get to do stuff like this, but if an individual can configure the server, then the users are probably close enough to resolve things like this. But some site *might* provide only 302 redirection to all users (not complete server config). Still, they won't be able to selectively give different pages to googlebot. Man, this is a difficult problem :/ Removing this 'feature' from Google might be the safest option.

confederate Googlebots? by Anonymous Coward · 2005-03-23 06:45 · Score: 0

I'm in a hurry, so can't write much, but:

Couldn't Google do something about this by sending out Googlebots that aren't identified as such, or by requesting the site in another way, and comparing the page served to Googlebot with the page served to the alternate client?
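The comparison suggested here could look like the sketch below. Fetching is abstracted behind a caller-supplied function so the check can run offline; the user-agent strings are placeholders, and exact-hash comparison is a simplification (pages with timestamps or rotating ads would need fuzzier matching). As RuneB points out, a redirect script keyed on Google's IP ranges instead of the User-Agent header would defeat this unless the second fetch also came from an unrecognized address.

```python
import hashlib
from typing import Callable

# Placeholder user-agent strings, for illustration only.
BOT_UA = "Googlebot"
BROWSER_UA = "Mozilla/5.0"


def looks_cloaked(url: str, fetch: Callable[[str, str], bytes]) -> bool:
    """True if the body served to the bot UA differs from the one served to a browser UA.

    `fetch(url, user_agent)` must return the response body as bytes.
    """
    as_bot = hashlib.sha256(fetch(url, BOT_UA)).hexdigest()
    as_browser = hashlib.sha256(fetch(url, BROWSER_UA)).hexdigest()
    return as_bot != as_browser
```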

Re:confederate Googlebots? by RuneB · 2005-03-23 06:50 · Score: 1

The evil attacker would just then load up all the IP address blocks owned by Google into the redirect script and use that instead of the User-Agent.

--
dtach - A tiny program that emulates the detach feat

BIG BIG problem by Anonymous Coward · 2005-03-23 06:55 · Score: 0

I call it Search Spam. Just as it seems email spam is getting under control stuff like this is getting out of control. I wish there was actual legal punishment for people who do this. Right now it's like a veiled threat. My web searches have layer upon layer of bad sites that have nothing to do with what I concisely searched for. Someone do something!

google != marketing by Anonymous Coward · 2005-03-23 07:13 · Score: 0

I'm sure you'll tell me I'm borked to no end, but ...

Let me remind you that it is in no way google's responsibility to drive your web traffic. Google is not your marketing department. That being said - google can do whatever the hell it wants.

It is completely your own problem if the only way you get contact is through people clicking a google link.

How is it reasonable if it's temporary? by bonch · 2005-03-23 07:16 · Score: 1

If a 302 is intended as a temporary redirect, shouldn't Googlebot leave the original index entry in place? Why would anyone ever want what is supposed to be a temporary redirect to be permanently indexed? Google should never have been indexing "temporary" redirects in the first place.

Re:How is it reasonable if it's temporary? by arkanes · 2005-03-23 07:45 · Score: 1

Google is doing exactly what a 302 means - it's "assigning" the content at the URL in the Location header to the original URL. 302s are heavily misused on the web today as a means of "moving" people to a different site; they're intended as a way of temporarily redirecting content instead. For example, if your site was slashdotted you might serve a 302 pointing to a more robust mirror. Sending people off to another site, the common use of redirects today, is what links are for. Google's treatment of the 302 may be naive and impractical as a matter of reality, but it's the correct behavior according to the HTTP standard.
Re:How is it reasonable if it's temporary? by Anonymous Coward · 2005-03-23 07:50 · Score: 0

That is exactly what is happening - the "temporary" redirect (i.e. the victim URL) is indexed under what is seen as the "permanent" location (i.e. the scammer's url).

The scammer's url - by returning a 302 - is saying that temporarily the content for this site is at victim's url. Googlebot then (correctly?) sees victim's url as a temporary url, and indexes it under what is seen as the permanent url, i.e. the url returning the 302.

The google logic makes sense - although i'd probably restrict it to one domain to prevent hijacking.
Re:How is it reasonable if it's temporary? by bonch · 2005-03-23 07:51 · Score: 1

It just seems to me that one would disregard indexing what is defined as a "temporary" redirect. As you state, 302s should be used for things like mirrors. I would want Google to always index the original, not a temporary mirror.
Re:How is it reasonable if it's temporary? by HardJeans · 2005-03-23 16:43 · Score: 0

And that is exactly what is happening. Google is indexing what it thinks is the original (scammer), and discarding the temporary (victim).

Google visits scammer.com

scammer.com says "hey, I'm really the original URL, but something's wrong, and I want you to use my mirror"

Google goes to victim.com downloads cache of victim.com, and says "OK, victim.com is the TEMPORARY URL, so I'll just cache it under scammer.com, and delete victim.com from my index"

now I search for a term and discover the info from victim.com. I click the link, and it goes to scammer.com. Now scammer.com knows I'm not Google, so it doesn't throw my browser a 302 to victim.com.
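The walkthrough can be simulated with a toy indexer. Everything here is an assumption for illustration: the "web" is a dict of canned responses, a 302's target content is filed under the source URL (the spec-literal reading of 302 described above), redirect chains are ignored, and the duplicate filter keeps whichever URL has the higher made-up score.

```python
# Toy web: URL -> (status, location for a 302, or page body for a 200).
WEB = {
    "http://victim.example/": (200, "Victim's original article"),
    "http://scammer.example/go?id=1": (302, "http://victim.example/"),
}


def crawl(url, web):
    """Return (url_to_file_under, content), reading a 302 spec-literally."""
    status, payload = web[url]
    if status == 302:
        # Temporary redirect: the content at the target is filed under the SOURCE url.
        # (No redirect chains in this toy: the target is assumed to return a 200.)
        _, content = web[payload]
        return url, content
    return url, payload


def build_index(urls, web, score):
    """Crawl urls, then keep only one URL per identical piece of content."""
    by_content = {}
    for u in urls:
        filed_url, content = crawl(u, web)
        by_content.setdefault(content, []).append(filed_url)
    # Duplicate filter: the URL with the highest (hypothetical) score wins.
    return {content: max(cands, key=score) for content, cands in by_content.items()}
```

With `score = {"http://victim.example/": 1, "http://scammer.example/go?id=1": 5}.get`, `build_index(list(WEB), WEB, score)` keeps only the scammer's URL for the victim's article, which is the situation where a searcher's click lands on the scammer's script instead of victim.com.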

--
"I'm not talking to myself, I'm just the only one who's listening." - Jimmies Chicken Shack

I don't get it... by jafiwam · 2005-03-23 07:19 · Score: 2, Insightful

Why all the yammering and discussion on this?

It's pretty simple; 302 redirects allow bad guys to exploit Google.

It doesn't matter that it's the wrong way to use a 302 redirect. They are the BAD GUYS. Remember the "spammers lie" truism?

It's the Google rule that is broken. 302 should be treated as "can't find site" in their search rankings rather than assuming the data sent by the web server is honest. It sucks that some legit users of 302 won't get ranked as well because of it, but boo hoo. Let anybody that has hardware or software problems get better equipment in the first place if their freaking world ends when they don't get ranked in their keyword group. I have NO SYMPATHY for someone that shoestrings their vital revenue stream infrastructure and then wonders why things go bad. It reminds me of my job too much.

Buy Google ADs if you need to make money off your site traffic.

Google will change the rule or they won't. If they want to stay relevant, they'd better. I find myself getting irritated with Google's crappy search results a lot nowadays; sooner or later I will find one of the little startups to use, and they can kiss off if it keeps up. So I figure they will get to it. They are Google, they are good at what they do.

Now what I think they should do is download snippets of pages via the Google toolbar, which then sends the data to Google, to make a massively distributed bot-net spider that is indistinguishable from the web-using masses. At that point, as far as exploiting Google via the IP or user agent of the bot goes, IT IS ALL OVER.

Move along, nothing to see here but a bunch of people that don't understand redirect and HTTP protocols.

HTTP 301 filter by accidentalGeek · 2005-03-23 07:35 · Score: 1

I have a solution in mind that may or may not work depending on robot/indexer behavior.

What would happen if I installed a filter on my Web server (in pseudocode):
if (!(http.referrer matches my.domain))
{
    send(301, target=full_url)
}

In English, if the client was referred by an external link, immediately issue a 301 redirect to the full URL of the current page. This should inform the robot that it is now looking at a new site and should start indexing content under a URL that I'm positive that I control.

This solution will collapse if the robot refuses to follow the 301 or does something that I don't expect.

I'd be interested to see a response from someone who understands spiders (since I don't).

Re:HTTP 301 filter by accidentalGeek · 2005-03-23 08:29 · Score: 2, Informative

Ach! This leads to an endless loop. Please note my revised (and more complicated) version.

Google Patent on Duplicates & Hijacking Soluti by Anonymous Coward · 2005-03-23 07:39 · Score: 0

But it gets worse, because a final part of google's indexing process is to compare pages for identical text, and throw out all but one of the URLs. Apparently this stage has nothing to go on other than the text and the recorded URLs, and so your URL stands a fifty-fifty chance of being thrown out.

Not exactly. If you want to learn how to get rid of a 302 hijack, read Google's patents on removing duplicate content:

In a nutshell, Google says that when duplicate content is detected, the site with the highest PageRank wins. Also note that Google now implements LocalRank in addition to their PageRank equation:

The reason a spammer can overtake a site with a 302 is that the spammer is better at building PageRank, LocalRank and keyword relevancy via links to that virtual URL than the legitimate site owner (even if the site has been around for years). A search engine bot cannot determine the age of a link (yet), nor can it determine the legitimacy of a page beyond link popularity and the "relevancy" and "authority" of those links.

The main problem here is that many sites use 302 redirects as a tracking mechanism (which Google tries to parse and follow as links). This makes it undesirable to "penalize" all 302 redirects. These are often good links. Even Froogle uses 302 redirects with its review content, which allowed it to hijack competitor content in MSN Search until a spam report was filed. (Thanks MSN for fixing it so quickly!)

Some solutions needed here:

A tracking parameter that the search engines agree upon, to eliminate the need to track with a 302. (So I can link to http://www.url.com/??sitename and do my tracking, while the search engine can parse away the ??sitename and index the legitimate URL.) This needs to be an agreed-upon parameter, much like rel="nofollow". Properly advertised and integrated into shopping carts and affiliate programs, this should help reduce the number of 302s netwide m
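A sketch of how a search engine might honor such a convention, assuming the `??` marker from the example above: everything from the first `??` onward is treated as tracking data and dropped before indexing. The `??` marker is the poster's proposal, not an existing standard, and the function name is illustrative.

```python
def strip_tracking(url: str, marker: str = "??") -> str:
    """Drop the agreed tracking suffix so the legitimate URL is indexed.

    Everything from the first occurrence of `marker` onward is tracking data.
    """
    head, _sep, _tracking = url.partition(marker)
    return head
```

The link stays an ordinary link to a page that answers 200, so no 302 is involved: the site's own logs still see the full URL including the tracking suffix, while the index only keeps the part before the marker.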

Won't work by ratboy666 · 2005-03-23 07:50 · Score: 1

"Redirects to a page should be treated as having far less PageRank value than the page itself. That will fix the problem."

Won't work. In a nutshell (expounded upon below) -- Google doesn't know what the "real page" is.

Specifically, I generate "301" on my own sites:

http://blurt.org/page_of_files

will generate a 301 redirect to:

http://blurt.org/page_of_files/

which then gets an index of files. Ok?

Now, sometimes I relocate data (popular big files):

http://blurt.org/p_o_f/bigfile.html

would get a 302 redirect to

http://big_ass_isp.com/myplace/bigfile.html

I don't want to lose my "pagerank" over the content location! And I am doing it to myself here...

Google will look at my site, and see the redirect. It spiders the page, and it is under MY url (where it should be).

But, if Google spiders the OTHER site (say, though an automatically generated links page), it may eliminate the proper URL.

But, REDUCING "pagerank" because I am providing for more bandwidth? (which is what you are suggesting). Why on earth would anyone want that?

Now, what SHOULD happen is that the page should have a tag on it that says "Please index me if you accessed me directly" or "Please index me if you accessed me via REDIRECT from there".

That would do it (I think -- not much of a web master, I'm afraid).

As it is, since the big_ass_isp site is under my control, I place a robots.txt file there to prevent Google from spidering big_ass_isp's view of my pages.
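For reference, a robots.txt at the root of the mirror host that keeps crawlers out of the mirrored directory might look like the one embedded below; the `/myplace/` path is taken from the example URL above, and whether a given crawler honors it is up to the crawler. Python's standard `urllib.robotparser` can confirm that the rules parse as intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://big_ass_isp.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /myplace/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

mirror_url = "http://big_ass_isp.com/myplace/bigfile.html"
print(parser.can_fetch("Googlebot", mirror_url))  # False: the mirror copy is off-limits
print(parser.can_fetch("Googlebot", "http://big_ass_isp.com/index.html"))  # True
```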

But, I can't control other people from generating a REDIRECT to me (but I *can* additionally tag the page).

I don't think that there is much that Google can do about this. (Well, they could honour a custom tag and propose it to the W3 folk, which is what I think will happen.)

Ratboy

--
Just another "Cubible(sic) Joe" 2 17 3061

Re:Won't work by Animats · 2005-03-23 11:33 · Score: 1

It's been suggested by someone else that redirects within the same domain should be treated more favorably than those outside it.

Spot on by clsc · 2005-03-23 07:51 · Score: 1

The real problem here is not the 302, its a bug in the googlebot. (...) When googlebot sees a 302 redirect to a page it treats the actual page and the redirect to the page as if they are one and the same.

Being the author of TFA that appeared a few days ago, i'll apologize for any confusion - yet, i'd say that you nailed it. Google has one page (as in "a set of indexed content") and a minimum of two URLs associated with it, at least one of which returns something other than "OK", "Not Modified", or the like. Still, Google manages to pick one of the URLs that doesn't return one of these codes as the appropriate URI for that set of content.

The interesting thing is that once the average searcher sees a result for, say "Site A" and clicks on it in good faith, he is not taken to "Site A" but directly to a script that is already in place on "Site B" and 100% controlled by "Site B".

Last time Googlebot saw this script, it redirected instantly to "Site A" ("302 Found"), but, you know... Scripts are scripts - they do one thing until you make them do another thing. And if you're a bit smart you can even make them conditional, showing Googlebot one thing and everybody else another. This is not even rocket science, it's really trivial programming at best. All you need is an "appropriate" site to forward "Site A" users to - preferably one that makes you instant money.

I'm not so sure about this though (that's why i snipped it from your post):

fortunately a relatively easy one to fix.

Before i wrote TFA that appeared a couple of days ago, i had been writing about this problem on search engine related fora for a very long time - literally more than a year, perhaps even two. These fora are frequented by verified search engine representatives, and the problem has also been solved ...By Yahoo! Not by Google.

How serious ... by Heisenbug · 2005-03-23 07:54 · Score: 1

It's more serious than simply copying your content, because the new site *replaces* yours in the ranking rather than competing with it. When I set up http://bad.site/1 through http://bad.site/100, all claiming to own the content at http://good.site/, Google displays only one of the 101 options in the listing -- and yours isn't too likely to get picked.

Actually, maybe that would happen anyway if I simply copied your site content perfectly 100 times. Not sure about that. Still, that aspect makes it much more of a concern.

Or to put it another way... by Anonymous Coward · 2005-03-23 08:25 · Score: 1, Funny

Let's see if I'm understanding this right. Correct me if I'm wrong...

I set up a goat.cx mirror. The goat.cx mirror contains a 302 redirect to slashdot.org, making Google think that my content has been temporarily moved to slashdot.org. Therefore Google thinks that what's at slashdot.org is just a temporary version of what's normally at my website. Therefore people who would have been sent to Slashdot get sent to my site instead, i.e. people trying to find Slashdot via Google get goatse'd.

Correct?

Possible defense: HTTP 301 filter by accidentalGeek · 2005-03-23 08:27 · Score: 2, Interesting

I haven't tried this. It's just an idea knocking around in my head.

What would happen if I set up a stateful filter on my web server that did the following?

1. If the http client provided a referrer header and that header contains my own domain name, exit (and let the request be processed normally)

3. Record the user agent header, client IP address, and current timestamp in some sort of temporary lookup table

4. Issue an HTTP 301 with an absolute URL that points to the current page but with some technically insignificant rewrite from the way that the client requested it. For example, if the request is a simple GET, append a "?" or "&".

If the client was not referred by an internal link, this filter would instruct the client to reload the page in a way that ensures that it knows the correct, full URL.

By itself, this would simply cause an infinite loop which a robot would probably detect. That's where the temporary lookup table and slightly modified URL come in. I left step two out of the list above because it does not apply until the second time the agent hits our page:

2. Consult the lookup table. If this agent already hit this page within the last n seconds, exit and allow the request to be processed normally.

I don't know much about how robots such as googlebot behave. I'd love to see a reply from someone who knows more than I do.
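Here is a rough Python sketch of steps 1-4, in the order they would actually run (1, 2, 3, 4). It only makes the decision; wiring it into a real server is left out. Assumptions: the lookup key is (user agent, client IP, path), the window n defaults to 30 seconds, and the referrer test compares hostnames rather than doing a substring match, since a substring test would accept a referrer like `http://evil.example/?example.com`. The class name is hypothetical, and like the idea itself, the sketch is untested against real crawlers.

```python
import time
from urllib.parse import urlsplit


class ReferrerRedirectFilter:
    """Decide whether to serve a request normally or bounce it with a 301."""

    def __init__(self, my_domain, window_seconds=30.0, clock=time.monotonic):
        self.my_domain = my_domain.lower()
        self.window = window_seconds
        self.clock = clock  # injectable so the logic can be tested
        self.seen = {}      # (user_agent, client_ip, path) -> time of last bounce; never expired here

    def _internal(self, referrer):
        host = urlsplit(referrer).hostname if referrer else None
        return host is not None and (
            host == self.my_domain or host.endswith("." + self.my_domain)
        )

    def decide(self, url, referrer, user_agent, client_ip):
        """Return (200, None) to serve normally, or (301, new_location)."""
        # Step 1: referred by one of our own pages -> serve normally.
        if self._internal(referrer):
            return 200, None
        key = (user_agent, client_ip, urlsplit(url).path)
        now = self.clock()
        # Step 2: this agent was bounced for this page within the window -> serve normally.
        last = self.seen.get(key)
        if last is not None and now - last <= self.window:
            return 200, None
        # Step 3: remember the agent, page and time.
        self.seen[key] = now
        # Step 4: 301 to the same page with an insignificant rewrite ("?" or "&").
        return 301, url + ("&" if "?" in url else "?")
```

Because the path of `http://example.com/page?` is still `/page`, the second visit from the same agent falls under step 2 and is served normally, which is what breaks the endless loop of the first version.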

Possible Solution? by iolaus · 2005-03-23 08:43 · Score: 1

Would it be possible for Google to simply disregard all 302 redirects that refer to a domain different than the document being crawled? In this way, all sites using 302 redirects legitimately (referring to their own content in another location on their domain) would be unaffected while the site hijacking scum would be eliminated.

--
I find laziness to be an excellent motivator.

Re:Possible Solution? by fbartho · 2005-03-23 13:17 · Score: 1

What happens when a webserver crashes, though? It would be trivial to put up a small machine with 302 redirects to a different domain where the content had been mirrored, while the original site is slowly rebuilt...

In fact, assuming a site has an off-site mirror or two, this would be an ideal solution if the original site had multiple pieces of hardware taken offline catastrophically.

Your suggestion would fry these sites.

--
Gravity Sucks

How and when Yahoo fixed it by clsc · 2005-03-23 09:14 · Score: 2, Informative

Sorry for not writing this in the article - it's pretty long already and you just have to cut somewhere, but here goes:

Yahoo was exactly as vulnerable as the rest of the search engines. In fact this problem was pretty bad with Yahoo at one point. What Yahoo did was simply to fix it by implementing some internal rules about how to interpret redirects.

I believe it was fixed around June 2004 - at that time the problem had already been known (and abused) for a long time, but use was not widespread yet. The details of the fix can be seen on this one-page PDF

It's simple (and identical to the solution i suggest in my article): When "Yahoobot" (actually it's called "Slurp") sees a 302 redirect, it checks if the domains of the redirect and the target are the same. If the redirect is from one domain to another, Yahoo keeps the URI from the target domain. If the redirect is from one page to another on the same domain, Yahoo keeps the "source" (ie. the redirect script URI).

THANK YOU! by luap2000 · 2005-03-23 09:23 · Score: 1

of course, some just have to take your word for it, but i've heard it from other sources too.

in fact, i think i've learned *a lot* today. ;)

re: the k5 article - cool. thanks. i'll try to see if an editor will let me change it later tonight. (and i admit that piece was a bit 'out there' too, although i tried my hardest to be objective for the most part...)

when you go from 7k to 700 visits a day, you start to lash out at things. i think i may have found the real cause of the problem, though. someone was DOSing my site and taking it down. i think Googlebot was trying to hit me before the server could come back up and maybe i got put on an 'unreliable server' list?

in any case, i'm looking into PPC and other options to spread things out some.

thanks again. please don't hold my foolishness against me. ;)

-kpaul (the real one)

--
J-Log: Journalism News, Media Views

Re:THANK YOU! by GoogleGuy · 2005-03-23 09:56 · Score: 2, Informative

You bet. If you want to make sure that we have the info to check it out, you can go to google.com/support and when you get to a form where you can enter info, just use canonicalpage as the subject line. We are collecting data through user support to build up a test set for checking any changes we want to try.

Definition: "Fiction" is a positive by YouMakeMeSoANGRY · 2005-03-23 09:52 · Score: 1

Whilst my original comment was supposed to be slightly tongue in cheek, I shall nevertheless play your pedantic definition game.

Prior to the get out clause afforded to them by the use of the word "almost" they explicitly state that it is fiction to say a competitor can have another site removed from Google's index.

Either it is fiction, or it can be done. If it is merely hard to do, it comes under the heading of fact.

Absolute hilarity by brian_turner · 2005-03-23 09:53 · Score: 4, Informative

Absolutely Roflmao!!

I guess some people have never heard of the term "sole trader". :)

My internet business is barely a year old - almost everything is communicated with other webmasters via e-mail - phone support is provided as a last option, but it means that if anyone really needs to use it, then they can have my immediate attention wherever I am, to have their concerns addressed immediately. :)

As for spamming - well, this is one of those "anonymous cowards" some of us are familiar with, who believes that if you purchase a link from another site, or become involved in a link exchange, or register your site in a directory - then you're a spammer. :)

Thanks for the heads up on the Platinax registration details, though; I hadn't realised they'd been left out. I had a run-in with some Belgian Nazis last year, after I booted them from a forum I admin when they tried to use it for promoting neo-Nazi propaganda. They've tried a few times to get back at me since, so I've been trying to reclaim some privacy online. Platinax reg details should be public, though; I'll put something online, then try and find a PO Box for the hate crap.

Re:Absolute hilarity by Anonymous Coward · 2005-03-23 17:48 · Score: 0

from the britecorp site

At the heart of this process is the building of an Independent Back-Linking Network (IBLN), an artificially created network of sites that imitates the wider linkage patterns of the wider internet. Dozens of websites and tens of thousands of pages can be involved in an IBLN, whose sole aim is to promote a particular client's website(s).

you are a spammer. Your type are what make Google et al. suck, filling them up with bogus sites and irrelevant pages that are nothing more than cheap methods of trying to improve your PageRank.

if you see a spade, call it a spade: you are trying to fool the search engines into boosting your "link network" instead of being honest.

so yeah, you are a spammer. Not an email spammer, but a search engine spammer.

What if we.. by firew0lfz · 2005-03-23 09:56 · Score: 1

all emailed google about this problem, like, right now? You think possibly that the wrath of a million slashdotters would make them listen?

Already sent them an emaily.

--
Try not to let life get in the way of living.

If you can beat the index... by clsc · 2005-03-23 09:57 · Score: 1

....why would you want to be an investment manager for others in the first place?

clsc.net down? no way jose *lol* by clsc · 2005-03-23 10:30 · Score: 1

it takes more than a little slashdotting... try again: Pagejack article

Re:clsc.net down? no way jose *lol* by luap2000 · 2005-03-23 10:55 · Score: 1

it was earlier - for about 15 minutes maybe.

i'd say it was my end, but other sites were coming up fine.

it kept timing out on me.

-kpaul

--
J-Log: Journalism News, Media Views

I reported one and it's gone by Anonymous Coward · 2005-03-23 10:32 · Score: 0

Report link spam here. I can confirm that I reported a domain full of link spam as GoogleGuy suggested in the March 18th story and the whole mess was quickly removed from their index.

What's a 302 exploit? by Xenophon+Fenderson, · 2005-03-23 10:40 · Score: 1

I admit I haven't been paying attention to this. What exactly is this 302 exploit? Is it just a matter of attackers spoofing referrer entries in their GET requests so the attacker's web site gets listed on the target's blogroll? Or is there more to it than that?

--
I'm proud of my Northern Tibetian Heritage

Re:What's a 302 exploit? by mabu · 2005-03-23 10:59 · Score: 1

1. Google has your site indexed.

2. Someone uses a CGI script on their site that links to yours. Instead of a plain HREF link, the script answers with a "302 Found" HTTP status and a Location: header pointing at your page.

3. As per RFC 2616, a 302 tells the client that the resource resides temporarily under a different URI. The client SHOULD continue to use the original Request-URI (here, the URL of the redirect script) for future requests.

4. Normally this isn't much different from a regular hyperlink. However, Google's index rules remove duplicates from the catalog, and because of the way Google treats a 302 reference, it can end up deleting the original copy of the site from its index. It then replaces it with the URL of the site that deployed the exploit.

5. End result: someone hijacks your content in the Google database and associates it with a different URL/host.
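Steps 2 to 5 can be illustrated with a toy indexer. Everything here is a hypothetical model, not Google's actual pipeline. A 302 makes the indexer file the target's content under the redirect URL. A duplicate filter then keeps only one URL per identical page body, chosen by a made-up `score` that stands in for something like PageRank.

```python
import hashlib


def build_index(crawl, pages, scores):
    """Toy duplicate-removing index (hypothetical model).

    crawl:  list of (url, status, location) seen by the crawler; location
            is None for a 200 response.
    pages:  {url: html} content actually served with status 200.
    scores: hypothetical per-URL ranking weights.
    Returns {content_hash: url}: one surviving URL per identical body.
    """
    best = {}
    for url, status, location in crawl:
        if status == 302:
            # Taking the 302 literally: the target's content is filed
            # under the redirect (source) URL.
            body = pages[location]
        elif status == 200:
            body = pages[url]
        else:
            continue  # other statuses are ignored in this toy
        key = hashlib.sha1(body.encode("utf-8")).hexdigest()
        current = best.get(key)
        if current is None or scores.get(url, 0) > scores.get(current, 0):
            best[key] = url  # duplicate filter: higher-scored URL wins
    return best


pages = {"http://victim.example/": "<html>Victim's original article</html>"}
crawl = [
    ("http://victim.example/", 200, None),
    ("http://hijacker.example/go.cgi?u=victim", 302, "http://victim.example/"),
]
scores = {"http://victim.example/": 3, "http://hijacker.example/go.cgi?u=victim": 7}
print(list(build_index(crawl, pages, scores).values()))
# ['http://hijacker.example/go.cgi?u=victim']
```

With the scores reversed, the original URL survives. Whichever URL the duplicate filter prefers ends up "owning" the content.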

Here's the relevant info from RFC 2616:

10.3.3 302 Found

The requested resource resides temporarily under a different URI.
Since the redirection might be altered on occasion, the client SHOULD
continue to use the Request-URI for future requests. This response
is only cacheable if indicated by a Cache-Control or Expires header
field.

The temporary URI SHOULD be given by the Location field in the
response. Unless the request method was HEAD, the entity of the
response SHOULD contain a short hypertext note with a hyperlink to
the new URI(s).

If the 302 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.

Note: RFC 1945 and RFC 2068 specify that the client is not allowed
to change the method on the redirected request. However, most
existing user agent implementations treat 302 as if it were a 303
response, performing a GET on the Location field-value regardless
of the original request method. The status codes 303 and 307 have
been added for servers that wish to make unambiguously clear which
kind of reaction is expected of the client.
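The method rules in the quoted section can be condensed into a small decision function. This sketch follows RFC 2616's wording only. The 303 rule comes from section 10.3.4, which is not quoted above. `redirect_plan` and its `legacy_302` switch are hypothetical names.

```python
def redirect_plan(status, method, legacy_302=False):
    """Return (auto_follow, next_method) for a 3xx response, per RFC 2616.

    legacy_302=True mimics what the RFC's note says most user agents do:
    treat a 302 like a 303 and re-issue the request as a GET.
    """
    method = method.upper()
    safe = method in ("GET", "HEAD")
    if status == 303:
        return True, "GET"  # 10.3.4: fetch the other URI with GET
    if status == 302 and legacy_302 and not safe:
        return True, "GET"  # de facto behaviour, not what the spec asks for
    if status in (301, 302, 307):
        # GET/HEAD may be redirected automatically; anything else MUST NOT
        # be unless the user confirms. The method itself is kept.
        return safe, method
    raise ValueError(f"status {status} is not handled by this sketch")
```

For example, `redirect_plan(302, "POST")` gives `(False, "POST")` by the letter of the spec, while `redirect_plan(302, "POST", legacy_302=True)` gives `(True, "GET")`, the browser behaviour the note describes.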

Actually an example has been posted by clsc · 2005-03-23 10:45 · Score: 1

An example was posted in the beginning of the thread: site:drudgereport.com

A quick count showed 12% of the top 100 not being the real domain (i may have missed one or two). Actually this is quite common for the major news sites (please disregard opinion on drudgereport, this is about his URLs not his journalism)

And, clsc.net is still not down :p

Re:Actually an example has been posted by GoogleGuy · 2005-03-23 13:12 · Score: 2, Insightful

claus, I'm glad that you mentioned this search. I looked through those 100 results. Every example that I saw in those results was from a while ago--they were all listed with the Supplemental Result tag. So this is already handled correctly in our main index, and as urls are updated in the supplemental index, those examples should be handled correctly as well.

Thanks for mentioning this search; it's a good point. We've already made some changes to improve our heuristics, and you can see that improvement in the fact that current urls look better than the supplemental urls.

Won't work: Robots don't send the referrer by clsc · 2005-03-23 10:58 · Score: 1

As also written in TFA: The search engine spiders don't send a referrer, so your method won't work. No, they can't "just send a referrer", because they could have found a link to your site on a lot of pages, so which one should they choose? Also, some popular firewalls don't send the referrer either.

Re:Won't work: Robots don't send the referrer by accidentalGeek · 2005-03-23 13:08 · Score: 2, Insightful

More precisely, googlebot always sends the same referrer. Here's a snippet from an apache access log.

64.68.80.4 - - [01/Mar/2005:16:19:24 -0500] "GET /robots.txt HTTP/1.0" 200 770 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

In practice, a static referrer and no referrer amount to the same thing so you're right from a practical standpoint. The referrer is not useful.
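To check what referrer a crawler sends, you can pull the referrer and user-agent fields out of Apache "combined" log lines like the one above. This is a rough Python sketch. The regex covers the common combined format only, and `referer_of` is a hypothetical helper name.

```python
import re

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_of(line):
    """Return (referer or None, user_agent) for a combined-format line, else None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    referer = m.group("referer")
    # Apache logs a missing Referer header as "-"
    return (None if referer == "-" else referer), m.group("agent")


line = ('64.68.80.4 - - [01/Mar/2005:16:19:24 -0500] "GET /robots.txt HTTP/1.0" '
        '200 770 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')
print(referer_of(line))
# (None, 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)')
```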

But that's OK, because the system I described does not depend on the referrer header. If a referrer header is available, the filter uses it as a shortcut to determine whether the client was referred by an internal link. If so, it can bypass the whole redirect process. This saves system and network resources in the majority of cases, when the client is an ordinary web browser. It's not essential, though, and clearly won't help when the client is Googlebot (or some other robot that does not provide a referrer).

If the client is Googlebot, the filter will see that there's no referrer. It will then check its stateful cache to determine whether it has seen this robot recently. If so, it will let the robot right through and the request will be processed normally. If not, it will issue the slightly obfuscated 301 redirect. When the robot follows this redirect, the filter will be invoked again. This time, it will recognize the robot from its previous visit and will let it through.
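Here is a minimal Python sketch of the filter just described, not accidentalGeek's actual code. It assumes clients are keyed by something like IP address plus User-Agent and that "recently" means a fixed time window. The class name and the `ttl` parameter are assumptions. The redirect target is simply the canonical URL on your own host, without the obfuscation mentioned above.

```python
import time
from urllib.parse import urlsplit


class RedirectFilter:
    """Stateful anti-hijack filter (sketch).

    - A Referer from our own host is trusted and served directly.
    - Otherwise, a client seen within `ttl` seconds is served directly.
    - An unknown client is remembered and sent a 301 to our own canonical
      URL for the path, so it re-requests the page under our address.
    """

    def __init__(self, own_host, ttl=300.0, clock=time.monotonic):
        self.own_host = own_host
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self.seen = {}  # client key -> time of last redirect

    def handle(self, client_key, path, referer=None):
        """Return (status, location) for one request."""
        if referer and urlsplit(referer).hostname == self.own_host:
            return 200, None  # internal link: skip the redirect dance
        now = self.clock()
        last = self.seen.get(client_key)
        if last is not None and now - last <= self.ttl:
            return 200, None  # recently redirected client came back
        self.seen[client_key] = now
        return 301, "http://" + self.own_host + path
```

A crawler that arrives via someone else's 302 is unknown, so it first receives a 301 pointing at your own URL. On the follow-up request it is recognised and served normally.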

search engine bots don't send a referrer by clsc · 2005-03-23 11:02 · Score: 1

(see title of post)

Re:search engine bots don't send a referrer by grungefade · 2005-03-23 12:21 · Score: 0

maybe you didn't understand my post. Let me explain to someone who obviously speaks before reading my first post. The code could be used in 2 opposite ways!

Let's say I made a page about Martha Stewart and Googlebot came and indexed it. Well, when someone comes from Google they have a Google referrer, right? So then I display different content, some porno, and all the poor people expecting to get Martha get porn.

And I also said it could be used to have a Martha Stewart page, and then make the code recognize Googlebot and show it porn. So Googlebot indexes porn that isn't actually there for visitors, and people looking for porn get Martha Stewart, which might be a good thing to some. But god, how plainly do I have to explain it? Either way, people coming from Google get content that wasn't there when Googlebot came.
Re:search engine bots don't send a referrer by YetAnotherLogin · 2005-03-23 12:39 · Score: 1

Google referrer? Are you talking about the Googlebot user-agent string?
Re:search engine bots don't send a referrer by grungefade · 2005-03-23 16:23 · Score: 0

A Google referrer means that they just came from Google. So if they search for something on Google and click on a link, you can look up where they came from when they arrive at your site. So, in my previous code:

$referrer = $_SERVER['HTTP_REFERER'];

$referrer would then start with "http://www.google.com/", "http://www.google.co.uk/", etc. You get the point.

But let's say that Google's bot fakes the referrer when it crawls your site, to make sure you're not showing different content to people who arrive from its search engine. Then that code would not work at all, because Googlebot would index the porn and not the Martha. Well, a way around that is to find out Googlebot's IP and write code that shows different content only to good ol' Googlebot. Everyone else would see all the ads that Googlebot didn't.

if ($_SERVER['REMOTE_ADDR'] == "googlebots_ip") {
    // Martha bakes wonderful cookies (only Googlebot sees this)
} else {
    // Martha you never knew (what everyone else sees)
}

So I don't see how Google will ever solve the problem of people faking and spamming the search engine, because Google will never be able to detect what happens before the content is delivered to the browser. They can only analyze the code in the page, then check whether you get forwarded on mouseover or by other tricks currently in use. Google got a great ride while it lasted, but eventually major changes will need to happen in the way browsers work for Google to stay on top. Otherwise, soon searches will return nothing relevant.

But I guess there is one way they could fix searches. Once spamming for a keyword fills up the first 10 results, have it throw them out and not display them. Google would have to keep an eye on lots of keywords daily and keep throwing out the first 15, 5, 34 results, depending on how much spam there is for that word. But then again, is it fair to discriminate against people who just do to their webpages the things Googlebot likes most?

Already, so much fraud is happening with AdWords and the like. I really don't see how their business model has survived this long. Millions of dollars that Google gets monthly, if not daily, come from companies clicking their competitors' links. I think so much of it goes on that they don't want people to know about it. Their whole business revolves around AdWords. Seems to me they should just build in a feature for companies to enter their competitors' IPs (or IP ranges). Simple enough. Then they won't be charged for those clicks. Google has so many PhDs working for it, yet it hasn't done so many common-sense things, which just doesn't make sense. Or maybe that feature would mean a lot less money.
Re:search engine bots don't send a referrer by Anonymous Coward · 2005-03-24 10:17 · Score: 0

>> but lets say that maybe google's bot fakes the referrer when it is crawling your site

Googlebot sends a UserAgent string but does not include a referrer. The bot is not a browser. The bot does not follow links from page to page. The bot adds URLs that it finds to a "crawl this later list", and when it has finished spidering a page, asks the database for the next URL to crawl. Multiple bots are adding to the list. Multiple bots are assigned work from the list. There is NO referrer when Googlebot visits your site.
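The frontier design described above can be sketched as a single-worker, breadth-first crawler in Python. Real crawlers run many workers against a shared, persistent queue. The function name, the `links` graph and the `ExampleBot` user agent are hypothetical. The point is that each request is made from the queue rather than by clicking through pages, so there is no natural Referer to send.

```python
from collections import deque


def crawl_frontier(seeds, links, fetch):
    """Breadth-first crawl from a URL frontier (single-worker sketch).

    links: {url: [outlinks]} standing in for parsed pages (hypothetical data).
    fetch: callable(url, headers) simulating an HTTP GET.
    Returns the URLs in the order they were fetched.
    """
    frontier = deque(seeds)  # the "crawl this later" list
    queued = set(seeds)
    order = []
    while frontier:
        url = frontier.popleft()
        fetch(url, {"User-Agent": "ExampleBot/0.1"})  # no Referer header
        order.append(url)
        for out in links.get(url, []):
            if out not in queued:  # each URL is queued only once
                queued.add(out)
                frontier.append(out)
    return order
```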

English pansies by Anonymous Coward · 2005-03-23 11:11 · Score: 0

Muppet.

Pretty stupid by alienz · 2005-03-23 11:28 · Score: 1

LOL All these posts to explain the same thing over and over. The sad part is..there are others that still don't get it.

Re:Pretty stupid by Anonymous Coward · 2005-03-24 05:53 · Score: 0

The really sad thing is that Google does not get it either.

A past example by Anonymous Coward · 2005-03-23 11:40 · Score: 0

Freenetworks.org used to do 302 redirects to "affiliate" sites.

ie. http://amsterdam.nl.freenetworks.org goes to http://www.losnet.nl

Freenetworks.org has a pretty high pagerank in the wireless community, so they ended up hijacking sites where the url of the affiliate site would be replaced by xxx.xx.freenetworks.org instead of the usual url. This was fixed when freenetworks were notified and they changed the 302 redirects to 301's.
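For a site in freenetworks.org's position, the fix comes down to the status line its redirect script sends. Below is a minimal WSGI sketch in Python. `make_redirect_app` is a hypothetical name, and losnet.nl is used only as the example target from the comment. The short HTML body follows RFC 2616's suggestion of a hypertext note linking to the new URI.

```python
def make_redirect_app(target, permanent=True):
    """WSGI app that redirects every request to `target`.

    permanent=True sends 301 Moved Permanently (clients should switch to the
    target URL); permanent=False sends 302 Found (clients are told to keep
    using the redirecting URL).
    """
    status = "301 Moved Permanently" if permanent else "302 Found"

    def app(environ, start_response):
        body = f'<a href="{target}">{target}</a>'.encode("utf-8")
        start_response(status, [
            ("Location", target),
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_redirect_app("http://www.losnet.nl/")).serve_forever()` serves the redirect on port 8000.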

Re:thanks by luap2000 · 2005-03-23 11:45 · Score: 1

i didn't see it in the first few hundred of a site:domain.com

what's the other explanation for another URL having my title, description and cache?

when i run a header check on it, it shows a 302 redirect then 200.
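A header check like this one (a 302, then a 200) can be reproduced with Python's standard `http.client`, which never follows redirects on its own. This is a sketch, and `head` and `trace` are hypothetical helper names. Some servers answer HEAD differently from GET. The `fetch` parameter exists only so the tracing logic can be exercised without a network.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=head, max_hops=10):
    """Return [(url, status), ...] until a non-redirect response."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in (301, 302, 303, 307) or not location:
            break
        url = urljoin(url, location)  # Location may be relative in practice
    return hops
```

`trace("http://some-directory.example/out.cgi?id=7")` would list each hop. A first hop of 302 on the directory's URL followed by 200 on yours is the pattern described above.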

in this case, i think the other person doesn't know using 302 isn't correct (ie. it's a link collection script...)

is there another explanation?

-kpaul

--
J-Log: Journalism News, Media Views

21st century grandmas should be hip by cheekyboy · 2005-03-23 11:50 · Score: 1

A woman of 50 can easily be a grandma (i.e. she has a child at 23, who gives birth at 20). That grandma grew up in the 60s/70s and most likely went to clubs and hung out with the hippies etc., so 21st-century grandmas are all hip and cool, not like the pre-40s teeners of yesteryear.

--
Liberty freedom are no1, not dicks in suits.

Not Entirlely Legal... by Kaenneth · 2005-03-23 13:20 · Score: 1

"is supposed to interpret whatever content it finds at the 302 target (your site) as really belonging to the URL of the source (my site)."

Claiming ownership of someone else's copyrighted works is, I would think, actionable.

good luck in seeking a legal remedy though.

Slight Correction by ShaunC · 2005-03-23 20:16 · Score: 1

1. search Google for 'allinurl:', e.g. 'allinurl:slashdot.org'.

Actually, as I understand it, you should search Google for 'site:' e.g. 'site:slashdot.org' and not 'allinurl:'.

'allinurl:' shows URLs that contain a specific keyword, which can lead to false positives. 'site:' is supposed to show only the pages that Google knows about within a certain domain. If you search for 'site:yoursite.com' and get results from sites other than yoursite.com, then you know you've been affected. Especially if those other domains have taken the #1 result.
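Once the `site:` results have been copied out by hand (this sketch does not query Google), checking which URLs belong to another domain is easy to automate. This Python sketch assumes subdomains of your own domain count as yours. The function name and sample URLs are hypothetical. The same share calculation gives figures like the 12% of the top 100 mentioned earlier for drudgereport.com.

```python
from urllib.parse import urlsplit


def foreign_results(result_urls, domain):
    """Return result URLs whose host is neither `domain` nor a subdomain of it."""
    domain = domain.lower().lstrip(".")
    flagged = []
    for url in result_urls:
        host = urlsplit(url).hostname or ""
        if host != domain and not host.endswith("." + domain):
            flagged.append(url)
    return flagged


results = [
    "http://slashdot.org/",
    "http://developers.slashdot.org/article.pl?sid=05/03/23",
    "http://links.example.net/out.php?url=slashdot.org",
]
bad = foreign_results(results, "slashdot.org")
print(bad)  # ['http://links.example.net/out.php?url=slashdot.org']
print(f"{len(bad) / len(results):.0%} foreign")  # 33% foreign
```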

Here is one example.

Here is another example.

--
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!

other methods than "site:example.com" by clsc · 2005-03-23 23:11 · Score: 1

kpaul i'll locate you elsewhere.

- i'm not sure if you re-read these threads (it's a good thing to do, as /. threads aren't linear: new messages pop up everywhere, including in the parts you have already read once).

If you do, and for others:

The "site:example.com" search is a good tool. However, it's not always practical if you have a lot of pages, as you won't always be able to spot hijacks among the first 1000 results.

So, try searching for specific document titles instead, putting the document title in quotes. This way you will easily see if there's a result that has your headline, your snippet, your cache, and a URL that is not from your domain.

>> what's the other explanation

In all this talk about 302's we sometimes forget that a META REFRESH with a timeout of zero can do the exact same thing as a 302 redirect.
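Since an instant META refresh has the same effect, a check for hijack-style redirects should look at page markup as well as status codes. This is a sketch using Python's standard `html.parser`. It only handles the usual `content="0; URL=..."` form, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect (delay_seconds, url) from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k.lower(): (v or "") for k, v in attrs}
        if a.get("http-equiv", "").lower() != "refresh":
            return
        delay, _, rest = a.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:].strip().strip("'\"")
        try:
            self.refreshes.append((float(delay), rest or None))
        except ValueError:
            pass  # malformed content attribute: ignore it


def instant_refreshes(html):
    """Return target URLs of zero-delay meta refreshes in `html`."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return [url for delay, url in finder.refreshes if delay == 0 and url]
```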

another "other" explanation by clsc · 2005-03-23 23:28 · Score: 1

also, it could simply be one of these:

- a copy of your page(s) on another domain
- a mirror of your page(s) on another domain
- another domain proxying your domain

Regarding these three cases, the "wrong URLs" should not be seen as an error, imho.

When i use these words, "mirror" usually refers to "close to verbatim copy" (more than 95% verbatim copy) -- ie. almost no difference in the content from your page -- while "copy" could easily just be fragments of your page, perhaps even mixed with fragments of other pages. A "proxy" will be 100% verbatim copies; in fact it will be your exact site, only shown on another URL.

For those that follow these things, 1bu.com is not a proxy, it's a mirror (as it strips out flash and stuff).
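clsc's distinction can be approximated with a text-similarity ratio. This is a Python sketch using the standard `difflib.SequenceMatcher`. The 100% and 95% cut-offs come from the comment above. The 30% floor for "copy" is an arbitrary assumption, and comparing raw page text this way is only a rough stand-in for real near-duplicate detection.

```python
from difflib import SequenceMatcher


def classify_duplicate(original, other):
    """Label `other` relative to `original`; returns (label, ratio).

    100% identical -> "proxy"; over 95% -> "mirror" (per the comment);
    over 30% -> "copy" (assumed threshold); otherwise "unrelated".
    """
    ratio = SequenceMatcher(None, original, other, autojunk=False).ratio()
    if ratio == 1.0:
        return "proxy", ratio
    if ratio > 0.95:
        return "mirror", ratio
    if ratio > 0.30:
        return "copy", ratio
    return "unrelated", ratio
```

By this measure a site like 1bu.com, which strips out Flash and little else, would tend to land in the "mirror" band rather than "proxy".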

Re:another "other" explanation by luap2000 · 2005-03-24 02:08 · Score: 1

thanks. it's not any of those. it's a link directory script pointing to my site.

i'll try the above title trick to see if it shows in site:domain.com as well...

--
J-Log: Journalism News, Media Views

a solution, add extra robot rules, check referer by Anonymous Coward · 2005-03-24 00:33 · Score: 0

The webmasters of doi.contentdirections.com and doi.org can agree to add some extra robot rules that tell Googlebot it's OK to follow 302 redirects to/from doi.contentdirections.com and doi.org.

Another solution is for the webmasters of doi.contentdirections.com to only allow redirects from doi.org, so the 302 trick doesn't work.

get it?
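The first suggestion amounts to a per-site allowlist that a crawler would consult. No such robots.txt directive exists, so this Python sketch simply hard-codes a hypothetical rule table. `ALLOWED_SOURCES` and `index_url_for` are invented names, and the DOI hosts are only the example from the comment. The second suggestion would depend on the Referer header, which, as discussed elsewhere in this thread, crawlers don't send.

```python
from urllib.parse import urlsplit

# Hypothetical rules a target site might publish: which other hosts may
# 302-redirect to it and keep their own URL in the index.
ALLOWED_SOURCES = {
    "doi.contentdirections.com": {"doi.org"},
}


def index_url_for(source_url, target_url, allowed=ALLOWED_SOURCES):
    """Keep the redirecting URL only if the target site approved its host."""
    src = urlsplit(source_url).hostname or ""
    dst = urlsplit(target_url).hostname or ""
    if src == dst or src in allowed.get(dst, set()):
        return source_url
    return target_url  # unapproved cross-site 302: the target keeps its URL
```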

Ugh. This is so not true. Definitely by boredguru · 2005-03-24 03:39 · Score: 1

All over the net I am seeing discussions of 302 hijackings and of Google being evil. But no one is discussing the actual cause. The actual cause is the HTTP protocol, which says EXPLICITLY

"10.3.3 302 Found

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field." - Emphasis not mine.
You can read it for yourself at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Now, we all know the importance of a protocol. It's a language for communicating. In this case the protocol was developed when the WEB was pure and unadulterated, when people expected others to follow it and not misuse it.

But with money always comes greed and dishonesty. WEB originally was not built with Business in mind. It was for free Information Interchange. But it has just evolved to a state where Commercially the WEB can be harnessed (exploited whatever) for its potential.

So now any search engine that follows the protocol to the letter is in effect aiding the hijacking. But is that the mistake of the search engine or of the protocol? Unlike human languages, protocols don't evolve uninhibited. If they did, very soon no browser could understand all the servers and vice versa, i.e. you might need 10 kinds of browsers to access 10 different websites, because those 10 websites talk different languages.
(Now come to think of it, is this not what is happening in the DRM world. You download music from one site and you can't play it on another without a hack). That is the reason there is a standard and it gets revised every so often so that it can also keep up with the times.

So suggestions like "throw the redirecting page into the bin and keep the target page" will have web-wide repercussions for people who use 302 with the standard in mind and for a legitimate purpose. So, you ask, who uses it and for what purpose?
Let me give an example.
Ever tried buying from Amazon.com?
Okay how do you reach the homepage?
Well, I type amazon.com into my browser and I get the page. BUT the URL at which I get the page is now http://www.amazon.com/exec/obidos/subst/home/home.html/103-7996157-2162261

Use this server header tool to see what happens: http://www.webrankinfo.com/english/tools/server-header.php

1) Enter www.amazon.com
It says
HTTP/1.1 301 Moved Permanently
Date: Thu, 24 Mar 2005 14:38:22 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: http://www.amazon.com:80/exec/obidos/subst/home/home.html
Connection: close
Content-Type: text/plain
So amazon.com doesn't exist (don't mistake me: I mean the page amazon.com); what exists is http://www.amazon.com:80/exec/obidos/subst/home/home.html.

2) Now enter http://www.amazon.com:80/exec/obidos/subst/home/home.html in the box.
It says
HTTP/1.1 302
Date: Thu, 24 Mar 2005 14:40:48 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1112256000; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Set-Cookie: session-id=002-8272699-5270422; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Location: http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
Connection: close
Content-Type: text/html

So now the home page is temporarily at http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
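Reading dumps like the two above by eye gets tedious, and a few lines of Python can pull out the status code and Location. This sketch assumes the response head is pasted in as text from a header-checking tool. `parse_response_head` is a hypothetical helper, and it skips the validation a real HTTP parser would do.

```python
def parse_response_head(raw):
    """Split a raw HTTP/1.x response head into (status_code, headers).

    headers maps lower-cased names to lists of values, because fields such
    as Set-Cookie may repeat. Lines starting with whitespace are treated as
    continuations of the previous header.
    """
    lines = raw.strip().splitlines()
    code = int(lines[0].split()[1])  # "HTTP/1.1 302" -> 302
    headers = {}
    last = None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last:
            headers[last][-1] += " " + line.strip()  # folded line
            continue
        name, sep, value = line.partition(":")
        if sep:
            last = name.strip().lower()
            headers.setdefault(last, []).append(value.strip())
    return code, headers
```

Fed the first dump, it reports 301 with a Location under /exec/obidos/. Fed the second, it reports 302 with the session-ID URL.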

If Google were

uhh, for the money maybe? by 2short · 2005-03-28 05:29 · Score: 1

Let's say you could beat the index pretty soundly, achieving a reliable 15% annual return. Let's further say you have considerably more available capital than most, say 1 million dollars. So you can manage your own money and make $150,000 in a year, and you probably should.

Why would you want to be an investment manager for others? Because if you can reliably achieve a 15% return, "others" will pay you several million a year, at least.

Slashdot Mirror

Millions of Pages Google Hijacked using ODP Feed

427 comments