It seems to me that -- if these rumours have any truth to them -- MS might actually be trying to move the data processing away from the specific document (format), and indeed away from the application that handles that specific document format.
Instead, the language used for processing data will be an "umbrella language" that is tied more closely to the OS than to any specific application.
This could be a very good move.
But it still sucks that -- if these rumours have any truth to them -- I'll have to rewrite lots and lots of very neat and unbroken VBA code.
- I'm not sure if you re-read these threads (it's a good thing to do, as /. threads aren't linear: new messages pop up everywhere, including in the parts you have already read once).
If you do, and for others:
The "site:example.com" search is a good tool. However, it's not always practical if you have a lot of pages, as you won't always be able to spot hijacks among the first 1000 pages.
So, try searching for specific document titles instead, putting the document title in quotes. This way you will easily see if there's a result that has your headline, your snippet, your cache, and a URL that is not from your domain.
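If you have many titles to check, the "URL that is not from your domain" test can be automated. Here is a rough sketch in Python. It assumes you have already collected the results as (title, URL) pairs by whatever means you like; the function name and the sample data are made up for illustration.

```python
from urllib.parse import urlsplit

def foreign_results(results, my_domain):
    """Return the (title, url) pairs whose host is neither my_domain
    nor one of its subdomains."""
    my_domain = my_domain.lower().rstrip(".")
    flagged = []
    for title, url in results:
        # urlsplit only recognizes the host if the URL has a "//" part,
        # so add one for scheme-less URLs like "www.example.com/page".
        parsed = urlsplit(url if "//" in url else "//" + url)
        host = (parsed.hostname or "").lower()
        if host != my_domain and not host.endswith("." + my_domain):
            flagged.append((title, url))
    return flagged

# Hypothetical results for a quoted-title search
results = [
    ("My headline", "http://www.example.com/page.html"),
    ("My headline", "http://www.hijacker.com/script.cgi?id=12345"),
]
for title, url in foreign_results(results, "example.com"):
    print("check this one:", url)
```

A flagged URL is only a candidate. It could still be a plain copy or mirror, so it needs the header check before you call it a hijack.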
>> what's the other explanation
In all this talk about 302's we sometimes forget that a META REFRESH with a timeout of zero can do the exact same thing as a 302 redirect.
>> what explains another URL having my title and description and cache?
It could be what you think it is, and it could be something else. Try the "site:example.com" search instead of "in url" - and if you find the other domain again, run that URL through a server header checker. If it returns a 200, disable JavaScript and see if it has a meta redirect on the page.
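The last step (looking for a meta redirect on a page that returns 200) can also be done on the saved page source. A minimal sketch, assuming you already have the HTML as a string; the class and function names are mine, not from any tool:

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target) for every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        content = attr.get("content") or ""
        delay, _, rest = content.partition(";")
        rest = rest.strip()
        target = ""
        if rest.lower().startswith("url="):
            target = rest[4:].strip(" '\"")
        self.refreshes.append((delay.strip(), target))

def find_meta_refresh(html):
    """Return a list of (delay, target_url) tuples found in the HTML."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes

page = '<html><head><meta http-equiv="refresh" content="0; url=http://www.example.com/page.html"></head></html>'
print(find_meta_refresh(page))
```

A delay of "0" together with a target on your own domain is the pattern to look out for.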
As also written in TFA: The search engine spiders don't send a referrer, so your method won't work. No, they can't "just send a referrer", because they could have found a link to your site on a lot of pages, so which one should they choose? Also, some popular firewalls don't send the referrer either.
Sorry for not writing this in the article - it's pretty long already and you just have to cut somewhere, but here goes:
Yahoo was exactly as vulnerable as the rest of the search engines. In fact this problem was pretty bad with Yahoo at one point. What Yahoo did was simply to fix it by implementing some internal rules about how to interpret redirects.
I believe it was fixed around June 2004 - at that time the problem had already been known (and abused) for a long time, but use was not widespread yet. The details of the fix can be seen on this one-page PDF.
It's simple (and identical to the solution I suggest in my article): When "Yahoobot" (actually it's called "Slurp") sees a 302 redirect, it checks if the domains of the redirect and the target are the same. If the redirect is from one domain to another, Yahoo keeps the URI from the target domain. If the redirect is from one page to another on the same domain, Yahoo keeps the "source" (i.e. the redirect script URI).
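To make the rule concrete, here is how I would sketch it in Python. This is only an illustration of the rule as described above, not Yahoo's actual code. It compares full hostnames as a stand-in for "domain" (so www.example.com and example.com would count as different), and the function name is made up.

```python
from urllib.parse import urlsplit

def url_to_keep(source_url, target_url):
    """Decide which URL to index when source_url answers with a 302
    pointing at target_url.

    Different hosts -> keep the target URL.
    Same host       -> keep the source (the redirect script URL).
    """
    source_host = (urlsplit(source_url).hostname or "").lower()
    target_host = (urlsplit(target_url).hostname or "").lower()
    if source_host != target_host:
        return target_url
    return source_url

# Cross-domain 302: the target keeps its own URL
print(url_to_keep("http://www.hijacker.com/script.cgi?id=12345",
                  "http://www.right-url.com/right-page.html"))

# Same-domain 302: the redirecting URL is kept
print(url_to_keep("http://www.right-url.com/go.cgi?id=1",
                  "http://www.right-url.com/right-page.html"))
```

With this rule a redirect script on another domain can never end up as the listed URL for your content.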
The real problem here is not the 302, its a bug in the googlebot. (...) When googlebot sees a 302 redirect to a page it treats the actual page and the redirect to the page as if they are one and the same.
Being the author of TFA that appeared a few days ago, I'll apologize for any confusion - yet, I'd say that you nailed it. Google has one page (as in "a set of indexed content") and a minimum of two URLs associated with it - at least one of these returns something other than "OK", or "Not Modified", or whatever. Still, Google manages to pick one of the URLs that doesn't return one of these codes as being the appropriate URI for the set of content.
The interesting thing is that once the average searcher sees a result for, say "Site A" and clicks on it in good faith, he is not taken to "Site A" but directly to a script that is already in place on "Site B" and 100% controlled by "Site B".
Last time Googlebot saw this script, it redirected instantly to "Site A" ("302 Found"), but, you know... Scripts are scripts - they do one thing until you make them do another thing. And if you're a bit smart you can even make them conditional, showing Googlebot one thing and everybody else another. This is not even rocket science, it's really trivial programming at best. All you need is an "appropriate" site to forward "Site A" users to - preferably one that makes you instant money.
I'm not so sure about this though (that's why I snipped it from your post):
fortunately a relatively easy one to fix.
Before I wrote TFA that appeared a couple of days ago, I had been writing about this problem on search engine related fora for a very long time - literally more than a year, perhaps even two. These fora are frequented by verified search engine representatives, and the problem has also been solved... by Yahoo! Not by Google.
...is that the search engine result page points directly to a script that is under the control of the hijacker. No middleman, and the searcher is in good faith. Think about it.
I've been writing on this topic on search engine related forums frequented by search engine representatives for more than a year. The problem has not been fixed; lately it's been getting a whole lot worse.
A late reply, don't know if it will ever be seen, but you deserve it, as your assumptions (about anything other than my personal skills and level of knowledge) are right. Still, your conclusion is not:
The semantics behind a 301 and 302 are VERY different and unless you want people to replace the original URI with the target in your 301s, forever, you might be entering a world of hurt.
Okay, let's try that:
Replace the original URI: www.hijacker.com/script.cgi?id=12345
With the target: www.right-url.com/right-page.html
See? See? While your assumptions are right, you did overlook the essential part. Also, as others have noted, this is not the recommended fix, just a precaution that some webmasters might wish to take, entirely on their own behalf.
--------
This "exploit" isn't very interesting
I take it that you are not a webmaster out to make easy bucks, implement spyware, virii, hoaxes, or generally cause harm to others -- otherwise your view would probably be the opposite. May you live long and prosper and may your camels always find water.
- that's in fact the point of TFA. You OWN the search results of the target page and hence the searcher.
Google redirects the listing for the target page to a script of yours, not even to a page.
Think about it: From search results straight to script, and the user thinks he's going to visit some page that will be relevant - how much easier could it be?
>> "the sometimes rigorous entry process for foreigners, which they see as a deterrent to tourism"
Rigorous? Is that the word you choose? wtf's up with you? It's bloody harassment and outright disrespect for human rights, that's what it is!
A whole lot of hearsay, rumours and fearmongering... But honestly, do you really *know* anything? And can you *prove* it?
This goes for the original poster, too.
..and they bought http://alltheweb.com/ too. IMHO, this one was consistently better than Google for a long time.
>> "X10 ad museum"
Funny how I read that as "X10 ad nauseam". While we are there, anyone remember "punch the monkey"?
I'm so glad my browser lets me block image animations, and that it does not have Flash.
Am I the only one who is reminded of the Sausage Machine in The Wall movie when looking at picture 2?
- posted in hardware?
Four out of the top four contain the answer, as far as I can tell. Six out of the top six, even.
http://web.ask.com/web?q=why+are+flamingos+pink&qsrc=0&o=0
Also, it could simply be one of these:
- a copy of your page(s) on another domain
- a mirror of your page(s) on another domain
- another domain proxying your domain
Regarding these three cases, the "wrong URLs" should not be seen as an error, imho.
When I use these words, "mirror" usually refers to "close to verbatim copy" (more than 95% verbatim copy) -- i.e. almost no difference in the content from your page -- while "copy" could easily just be fragments of your page, perhaps even mixed with fragments of other pages. A "proxy" will be 100% verbatim copies; in fact it will be your exact site, only shown on another URL.
For those that follow these things, 1bu.com is not a proxy, it's a mirror (as it strips out flash and stuff).
kpaul, I'll locate you elsewhere.
(see title of post)
An example was posted in the beginning of the thread: site:drudgereport.com
A quick count showed 12% of the top 100 not being the real domain (I may have missed one or two). Actually this is quite common for the major news sites (please disregard opinion on drudgereport, this is about his URLs, not his journalism).
And, clsc.net is still not down :p
it takes more than a little slashdotting... try again: Pagejack article
....why would you want to be an investment manager for others in the first place?
lotsa peers and seeds - no problems downloading the torrent here
Where can I collect the CD?