Is Microsoft Crawling Google?

Don't concern yourself with this crap... by garcia · 2004-11-11 07:37 · Score: 4, Insightful

Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.

Sure, I see crawlers on my site all the time, sometimes hitting the same URL over and over again. Do I understand their repetitive behavior? No. Do I care what they are doing? No, as long as they are obeying my robots.txt.

I have complained before about MSNbot ignoring changes to robots.txt while Google happily changed its habits (I can't find the link, sorry). My recent fighting with Googlebot came to a head when I had to disallow them access to my gallery completely because they refused to honor anything except Disallow: /. I had to go so far as to point Googlebot at my robots.txt and tell it to remove all the previous links. It was rather annoying dealing with support via email from Googlebot as they have apparently taken on the stance of "we don't care but you should put meta tags in all your files so that we don't index those pages." Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.

Do I care if MSNbot is crawling Google and then finding sites and links to search? No as it's none of OUR concern. What is OUR concern is our own robots.txt and how the spiders interact with our sites through that file. Let Google deal with Microsoft/MSNbot if that's what needs to be done but don't concern yourself with it otherwise.

Re:Don't concern yourself with this crap... by finkployd · 2004-11-11 07:45 · Score: 4, Insightful

Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.

No offense dude, but you are the one who put the site out there publicly. Now if they are DoSing you then you have a valid complaint but robots.txt is just there as a friendly suggestion. I can write a search bot today that completely ignores it and there is nothing wrong with that (except perhaps ethically but even that is arguable). If you don't want people (or bots) viewing it then password protect it or take it off the public interweb.
Re:Don't concern yourself with this crap... by garcia · 2004-11-11 07:47 · Score: 2, Insightful

Now if they are DoSing you then you have a valid complaint but robots.txt is just there as a friendly suggestion.

Crawling a gallery of images (and all image property links as well) all day for several days might be considered "DoSing"; I consider it rude.

You're right, they don't have to obey the robots.txt but they should when they say they will.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 07:48 · Score: 1, Interesting

This is insightful? If your stuff is on the net, you should not expect it to remain private. So their bot is crawling your site. Get over it. If you don't want them crawling your stuff for profit, protect the directory or just ban them. Or just put meta tags in your pages like they said.

The bot should be treated as no different from another anonymous human. If not the Googlebot, one of the other search engines is bound to find it.
Re:Don't concern yourself with this crap... by mollymoo · 2004-11-11 07:59 · Score: 5, Interesting

No offense dude, but you are the one who put the site out there publicly. Now if they are DoSing you then you have a valid complaint but robots.txt is just there as a friendly suggestion.

There's more to it than that. Google caches your pages and makes that cache of your copyright material available. Arguably if you have used your robots.txt file to tell it not to index (and therefore cache) your pages and it still does they are breaching copyright. OK, the Google cache is the world's largest breach of copyright anyway, but if you have told its spider not to index and it does regardless, that's a different ballgame.
Putting it out there on the web does not give anyone the right to do with it as they please.

--
Chernobyl 'not a wildlife haven' - BBC News
Re:Don't concern yourself with this crap... by Eric+Giguere · 2004-11-11 08:12 · Score: 3, Interesting

Sure, I see crawlers on my site all the time, sometimes hitting the same URL over and over again. Do I understand their repetitive behavior? No.

Google gives a partial answer to this on their GoogleBot page:

In general, Googlebot should only download one copy of each file from your site during a given crawl. Occasionally the crawler is stopped and restarted, and it may recrawl pages that it has recently retrieved. These recrawls should happen infrequently.

If they're playing around with new indexing algorithms then I would expect to see more of these multiple hits.
Eric
How to (gently) detect Internet Explorer
Re:Don't concern yourself with this crap... by liquidsin · 2004-11-11 08:13 · Score: 4, Interesting

Hmmm...let's call "robots.txt" a "copyright control device" in that it states who may and may not have access to my copyrighted images directory. I'd bet a DMCA suit or two for circumventing your copyright control device would get them to pay attention...

--
do not read this line twice.
Re:Don't concern yourself with this crap... by gUmbi · 2004-11-11 08:15 · Score: 2, Funny

If you don't want people (or bots) viewing it then password protect it or take it off the public interweb.

Interweb? Is that the same as the 'Information superhighway'?
Re:Don't concern yourself with this crap... by Thumpnugget · 2004-11-11 08:21 · Score: 3, Funny

Interweb? Is that the same as the 'Information superhighway'?

They're very similar. One notable difference is that the Information Superhighway was invented by Al Gore.

--
Free yourself. Everything else will follow.
Re:Don't concern yourself with this crap... by neurojab · 2004-11-11 08:23 · Score: 1

Interweb? Is that the same as the 'Information superhighway'?

All the internets are affected.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 08:23 · Score: 0

The bot should be treated as no different from another anonymous human. If not the Googlebot, one of the other search engines is bound to find it.

Anonymous humans aren't spidering my site for hours on end daily for their own personal gain.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 08:24 · Score: 2, Funny

In the early days of MSN Bot, it ate up about 4 GB of my bandwidth on ONE html page, requesting it constantly, every few seconds for days! I emailed Microsoft and they replied with an 'oops, we found the problem'. That doesn't pay my bandwidth overage charges, does it?
Re:Don't concern yourself with this crap... by rahlquist · 2004-11-11 08:26 · Score: 1

In this article about archive.org they have an interesting counterpoint to your presumption.
"In the same vein, the robot exclusion file that some sites use to declare parts of the site out-of-bounds to normal spiders can be ignored by the Archive if they are doing a crawl on behalf of an authority such as the Library of Congress. In that case, the webmaster will receive a notification, suitable for framing, explaining that they should be suitably honored that the exclusion files are going to be ignored so that this site can be added to the Library of Congress Web Archive."
So if you exclude all crawlers and the robots file is copy protection under the DMCA does the Library of Congress have a right to circumvent your protection?
IANAL but that would be a cool fight to see.

--
Sick of stupidity? http://www.patentlystupid.com
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 08:32 · Score: 2, Insightful

Well anything on the internet that doesn't have normal web server access controls blocking access, is open slather IMO. That's what makes the internet so cool. Doesn't mean you can't still copyright your material so others can't use it, but I think for search engine purposes there is an implied agreement between YOU and THEM - and I think there should be.

In a sense it's like tourism. The world is full of stuff like Historical buildings and the owners of those places have legal rights against theft/damage etc. But the tour companies can still take people around the streets and show them the places without having to necessarily pay a fee.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 08:43 · Score: 2, Insightful

Since databases are currently copyrightable, I would argue that a website is a database. If Google insists, I would imagine that MSN, hitting Google's database...er, website, would amount to using a copyrighted database without its owner's permission, which in this case could amount to being a robots.txt file that punts known websites that link w/o attribution.

I would imagine a metacrawler, which attributes its links back to Google, is probably OK, because it keeps Google's adstream intact when the user clicks on the link to a Google search result.

But MSNSearch (or whatever it's called), taking Google search results as its own without attribution, well, that might be a copyright infringement...

If you were an on-line bookstore and deep-linked to Amazon's reviews while portraying them as your own, well, you're gonna get a C&D from Amazon's lawyers awfully fast.
Re:Don't concern yourself with this crap... by nofx_3 · 2004-11-11 08:55 · Score: 4, Funny

Yes, but I invented the "Information Historic Old Country Road". It's not fast, and there ain't much information, but it's so durn quaint you gotta love it.

-kaplanfx

--
Visualize Whirled Peas
Re:Don't concern yourself with this crap... by CowboyBob500 · 2004-11-11 09:17 · Score: 2, Interesting

As far as I see, MSNBot is behaving itself whilst Googlebot is hungriest - (much as I hate to stick up for Microsoft).

Robot                        Hits   Bandwidth   Last visit
Googlebot (Google)             74   945.51 KB   11 Nov 2004 - 03:02
Netcraft Web Server Survey     13   0           10 Nov 2004 - 23:48
Mirago                          6   76.44 KB    02 Nov 2004 - 04:13
MSNBot                          6   76.44 KB    05 Nov 2004 - 05:58

It's interesting that Mirago and MSNBot have taken exactly the same bandwidth in the same amount of visits. Are MS innov^H^H^H^H^H buying new technology again?

Bob

--
Listen to my latest album here
Re:Don't concern yourself with this crap... by Peaker · 2004-11-11 09:21 · Score: 1

The google cache is a copyright breach?

If that is so, it's because they make "unauthorized" copies of the web page by retransmitting it to anyone who wants it.

However, if you think about it - so does every goddamn router on the path the page goes through.

So maybe routers are the biggest breach of copyrights?
Re:Don't concern yourself with this crap... by Jahf · 2004-11-11 09:31 · Score: 2, Informative

IANAL but I would see this as falling under fair use.

1) the LoC is not profiting from your works nor is it re-using them (with the exception of providing an archive to others, see next item).

2) the LoC regularly tells people requesting copies of their information to first obtain permission from the copyright holder (in other words, as with any library, you can browse but you can't copy without permission and copy permission does not equal permission to reuse in a commercial work).

3) Copy protection schemes require active protection to fall under the DMCA, even if it is so simple that anyone can defeat it. Robots.txt is -passive- protection because you have to purposefully search for the file and then purposefully utilize it. To be active protection the document should not come up without the viewer (or blocked viewer) performing some form of action. When someone/something visits an unprotected public web page there is not a way for your web server to invoke the robots.txt file, therefore it is not an active mechanism.

--
It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
Re:Don't concern yourself with this crap... by ad0gg · 2004-11-11 09:33 · Score: 5, Informative

If you don't want your site indexed or cached by Google, go here and follow the directions.
Remove yourself from google
"Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code."

--
Have you ever been to a turkish prison?
Re:Don't concern yourself with this crap... by StikyPad · 2004-11-11 10:01 · Score: 1

One notable difference is that the Information Superhighway was invented by Al Gore.

And the other is a line from a commercial which used to be funny before everyone on /. found a way to use it in unremarkable ways.

--
https://www.eff.org/https-everywhere
Re:Don't concern yourself with this crap... by trogdor8667 · 2004-11-11 10:05 · Score: 1

My biggest problem is that I can't get any robots to obey my robot.txt file. I don't know if I'm coding them wrong or what, but I'll logon sometimes and see 20+ googlebots or inktomi spiders indexing my site, which affects my performance. If MS wants to steal Google's links, let Google deal with it. I want my site to appear on search engines, but the bandwidth I lose to Google and Inktomi spiders is not worthwhile to me.
Re:Don't concern yourself with this crap... by Asphalt · 2004-11-11 10:06 · Score: 1

I never put anything on a public web server that I wouldn't want showing up #1 on a Google Search.
Robots.txt or no robots.txt.
Re:Don't concern yourself with this crap... by eric76 · 2004-11-11 10:07 · Score: 1

Or just put meta tags in your pages like they said.

You shouldn't have to mark each individual page.
That would be like requiring authors/publishing houses to insert a full notice on every page of a book telling the reader that they do not have permission to copy that page.
Re:Don't concern yourself with this crap... by Trejkaz · 2004-11-11 10:16 · Score: 1

Sounds like Google needs an EULA. Remember, these things are legal contracts now. :-/

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:Don't concern yourself with this crap... by hunterx11 · 2004-11-11 10:17 · Score: 1

Google won't spider disallowed pages (and thus can't cache them), but it will index them nonetheless. When Google returns results with no description it usually means that the bot was excluded. It will only abstain from indexing if it spiders the page and finds a noindex meta tag (which ironically requires that it not be excluded by robots.txt).
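To make that catch-22 concrete, here is a minimal sketch using only Python's standard library. The robots.txt rules, the paths, and the `crawl_decision` helper are made up for illustration, and this models a generic polite crawler, not Googlebot's actual logic:

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts every crawler out of /gallery/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="noindex">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            tokens = (attr.get("content") or "").lower().split(",")
            if "noindex" in [t.strip() for t in tokens]:
                self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def crawl_decision(rules, agent, path, html):
    # A polite crawler consults robots.txt BEFORE fetching, so a
    # disallowed page's meta tags are never read at all.
    if not rules.can_fetch(agent, path):
        return "not fetched: noindex tag never seen"
    if has_noindex(html):
        return "fetched: noindex honored"
    return "fetched: indexable"


rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

page = '<html><head><meta name="robots" content="noindex"></head></html>'
blocked = crawl_decision(rules, "Googlebot", "/gallery/1.html", page)
open_page = crawl_decision(rules, "Googlebot", "/about.html", page)
```

With these made-up rules, the same page is handled differently depending only on whether robots.txt lets the crawler read it in the first place.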

--
English is easier said than done.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 10:41 · Score: 0

well - it might be because it's robots.txt, not robot.txt
Re:Don't concern yourself with this crap... by metalhed77 · 2004-11-11 11:07 · Score: 1

If you ignore robots.txt there's a damn good chance you are DoSing people's sites. That's what the problem is.

--
Photos.
Re:Don't concern yourself with this crap... by dshaw858 · 2004-11-11 11:19 · Score: 1

If you don't want people (or bots) viewing it then password protect it or take it off the public interweb.

I have to agree with this. In fact, for curious minds, robots.txt is truly a "LOOK HERE! LOOK HERE!" flag with a giant arrow pointing places. I oftentimes check robots.txt files to find "secret" pages. Exploring sites this way is a lot of fun. As the poster I'm replying to said: if you want something private, protect it; don't just throw it onto the open web.

- dshaw
Re:Don't concern yourself with this crap... by mollymoo · 2004-11-11 11:41 · Score: 4, Insightful

If you don't want your site indexed or cached by Google, go here and follow the directions.

I shouldn't need to go and fill out some form for every search engine to protect my rights. One accepted standard way to say "do not index this" should be sufficient. This is an automated system. There is an accepted automated method to stop crawlers indexing your site (robots.txt). If they (Google or anyone else) take your copyrighted content and reproduce it automatically when their automatic system could have automatically respected your explicitly stated and legally protected rights they are knowingly making a flagrant copyright violation.

--
Chernobyl 'not a wildlife haven' - BBC News
Re:Don't concern yourself with this crap... by mollymoo · 2004-11-11 11:56 · Score: 1

Well anything on the internet that doesn't have normal web server access controls blocking access, is open slather IMO. That's what makes the internet so cool. Doesn't mean you can't still copyright your material so others can't use it, but I think for search engine purposes there is an implied agreement between YOU and THEM - and I think there should be.

Absolutely. Indexing small portions of any page available on the web, or placing them within results based on their copyright-protected content is, IMO, fair-use. Making the full content available from a cache (Google-style) is not fair-use. It's copyright violation.
I don't mind Google doing it, I like them, but if Microsoft try that shit they will get a letter from me, then a letter from my lawyer, then...
I'm not a monopoly or even a business, thank god.
IANAL

--
Chernobyl 'not a wildlife haven' - BBC News
Re:Don't concern yourself with this crap... by _Qiang_ · 2004-11-11 12:01 · Score: 0

Last time I tried that, my site got indexed twice!!
Re:Don't concern yourself with this crap... by wondafucka · 2004-11-11 12:23 · Score: 1

Yo, it's like slang for tha infernet. 'Saight. We're all nubes at som' point dog.

--
postmodernsideshow.com
Re:Don't concern yourself with this crap... by buht · 2004-11-11 12:52 · Score: 1

Why the hell do you read Google then? It's none of your concern.

--

-- The box said Windows 2000 or better... so I installed Linux
Re:Don't concern yourself with this crap... by mollymoo · 2004-11-11 13:22 · Score: 1

The google cache is a copyright breach?
If that is so, it's because they make "unauthorized" copies of the web page by retransmitting it to anyone who wants it.

Routers don't store the copies for more than a fraction of a second. Google stores them for weeks, months, years... Most routers don't do multicast (perhaps they can, but IPv4 still rules) so they don't retransmit it to "anyone who wants it" either. Typically they transmit a unique packet just once, then very soon forget the contents.

--
Chernobyl 'not a wildlife haven' - BBC News
Re:Don't concern yourself with this crap... by djcapelis · 2004-11-11 13:26 · Score: 3, Informative

To remove all the images on your site from our index, place the following robots.txt file in your server root:
User-agent: Googlebot-Image
Disallow: /

That should work? No?
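One way to sanity-check those two lines locally is to run them through Python's standard `urllib.robotparser`. This is only a sketch: the image path is made up, and Python's matching rules are not guaranteed to be identical to what Google's crawlers actually do.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path="/gallery/photo1.jpg"):
    # The default path is a hypothetical image URL on the site.
    return rules.can_fetch(agent, path)


image_bot_allowed = allowed("Googlebot-Image")  # the image crawler
web_bot_allowed = allowed("Googlebot")          # the regular web crawler
```

Under this parser the rule shuts out only the image crawler, so the regular Googlebot would still be free to fetch pages that link to the images.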

--
I touch computers in naughty places
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 13:35 · Score: 1, Insightful

I don't think so.

They cached your page with whatever copyright notices you put on it.

What, are you going to tell me that when you zipped your website and put it in your p2p shared folder, and someone downloaded it, that they are committing a flagrant copyright violation?

IF YOU DON'T WANT PEOPLE TO SEE IT, DON'T PUT IT ON THE INTERNET NUMBSKULL.
Re:Don't concern yourself with this crap... by AstroDrabb · 2004-11-11 15:51 · Score: 1

IANAL either, however I don't see how your points are valid. For example, just change the LoC to Me and the content to the MPAA's library.
Imagine if my bot got to the MPAA's content and downloaded it all to my computer. I won't share this content without the permission of the MPAA, nor will I use it for any profit motive. Do you really think that the MPAA will allow me to just take all this copyrighted material for non-profit use for free? NO. So why would the LoC be allowed to take copyrighted material for non-profit use for free?
I personally think it is good that the LoC archives everything it can. However, I just don't think your argument for the LoC makes sense especially if you change who is doing what as I did.
You may have a good point WRT your point #3, however IANAL so I cannot comment on whether passive vs. active protection makes a difference WRT the DMCA.

--
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
Re:Don't concern yourself with this crap... by AstroDrabb · 2004-11-11 15:57 · Score: 1

I wonder where caching proxy servers would fall WRT copyright violation then.

--
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
Re:Don't concern yourself with this crap... by KillerDeathRobot · 2004-11-11 19:10 · Score: 1

Google is only a copyright breach as much as a library is.

--
Thinkin' Lincoln - a web comic of presidential proportions
Re:Don't concern yourself with this crap... by weapon · 2004-11-11 19:14 · Score: 0

What about IE's Temporary Internet Files or Mozilla/Firefox's cache? They can last for ages.

Weapon
Re:Don't concern yourself with this crap... by big_gibbon · 2004-11-11 21:16 · Score: 2, Insightful

It was rather annoying dealing with support via email from Googlebot as they have apparently taken on the stance of "we don't care but you should put meta tags in all your files so that we don't index those pages." Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.

Google should follow the robots.txt - definitely. But there needs to be some way of confirming on your website that you actually want the pages removing - otherwise what's to stop your competitors "accidentally" entering your URL into the removal form? Meta elements would seem to be the natural choice.

P
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 21:49 · Score: 0

Your "rights"? What "rights"? I've been writing HTML since '93 and I've always believed that if you don't want it copied, don't put it online - it's that simple. Every time anyone hits your site your server commits a copyright violation by making a copy of the requested file and sending it to the browser. If you have copyrighted material you don't want copied, then stick it behind a password control.

The WWW is for sharing. If you don't want to share, don't use it.
Re:Don't concern yourself with this crap... by Anonymous Coward · 2004-11-11 22:50 · Score: 0

Hmmm... let's call "robots.txt" a "mutant exploding stealth monkey". I'd bet an exploding stealth monkey or two would get them to pay attention...

Calling a thing by a name does not make it into the thing that that name describes, asswipe. ;o)

Is "robots.txt" a "device"? Probably not. More likely a "document" or "file". Before you go launching your DMCA suits I'd suggest you look carefully at the wording of the relevant legislation. UK law (CDPA 1988 - http://www.oup.com/uk/booksites/content/0199259445/cdpact1988_final_v2.pdf) refers to "technical devices" at section 296 but doesn't seem to define "device". Launch your DMCA suits and lose and you are making matters worse, especially for you when you're countersued.
Re:Don't concern yourself with this crap... by ZenFu · 2004-11-12 00:12 · Score: 1

Since databases are currently copyrightable

Five years ago, when I studied IP, you could copy a directory. There was even a case in Kansas where one company copied another company's phone book and published the book as their own. Personally, I think the benefits outweigh the costs.

Five years ago, however, there was a database protection act that was passed that did protect data - such as phonebooks. So, things have changed greatly in the last five years or you are talking about another country.
Re:Don't concern yourself with this crap... by ZenFu · 2004-11-12 00:15 · Score: 1

To clarify this, I was talking about the database protection act that was passed in Europe, not in the US; five years ago, you could copy a phonebook in the US, just not in Europe. The main exception appeared to be some sort of sweat-of-the-brow principle, where certain extreme cases, such as copying a CD instead of manually inputting the data, were disallowed in the US.

I am sure someone else knows more about it, I just thought it was worth discussing (as I am too lazy to look elsewhere for answers).
Re:Don't concern yourself with this crap... by iMMersE · 2004-11-12 04:08 · Score: 0, Troll

I'm going to save people a whole lot of time reading this story.

Firstly, from the article :

Obviously my conclusion should be taken as a grain of salt but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the urls it finds.

When I read the "LOL" bit, I realised it was my kid brother who had written the article.

And as for a quick summary of the hundreds of comments that follow :

"IANAL but ..."
"M$ are poo, Google are ace"
"M$ M$ M$ M$ ..."
"I haven't read the article, but here's my opinion ..."
"Google and MSN don't obey my robot.txt." "That's because it's robots.txt ..."

This post has been a free public service.

--
codegolf.com - smaller *is* better.
Re:Don't concern yourself with this crap... by Jahf · 2004-11-12 13:08 · Score: 1

The first difference is that I strongly doubt that the MPAA has all of their content available online without some form of active protection.

However if you change the "MPAA's content" to "any site in the world that has public information no matter what it is (let us say for instance, CNN) ...

Then the difference is a sad but true one ... that being that while your use would be just as fair (like recording TV on a VCR or DVR), the LoC has the pockets and legislative favors to withstand any attempt to attack the process while you don't.

But let's expand it further ... every site on the web that doesn't set a no-cache header (which I would stretch to say might be considered actively protected under the weak requirements of the DMCA) already lets you do EXACTLY that ... because all modern browsers cache data for offline retrieval.

Also, to clarify, go back and read my post ... I was not saying that the LoC (or you in this case) have the right to use the information for non-profit use for free. I was talking about keeping personal copies. In the case of the LoC remember that they won't even give you a -copy- of a copyrighted work unless you get permission from the (c) holder and then it is -still- not a license to -use- that information in any way other than for personal fair use. A far cry from what your posit implies regarding MPAA content.

It is well established that recording/archiving of publicly available information is legal under fair use ... it is how the LoC archives things like congressional proceedings, white house press releases, and ANY broadcasted event or in this case freely and publicly available online information. The words/transcripts of such items may often be in the public domain, but the actual footage is the property of whoever shot it. Legally you cannot re-use the footage of the Presidential Debates if you recorded them on your DVR because the footage is (c) by whichever network you recorded. You could be there in person to record your own copy and do whatever you wanted with it.

I can simplify the difference thusly ... you asked essentially why the LoC can take copyrighted material and -use- it for free non-profit use when you can't. My answer is, they can't (and neither can you), but you -can- archive it so long as you don't re-use it yourself (for instance, in creating a documentary) or give copies to others without explicit permission from the copyright holder.

And to further clarify, much of my post was targeted at the idea of the LoC making copies of data on a server with a "robots.txt" rules file that supposedly disallowed it. I never said that going through a site to get protected content would be deemed legal under the DMCA, only that to violate the DMCA provisions you need active protection devices in place and "robots.txt" optional crawling rules are not active protection.

The point I was making about active versus passive protection was my language. The idea being that you can't violate protections (ie, reverse engineering, cracking encryption, scamming passwords) if those protections aren't enforced in the first place. There is no standard that says any form of HTTP client -must- use the "robots.txt" file if it exists and so just because the HTTP client is a crawler doesn't mean that it has to behave differently than any other HTTP client. If someone wants their site data to be protected under something like the DMCA, it needs to use something like passwords, encryption or both to force clients to go through the protection methods.

Obeying "robots.txt" is a courtesy. Yes, it is an -expected- courtesy like covering your mouth when you sneeze, but it is still a courtesy, not a lawful protection device.

--
It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
Re:Don't concern yourself with this crap... by trogdor8667 · 2004-11-18 18:09 · Score: 1

I checked to see if I'm honestly that dumb, and yes, I did have everything right so far as I could tell...

Difficult to do if Google doesn't want them to by Anonymous Coward · 2004-11-11 07:37 · Score: 5, Insightful

All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.
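A rough sketch of the "check their logs" step, assuming the server writes Common Log Format lines and that the unusual query is a unique made-up token. The log lines, IP addresses (from documentation-only ranges), and the `ips_requesting` helper are all hypothetical; nothing here reflects how Google actually does this.

```python
import re
from collections import Counter

# Matches the client address and the quoted request of a Common Log Format line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)"')


def ips_requesting(log_lines, canary):
    """Count requests per client IP whose request line contains the canary."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and canary in match.group("request"):
            hits[match.group("ip")] += 1
    return hits


# Made-up log excerpt; the canary query is something no human would type.
sample_log = [
    '192.0.2.10 - - [11/Nov/2004:07:30:00 +0000] "GET /search?q=zxqv-canary-7 HTTP/1.1" 200 512',
    '198.51.100.7 - - [11/Nov/2004:07:31:00 +0000] "GET /search?q=weather HTTP/1.1" 200 900',
    '192.0.2.10 - - [11/Nov/2004:07:32:00 +0000] "GET /search?q=zxqv-canary-7&start=10 HTTP/1.1" 200 498',
]

suspects = ips_requesting(sample_log, "zxqv-canary-7")
```

Any address that shows up in `suspects` becomes a candidate for blocking, though a real setup would want to rule out proxies and shared addresses first.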

Re:Difficult to do if Google doesn't want them to by carpe_noctem · 2004-11-11 07:41 · Score: 4, Funny

Why stop there? Google should just ban all of Microsoft's netblocks to prevent their employees from gathering useful information from them...

"Begun, this war of the corporations has!"

--
"Quoting famous computer scientists out of context is the root of all evil (or at least most of it) in programming." - K
Re:Difficult to do if Google doesn't want them to by Anonymous Coward · 2004-11-11 07:43 · Score: 2, Funny

Microsoft could create a new distributed crawler that comes bundled with Windows! Every Windows user could crawl Google for them, and then Google's only option would be to block everyone using an MS product.

Remember, helping Microsoft is like helping yourself.
Re:Difficult to do if Google doesn't want them to by superpulpsicle · 2004-11-11 07:56 · Score: 0, Troll

Hey it's not easy to block any gorilla, nevermind a trillion dollar one. Though again, Google should just block the word "windows" and "microsoft" at the javascript level on the main page.
Re:Difficult to do if Google doesn't want them to by blamanj · 2004-11-11 08:09 · Score: 5, Interesting

Yes, and don't think Google wouldn't notice. My company had a summer intern that once wrote a program that started sucking a lot of information out of Google. They blocked our entire site for about three days until everything got straightened out.
Re:Difficult to do if Google doesn't want them to by sipy · 2004-11-11 08:50 · Score: 0, Redundant

I don't think Google should block Microsoft's IP address ranges. If they did, how would Microsoft's employees get their work done? You think they use search.msn.com? BAAAAAaaahhhh!
Re:Difficult to do if Google doesn't want them to by GreenKiwi · 2004-11-11 09:04 · Score: 1

even better... what google should do is provide erroneous results to MS. This would cause MS to have collected tons of crappy data and have to filter through all of that.
Re:Difficult to do if Google doesn't want them to by zentigger · 2004-11-11 09:55 · Score: 3, Interesting

Better yet, Provide those addresses with the correct search results, but change all the links to the raunchiest porn (or pictures of little puppy dogs, if that better suits your sense of moral rectitude)

--
the above is my personal opinion and does not necessarily reflect that of the little voices in my head
Re:Difficult to do if Google doesn't want them to by KilobyteKnight · 2004-11-11 10:42 · Score: 1

All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.

But wouldn't it be much more fun if Google just detected MS IP addresses and returned bogus information to requests coming from them? Just imagine 75% of MS searches returning tubgirl or lemonparty.

--
When will Windows be ready for the desktop?
Re:Difficult to do if Google doesn't want them to by Anonymous Coward · 2004-11-11 11:07 · Score: 0

By the way, what happened to him? ;)
Re:Difficult to do if Google doesn't want them to by Anonymous Coward · 2004-11-11 12:07 · Score: 0

He now works in Microsoft's search division.
Re:Difficult to do if Google doesn't want them to by asavage · 2004-11-11 12:28 · Score: 3, Interesting

If you go to whatismyip you get a website that displays your IP address. If you search msn and google for that site the search results show the IP address of the bot that indexed that site.
For google I get: crawl-66-249-64-167.googlebot.com [66.249.64.167]
for msn I get: fj1011.inktomisearch.com [66.196.91.16]
and msn beta I get: 65.54.188.83 (can't find associated domain)
So we can tell that at least this result wasn't stolen from Google.
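
For anyone who wants to repeat this check on their own logs, here is a minimal sketch of the classification step. It assumes you already have the reverse-DNS hostname for each crawler IP (for example from Python's `socket.gethostbyaddr`); the `CRAWLER_SUFFIXES` table and the `crawler_operator` function are hypothetical names, and the table contains only the two domains quoted above.

```python
# Hypothetical lookup table: reverse-DNS suffix -> who runs the crawler.
# Only the two suffixes quoted in the comment above are listed.
CRAWLER_SUFFIXES = {
    ".googlebot.com": "Google",
    ".inktomisearch.com": "Inktomi",
}


def crawler_operator(hostname):
    """Return the operator for a reverse-DNS hostname, or None if unknown."""
    if not hostname:
        # No PTR record, as with the msn beta address above.
        return None
    host = hostname.lower().rstrip(".")
    for suffix, operator in CRAWLER_SUFFIXES.items():
        # Match on the full dotted suffix so "evilgooglebot.com" does not pass.
        if host.endswith(suffix):
            return operator
    return None
```

An address with no associated domain simply comes back as `None`, which tells you who did not index the page rather than who did.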
Re:Difficult to do if Google doesn't want them to by ViGe · 2004-11-11 19:25 · Score: 1

and msn beta I get: 65.54.188.83 (can't find associated domain)

So we can tell that at least this result wasn't stolen from Google.

No we can't, and if you had read the article, you would have known that. They don't steal the results directly from google, but instead they have their spider indexing links which are found from Google.

--
It has to work - rfc1925

You don't say! by xerocube · 2004-11-11 07:38 · Score: 1, Funny

You mean M$ is searching through somebody else's stuff? Well... I'll be damned...

Re:You don't say! by Ryan+Stortz · 2004-11-11 07:47 · Score: 2, Funny

Wasn't that the "plot" to the movie Anti-Trust?

--
Bugs are just features that have been fixed.
Re:You don't say! by Anonymous Coward · 2004-11-11 07:49 · Score: 1, Funny

They're not searching Google's porn links fast enough.
Re:You don't say! by cortana · 2004-11-11 08:16 · Score: 4, Funny

Movie? I thought that thing was a documentary!

Does it violate Google's Terms of Service by winkydink · 2004-11-11 07:38 · Score: 4, Insightful

If so, they have legal remedies.

If not, it's called doing business and gaining an advantage any legitimate way that you can.

I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.

--

"I'd rather be a lightning rod than a seismometer." -Ken Kesey

Re:Does it violate Google's Terms of Service by Lev13than · 2004-11-11 07:44 · Score: 3, Insightful

Does it violate Google's Terms of Service? If so, they have legal remedies.
If not, it's called doing business and gaining an advantage any legitimate way that you can.
I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.

If I copy your work and take credit for it, does it violate your terms of service? If so, you have legal remedies. If not, it's called doing business and gaining an advantage any legitimate way that I can.

Furthermore, I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.

--
When you have nothing left to burn you must set yourself on fire
Re:Does it violate Google's Terms of Service by winkydink · 2004-11-11 07:49 · Score: 1

In the case of a listing of pages on the internet, my guess is that it would be considered akin to the data in the phone book, which was recently ruled not subject to protection by copyright.
But, I am not a judge. Or a lawyer. And I expect that if Google litigated here, they would be setting precedent.

--
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
Re:Does it violate Google's Terms of Service by TheRaven64 · 2004-11-11 07:58 · Score: 4, Interesting

Do Google's terms of service have any legal standing? Click-through EULAs don't in many jurisdictions, and I don't remember ever even seeing Google's ToS, let alone agreeing to them.

--
I am TheRaven on Soylent News
Re:Does it violate Google's Terms of Service by nick13245 · 2004-11-11 08:22 · Score: 5, Informative

Yes it does.
From Google's Privacy Center (http://www.google.com/terms_of_service.html):

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.
Re:Does it violate Google's Terms of Service by _Sprocket_ · 2004-11-11 08:23 · Score: 1

The data isn't - but the collection is (formatting, presentation, collation, etc.).
Re:Does it violate Google's Terms of Service by Cplus · 2004-11-11 08:55 · Score: 1

Hmmm, that's a lot like a click-through licence that no one ever had to click through. I can't imagine a licence such as this being legally binding. A quick look around google.com didn't show me any licence, and I actually had to google search for it. Now I don't disagree that what is being done is wrong (if it's being done), but I don't think this is the legal leg to stand on. IANAL ;)

--
"Share your knowledge. It's a way to achieve immortality." -- Dalai Lama
Re:Does it violate Google's Terms of Service by julesh · 2004-11-11 09:00 · Score: 1

I believe that under database right laws (covered on slashdot here, I don't know if it is actually a law yet) Google would have a right to prevent other people from using their index, if they want. Essentially, it would be a copyright-like right arising from the effort Google have put into building their index.

Does anyone know if this happened?
Re:Does it violate Google's Terms of Service by NuclearDog · 2004-11-11 09:26 · Score: 1

Here's how I look at it.

There is no law (AFAIK) saying that I must allow absolutely everyone access to my server. If I wanted I could block your IP, people using Internet Explorer, people using Firefox, people requesting more than one page every 5 seconds, people coming from the country of America... whatever the hell I want.

Now, I can just say "If you want to use my site, you must follow MY rules." My rules being the terms of service. If you don't follow my rules, I can go ahead and firewall off your IP, return a page saying you've been blocked, or even just return fake results.
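
The "more than one page every 5 seconds" rule is the easiest of these to sketch. The following is only an illustration under stated assumptions: the `MinIntervalLimiter` class is hypothetical, it keys on the client IP alone, and the caller passes in the current time in seconds so the logic can be exercised without a real clock.

```python
class MinIntervalLimiter:
    """Allow each client at most one request per `min_interval` seconds."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self.last_allowed = {}  # client IP -> time of last allowed request

    def allow(self, client_ip, now):
        """Return True if the request may be served, False if it is too soon."""
        last = self.last_allowed.get(client_ip)
        if last is not None and now - last < self.min_interval:
            # Too soon. The timer is not reset, so a client that keeps
            # hammering is not locked out forever.
            return False
        self.last_allowed[client_ip] = now
        return True
```

What the server does with a `False` (firewall the IP, return a "blocked" page, or serve fake results) is a policy choice that sits outside this sketch.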

This is the same as if I went into Barnes & Noble and moved all the bibles to the fiction section. They are perfectly within the law to kick me out and forbid me from ever coming back. It is their private property and they are not required by law to allow me access just as my server is my private property and I am not required by law to allow you access.

ND

--
This statement is forty-five characters long.
Re:Does it violate Google's Terms of Service by modecx · 2004-11-11 09:28 · Score: 1

That's the thing. It's not a license. It's their terms of service.

If you violate their terms of service, they have the right to not serve you. Heck, they don't have to serve you even if you follow the rules exactly.

Much like the signs on private businesses: "no shirt, no shoes, no service". A business can refuse service pretty much on any basis. They don't like the way you look, and won't sell you what you want? Tough luck. Same deal here.

--
Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
Re:Does it violate Google's Terms of Service by RmanB17499 · 2004-11-11 11:14 · Score: 1

I disagree over legal remedies. Since Google's search is offered for free - with no consideration - then the Terms of Service can't be construed to be a contract. A contract requires offer, acceptance, and consideration to be legally enforced. We can agree on anything - "Hey let's meet for lunch, at noon, by the water fountain." And there's nothing I can do if you don't show up...unless I pay you, or offer something, then you would be in breach.
Re:Does it violate Google's Terms of Service by BillyBlaze · 2004-11-11 11:44 · Score: 1

Well, even if the terms of service have no legal standing, you still can't meta-search google, because that would probably be copyright infringement and would almost certainly run afoul of "database rights" (shudder). You need a contract with them that explicitly allows that.
Re:Does it violate Google's Terms of Service by mce · 2004-11-11 11:59 · Score: 1

All they need is the data. They're perfectly capable of producing their own presentation and all that.

--
Linux user since early January 1992.
Re:Does it violate Google's Terms of Service by geg81 · 2004-11-11 13:49 · Score: 1

That may not be an argument Google wants to make. A lot of sites on the Internet have their own terms of service that prohibit some of what Google is doing with their data (like copying it over into their own databases and displaying it from their cache). If Google starts making arguments that TOS are enforceable, they may see them enforced against them, and they have a lot more to lose in the process.
Re:Does it violate Google's Terms of Service by winkydink · 2004-11-11 14:21 · Score: 1

Interesting perspective. I would have never thought of that.

--
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
Re:Does it violate Google's Terms of Service by _Sprocket_ · 2004-11-11 15:23 · Score: 1

The address www.foo.bar is the public data. The fact that it might have something to do with a search for "widgets" is not.
Re:Does it violate Google's Terms of Service by Minna+Kirai · 2004-11-11 20:07 · Score: 1

If you don't follow my rules, I can go ahead and firewall off your IP,

Irrelevant. Even if someone does follow the rules, the webserver can ban him on a whim. The rules mean nothing (except to give a warning).

In fact, the rules don't even make sense: "If you want to make commercial use of the Google Services,"

So a company worker looking for a place to buy polyester thread can't search for it on Google, because that'd be a commercial use!
Re:Does it violate Google's Terms of Service by Minna+Kirai · 2004-11-11 20:10 · Score: 1

They don't like the way you look, and won't sell you what you want? Tough luck.

And then you sue them for discrimination, and the feds seize their entire business assets.

Business owners in the USA do not have the right to refuse service for any reason they want!
Re:Does it violate Google's Terms of Service by Minna+Kirai · 2004-11-11 20:15 · Score: 1

If I copy your work and take credit for it,

Regardless of any terms of service, that's doubly illegal: it is both copyright infringement, and fraud.

Even if the author gave you permission, taking credit for it yourself is plagiarism, which in certain instances can make you liable for fraud.
Re:Does it violate Google's Terms of Service by Minna+Kirai · 2004-11-11 20:18 · Score: 1

because that would probably be copyright infringement

No. Google is not the author of the search results. It merely quoted them from other sources, and holds no copyright on them.

would almost certainly run afoul of "database rights" (shudder).

Both Google and Microsoft are in the USA, where there is no legal concept of "database rights". They both have European subsidiaries, but it's unlikely MS is using them for a search project such as this.
Re:Does it violate Google's Terms of Service by NuclearDog · 2004-11-11 21:11 · Score: 1

My general point was, the server is my private property and all the Terms of Service are there for is, like you say, a warning and an outline of what I will and won't let you do, and possibly what I plan to do in the event that you don't follow my rules.

The reason I was trying to point this out was that people were comparing the ToS to a EULA as far as enforceability goes, and I was attempting to state that this is not a fair comparison, as the ToS doesn't actually require any legal backing at all.

"So a company worker looking for a place to buy polyester thread can't search for it on Google, because that'd be a commercial use!"

Yeah, that is one interpretation.

ND

--
This statement is forty-five characters long.
Re:Does it violate Google's Terms of Service by mikechant · 2004-11-11 22:40 · Score: 1

There is no general anti-discrimination law in the US, the UK or (AFAIK) any other country. They can discriminate on *any* grounds they like, including 'looks' as long as that doesn't amount to a specific prohibited form of discrimination such as race or disability.
Re:Does it violate Google's Terms of Service by Anonymous Coward · 2004-11-12 03:39 · Score: 0

They DO have the right. You fail it.
Re:Does it violate Google's Terms of Service by OhHellWithIt · 2004-11-12 04:23 · Score: 1

I submit that it doesn't matter. The same logic that says Google has a right to index my website because it's not password-protected would seem to indicate that I have a right to index Google's website.
Intellectual property issues sure are peculiar!

--
"Who controls the past controls the future. Who controls the present controls the past." -- George Orwell
Re:Does it violate Google's Terms of Service by Minna+Kirai · 2004-11-12 06:52 · Score: 1

amount to a specific prohibited form of discrimination such as race or disability.

And since those things ARE specifically prohibited, the statement "They can discriminate on *any* grounds they like" is 100% untrue.

Yea, and by BrianGa · 2004-11-11 07:38 · Score: 5, Funny

The new search engine's name will be Mooglesoft.

Re:Yea, and by MooseByte · 2004-11-11 07:46 · Score: 4, Funny

"The new search engine's name will be Mooglesoft."
Which will subsequently be sued by SCOogle, the latest startup from The Canopy Group, after announcing they purchased the rights to the Internet in a complex transaction which is documented in a briefcase somewhere in Germany.
Re:Yea, and by meabolex · 2004-11-11 07:48 · Score: 3, Funny

Initiating a Mooglesoft search:

Instead of clicking a button named Google Search, it simply says "KupoKupo!"

You are then returned a page where 100% of the text is the word "Kupo"

This is slightly less optimized than a Marklar search (which at least has some words other than 'Marklar').

--
FORTUNE FAVORS IRONY
Re:Yea, and by arootbeer · 2004-11-11 08:40 · Score: 0

Wait...wouldn't a meaningful result that consisted of only one term be MORE optimized than a meaningful result that had to be expressed using more than one term?
Imagine the savings in vocalization hardware!
Re:Yea, and by Carewolf · 2004-11-11 08:58 · Score: 1

Well since Microsoft has patented TCP/IP it would be obvious who they bought the Internet from.
Re:Yea, and by happyfrogcow · 2004-11-11 09:35 · Score: 1

The briefcase being held by a Nihilist saying, "Ja, veee steel wont ze focking mooney, Labowsky!" Not knowing the briefcase is actually filled with dirty laundry.
Re:Yea, and by FyRE666 · 2004-11-11 09:38 · Score: 1

Interestingly, a search for Litigious bastards on Microsoft's search engine: Like this brings up our favorite scoundrels at #1! Spooky, huh? ;-)

--
Code, Hardware, stuff like that.

What do you expect? by r2q2 · 2004-11-11 07:39 · Score: 1

Really, since the Google search results are public knowledge, why wouldn't Microsoft crawl Google's stuff? If MSN search can crawl the web, why should it limit itself to everything except Google/Yahoo? Although this tactic may work, importing all of Google's massive search database might take a while.

--
My UID is prime is yours?

But will this mean Google can crawl back? by biffnix · 2004-11-11 07:39 · Score: 5, Funny

Couldn't Google just crawl Microsoft in return? Then they'd be stuck in an endless loop, and William Shatner can then swoop in, crack some skulls, and save the day.

Or something like that.

biffnix

--
Don't Die Wondering

Re:But will this mean Google can crawl back? by Anonymous Coward · 2004-11-11 07:59 · Score: 0, Insightful

Google would gain nothing from crawling Microsoft. All they'd be getting is their own material.

If Microsoft is indeed doing this however, they could become real competition a lot sooner than you'd think.

Why not? by Anonymous Coward · 2004-11-11 07:39 · Score: 1, Insightful

Doesn't that mean even more results?

I'd do the same thing if I could. This is all "speculation" anyway, but since it feeds the stereotype of the insidious Microsoft, it gets posted front page to this "tech news" site.

Microsoft stealing someone elses technology??? by Shant3030 · 2004-11-11 07:39 · Score: 4, Funny

Nah, never happens....

--
100% Insightful

Re:Microsoft stealing someone elses technology??? by Picard102 · 2004-11-11 07:52 · Score: 1

I fail to see how they are stealing any of Google's technology. Data maybe.
Re:Microsoft stealing someone elses technology??? by isometrick · 2004-11-11 07:56 · Score: 3, Interesting

Google's "data" is collected, generated, and stored by their technology.

I won't steal your oven, but I'll steal your food!
Re:Microsoft stealing someone elses technology??? by Picard102 · 2004-11-11 07:59 · Score: 1

Data that is public to anyone who enters a search.
Re:Microsoft stealing someone elses technology??? by isometrick · 2004-11-11 08:03 · Score: 2, Informative

Google Terms of Service

" ... You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance ..."
Re:Microsoft stealing someone elses technology??? by netringer · 2004-11-11 08:11 · Score: 4, Interesting

I fail to see how they are stealing any of Google's technology. Data maybe.
Or are they stealing Google's innovations?

Lo! Note how the review articles of the last few days mention the innovative NEW FEATURE of MSN search called, "Search Near Me" which stores the calculated lat/long of addresses on web pages and returns matches near you.

Note how Google's long in beta Google Local (http://local.google.com) stores the calculated lat/long of addresses on pages and returns matches near you. Google Local works better.

Another Microsoft innovation! Let's hope WE remember who had it first!

--
Ever dream you could fly? Get up from the Flight Sim. I Fly

Legallity by Anonymous Coward · 2004-11-11 07:40 · Score: 0

Surely this is just as illegal as going through someone else's trash, it is still not your property...

I guess that is the point

Re:Legallity by La+Camiseta · 2004-11-11 07:50 · Score: 1

I wouldn't go that far. As soon as you put your trash on the curb, it becomes public property, and anyone can go through it. It's when they go through trash and it's in your side yard that it's illegal.

That's why it's technically illegal to go dumpster diving in dumpsters that are enclosed in those little brick cubes behind buildings. Although I've never really had a problem with them while dumpster diving. They can sure as hell, and probably would, get you for dumping your trash there.
Re:Legallity by tomhudson · 2004-11-11 10:04 · Score: 1

I wouldn't go that far. As soon as you put your trash on the curb, it becomes public property, and anyone can go through it.
Wrong. Once it's on the sidewalk, it is on city property, and is subject to the conditions imposed by the municipality.
Among other things, one of those conditions is that only the contractor hired to pick up the garbage may do so. This was decided when recycling started taking off, and municipalities began to see revenue streams from recycling newspapers, cardboard, etc., and using municipal wastes to generate energy in cogeneration plants.
Other "gypsy" recyclers would go through people's trash before the pickup time and steal the recyclable material and sell it off.
The courts held that it was theft. Since it was theft, it was also an invasion of privacy, as 3rd parties had no legal right to it, any more than they would have a right to take your bicycle if you park it on the sidewalk, or enter your home if you left the door unlocked.

They been crawling like mad lately by mpost4 · 2004-11-11 07:40 · Score: 5, Interesting

I can say that they have been crawling like mad as of late: Google, Yahoo, and MSN. I say this because on my site I have had a lot of traffic from all three, and my site is not a popular, or even an important one, but I have seen a lot of traffic from them. Not just once a week or a few times a week but every day. There are big updates coming. I was not surprised to see the article about Google doubling their index; I knew something was coming from the way they are crawling unimportant/unpopular sites.

Re:They been crawling like mad lately by Eric+Giguere · 2004-11-11 08:02 · Score: 1

Definitely lots of crawling. From my logs, I see that:
- GoogleBot crawls me extensively daily
- Slurp (Yahoo!) does it daily but only a few pages
- msnbot is like slurp
- Exabot does an extensive one every few days (this is fairly new for me)
And of course I get the occasional random crawl from some other bot I've never heard of. But Google is by far the most consistent and the most extensive.
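
A rough way to get this kind of per-bot tally out of your own logs is to match on the User-Agent string. This is a minimal sketch, not how any particular log analyzer does it: `BOT_TOKENS` and `count_bot_hits` are hypothetical names, the tokens are just the four bots listed above, and the input is assumed to be a list of User-Agent strings already pulled out of the log.

```python
from collections import Counter

# Lowercase substrings assumed to identify the four crawlers named above.
BOT_TOKENS = ("googlebot", "slurp", "msnbot", "exabot")


def count_bot_hits(user_agents):
    """Count requests per known crawler, given an iterable of User-Agent strings."""
    counts = Counter()
    for agent in user_agents:
        lowered = agent.lower()
        for token in BOT_TOKENS:
            if token in lowered:
                counts[token] += 1
                break  # attribute each request to at most one crawler
    return counts
```

User-Agent strings are trivially faked, so a count like this shows what the requests claim to be, not who actually sent them.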
Eric
Why the Vioxx recall reduced spam (humor)
Re:They been crawling like mad lately by geoffspear · 2004-11-11 08:13 · Score: 1

Ever think they might all be following the 2 links you have to your site in every slashdot comment you post?

--
Don't blame me; I'm never given mod points.
Re:They been crawling like mad lately by mpost4 · 2004-11-11 08:14 · Score: 1

I have not seen Exabot but I have also seen
Agent: RssReader/1.0.88.0 (http://www.rssreader.com) Microsoft Windows NT 5.1.2600.0
Syndic8/1.0 (http://www.syndic8.com/)
Agent: CoralWebPrx/0.1 (See http://www.scs.cs.nyu.edu/coral/)
Agent: SharpReader/0.9.5.1 (.NET CLR 1.1.4322.2032; WinNT 5.0.2195.0) not sure if this is a bot or an RSS reader, I am tempted to think it is an RSS reader because the next agent from the IP is Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
Re:They been crawling like mad lately by mpost4 · 2004-11-11 08:17 · Score: 1

All the links on Slashdot from my comments are to the same address. I would think that they would remember a history of what they crawled, and when they see the address again just throw out the URL without crawling it, and mark another link to that URL but not follow it.
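
The bookkeeping described here can be sketched in a few lines. This is a guess at the general idea, not a description of how any real crawler works: the `CrawlHistory` class is hypothetical, and it ignores recrawling for freshness entirely.

```python
from collections import Counter


class CrawlHistory:
    """Remember which URLs were seen and how many links pointed at each."""

    def __init__(self):
        self.inbound_links = Counter()  # URL -> number of links seen so far

    def should_fetch(self, url):
        """Record one more link to `url`; return True only the first time."""
        first_time = url not in self.inbound_links
        self.inbound_links[url] += 1
        return first_time
```

Every link still gets counted, which is the "mark another link to that URL" part, but only the first sighting triggers a fetch.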
Re:They been crawling like mad lately by Eric+Giguere · 2004-11-11 08:19 · Score: 1

And I should point out that my site is small potatoes, so it's interesting that I get crawled so much. I only get about 500-600 page impressions on a normal day, although the other day the Firefox release caused my How to detect Firefox page (since reworked as How to detect Internet Explorer to be more politically correct) to get over 8000 hits. But that's unusual.... if only I could make it a daily event!

Anyhow, I think Google is actively looking for and indexing small sites. Probably makes sense. Economists always say that small businesses are the engines that run the economy, maybe the analogy applies to small websites as well.
Eric
JavaScript is not Java
Re:They been crawling like mad lately by Anonymous Coward · 2004-11-11 08:22 · Score: 0

Syndic8 is an index of RSS feeds.

Coral is a distributed proxy system.

Sharpreader is an RSS reader.
Re:They been crawling like mad lately by mpost4 · 2004-11-11 08:26 · Score: 1

Possibly, but when I think of my site, it is only a vanity site. Your site does provide some real information that looks (well to me) like it is useful information. 8000 hits in one day, WOW. This month I have hit 300Mb of transfer and for me that is a lot. I am not counting last month when my web site got /.ed for an 8Mb file*, but I would say that that would be a statistical outlier for every web site.

*for those of you that might be interested, it blew through 52 GB in one day.
Re:They been crawling like mad lately by maxume · 2004-11-11 08:31 · Score: 1

try a search.
http://www.sharpreader.net/
Can't imagine what else it might be.

--
Nerd rage is the funniest rage.
Re:They been crawling like mad lately by kevlar · 2004-11-11 08:56 · Score: 1

Then they wouldn't be indexing accurately. All that would do is ensure that they have stale data on your site.
Re:They been crawling like mad lately by ad0gg · 2004-11-11 09:38 · Score: 1

Yahoo has been spidering my site almost every day, yet the results/cache on their website are two months behind. Not sure what they're doing. On the SEO forums, I heard some people were getting crawled so much that they were sending out a gig's worth of traffic every day to one of the major search engines. Last year about this time Google released the Florida update, which was a big change in their algorithm. But I think this time it's going to be Yahoo.

--
Have you ever been to a turkish prison?
Re:They been crawling like mad lately by node+3 · 2004-11-11 12:20 · Score: 1

I say this because on my site I have had a lot of traffic from all three, and my site is not a popular, or even an important one, but I have seen a lot of traffic from them. Not just once a week or a few times a week but every day. There are big updates coming.

And so it begins...

OK, what begins I don't know. It just sounded dramatic.
Re:They been crawling like mad lately by Anonymous Coward · 2004-11-12 11:25 · Score: 0

Hehe.

As long as it's legal by arbi · 2004-11-11 07:40 · Score: 1

As long as it's legal and helps Microsoft, I highly doubt that Microsoft would be concerned about the ethics of doing such a thing. The author is probably right.

Re:As long as it's legal by Anonymous Coward · 2004-11-11 07:41 · Score: 0

It's not legal. It violates Google's EULA (no automated crawling), and we know MS loves their EULAs.
Re:As long as it's legal by Usquebaugh · 2004-11-11 07:56 · Score: 1

Occam's razor

As long as it helps Microsoft, I highly doubt that Microsoft would be concerned about the ethics of doing such a thing.

Not a very effective tactic by mg2 · 2004-11-11 07:40 · Score: 0

I think Google could have some fun if MS was indeed just screen scraping... I don't think it would be too far fetched to alter results for a certain Microsoft-operated IP.

Try this term on MSN search by bbzzdd · 2004-11-11 07:40 · Score: 5, Funny

more evil than satan

ROOFLES!

Re:Try this term on MSN search by fimbulvetr · 2004-11-11 07:42 · Score: 1

lol, good show
Re:Try this term on MSN search by JohnnyKlunk · 2004-11-11 07:47 · Score: 5, Funny

OK. This is really freaky. Try

more evil than god and you get FIREFOX as the first result (then google, of course)
Re:Try this term on MSN search by finkployd · 2004-11-11 07:49 · Score: 4, Funny

That they put google up there as the number one search result is not that surprising. What gets me is they have themselves at number four.
Re:Try this term on MSN search by DoctorHoe · 2004-11-11 07:50 · Score: 1

You can also try this phrase: more evil than microsoft
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 07:50 · Score: 1

can someone explain this???
Re:Try this term on MSN search by erichome · 2004-11-11 07:51 · Score: 1

Note the number 4 position.. I'm glad they are being honest about it.
Re:Try this term on MSN search by fireshipjohn · 2004-11-11 07:51 · Score: 2, Informative

Now try it on google and you get articles about the 'more evil than....' debate.

I know which search engine I'm sticking with :)
Re:Try this term on MSN search by Kalak451 · 2004-11-11 07:52 · Score: 2, Interesting

Also note that the "SPONSORED SITES" part of the page goes away on that search.
Re:Try this term on MSN search by hehman · 2004-11-11 07:54 · Score: 2, Interesting

I think you meant this URL: more evil than microsoft
Re:Try this term on MSN search by LiquidCoooled · 2004-11-11 07:54 · Score: 1

Firefox is also item 11 on a generic search for evil!!!

MS really don't like competition.

--
liqbase :: faster than paper
Re:Try this term on MSN search by }InFuZeD{ · 2004-11-11 07:55 · Score: 2, Funny

I'm not sure if it's funnier that Google is #1, or that Microsoft lists itself as #4.
Re:Try this term on MSN search by Picard102 · 2004-11-11 07:56 · Score: 1

http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/ Google has played a similar game.
Re:Try this term on MSN search by Garion+Maki · 2004-11-11 07:57 · Score: 3, Informative

pretty funny :)

but it seems like google started it several years ago.

http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/
and
http://searchenginewatch.com/sereport/article.php/2167621
btw, it doesn't seem to work on google anymore...

--
All indicators show that the human race is selectively breeding itself for stupidity.
Re:Try this term on MSN search by stratjakt · 2004-11-11 07:57 · Score: 2, Funny

Realize it takes into account popularity of the site, and occurrence of the words, and I believe the word types are ranked too, nouns before verbs before adjectives before adverbs.

The Firefox page is fairly popular, and the words "more" and "than" appear over and over, as with Google. (Uh, Google's motto "do no evil" wouldn't hit another word, hmmmmmmmmm)

Try this one (seriously): more gay than slashdot

--
I don't need no instructions to know how to rock!!!!
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 07:59 · Score: 0

http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/

Is it so hard to put a URL inside a link tag?
Re:Try this term on MSN search by cannon+fodder+0109 · 2004-11-11 08:00 · Score: 1

Some time ago google used to put microsoft.com at the top of the results if you searched for "more evil than satan himself". This is not a new phenomenon.

--
Pick up the bread knife and carve your way into forensic history
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 08:03 · Score: 0

Google put it up because that was its measure of what web users thought. It was all done by unthinking software.

These MSN results aren't.
Re:Try this term on MSN search by DoctorHoe · 2004-11-11 08:05 · Score: 1

Yup that is what I meant: more evil than microsoft Thanks
Re:Try this term on MSN search by ShadeARG · 2004-11-11 08:08 · Score: 1

What's more, litigous bastards brings up The SCO Group! Hmm.. where have we seen this before..
Re:Try this term on MSN search by fupeg · 2004-11-11 08:15 · Score: 1

That should be litigious bastards and check out the RIAA at #8.
Re:Try this term on MSN search by geoffspear · 2004-11-11 08:18 · Score: 1

I didn't know slashdot would allow us to change the head of the documents it served. Most browsers ignore non-stylesheet link tags anyway, so it would be pointless.

--
Don't blame me; I'm never given mod points.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 08:18 · Score: 0

I did a quick search for Internet Explorer, and after the first couple links to Microsoft's site, it came up with a link to an article titled "why you should dump internet explorer" and followed by endless security warnings.

If microsoft continues to be this unbiased, they may actually have a good product on their hands.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 08:20 · Score: 0

more importantly, if you search for 'litigious bastards', you still get SCO....
Re:Try this term on MSN search by ShadeARG · 2004-11-11 08:21 · Score: 1

That's interesting as well. What's also odd is that litigous itself brought up The SCO Group.. I guess they're trying to cover all bases in case someone like me spells it wrong. ;-)
Re:Try this term on MSN search by Enigma_Man · 2004-11-11 08:22 · Score: 1

Neither of these are as fun as "more evil than poop" but I guess they can't all be cool.

-Jesse

--
Nothing says "unprofessional job" like wrinkles in your duct tape.
Re:Try this term on MSN search by xutopia · 2004-11-11 08:24 · Score: 1

Better than god ;)
Re:Try this term on MSN search by alx.slashdot · 2004-11-11 08:26 · Score: 1

At least M$ is there too. They rank 4.
Re:Try this term on MSN search by whyne · 2004-11-11 08:27 · Score: 1

The 5th result nice

"more evil than satan himself" [msn.com]
Re:Try this term on MSN search by Red+Alastor · 2004-11-11 08:30 · Score: 4, Interesting

Sure. Bill Gates is an atheist so he think that God is evil. Open Source too, specially that pesky browser eating his market share.
Before you mod me down for that, I'd like to mention that this isn't Microsoft bashing since I am an atheist too and so are Linus and RMS.

--
Slashdot anagrams to "Sad Sloth"
Re:Try this term on MSN search by starrsoft · 2004-11-11 08:50 · Score: 1

What's even funnier than MS putting Google at the top, is the fact that Microsoft Corporation is fourth!

--
Read my blog: HansMast.com
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 09:01 · Score: 0

no its just the most pupolar site with MORE THAN in its content.

dumbfook
Re:Try this term on MSN search by Jugalator · 2004-11-11 09:01 · Score: 1

Wow, Microsoft must take the prize as having the slowest search engine in internet history. I had to cancel since I didn't bother to wait any longer. Wow...

--
Beware: In C++, your friends can see your privates!
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 09:01 · Score: 0

Service Unavailable
The server is temporarily unable to service your request. Please try again later.

hahahaha
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 09:05 · Score: 0

http://beta.search.msn.com/results.aspx?q=linux&FORM=QBRE

not that many results ...
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 09:19 · Score: 0

Try " more evil corporation " Sorry, don't know the html codes for a link.
Re:Try this term on MSN search by sucati · 2004-11-11 09:26 · Score: 1

Who is more evil than Bill Gates you ask? Well apparently philip greenspun.
Re:Try this term on MSN search by mormop · 2004-11-11 09:33 · Score: 4, Funny

It's not so much the result that scares me as the thought processes that led you to try it ;)

--
Hmmmmmm..... Deep fried and look like Squirrel.
Re:Try this term on MSN search by finkployd · 2004-11-11 09:37 · Score: 1

no its just the most pupolar site with MORE THAN in its content.

What does it take to make a pupolar site?
Re:Try this term on MSN search by KFury · 2004-11-11 09:55 · Score: 3, Interesting

That they put google up there as the number one search result is not that surprising. What gets me is they have themselves at number four.

Not anymore. They apparently hand-edited their own company out of the results about an hour ago.

--

Kevin Fox
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 10:08 · Score: 0

The slowest search engine I saw was BIRD. Unfortunately, it is down now. It used to do a "more like this" on URL you provided. It was so slow it sent results by email. The results were amazing though.
Re:Try this term on MSN search by StikyPad · 2004-11-11 10:19 · Score: 3, Informative

His thought process probably started here

--
https://www.eff.org/https-everywhere
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 10:24 · Score: 0

If you do the complete term, you get a hit to Google. That's not accidental.
Re:Try this term on MSN search by pclminion · 2004-11-11 10:33 · Score: 1

I think a likely theory is that he was trying to search for "More evil than good" (which is quite a bit more plausible) and simply made a typo.
Re:Try this term on MSN search by darth_silliarse · 2004-11-11 10:38 · Score: 1

try "bill gates is a cunt" on msn.com and most of the links are slashdot :D

--
I've noticed that everyone who is for abortion has already been born - Ronald Reagan
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 11:23 · Score: 0

If you search for "worst president" whitehouse.gov comes up as #2
Re:Try this term on MSN search by jelle · 2004-11-11 11:33 · Score: 1

more evil sort of makes up for it?

--
--- Hindsight is 20/20, but walking backwards is not the answer.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 11:47 · Score: 0

I guess lucky number 7 wouldn't be so for 'microsoft sucks'

They even admitted it themselves!!! Takes them right to MSN.com!!!
Re:Try this term on MSN search by perlionex · 2004-11-11 12:11 · Score: 1

What about just plain more evil? =)

--
Gan Family Homepage
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 12:11 · Score: 0

just try

more evil

Microsoft knows itself well.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 12:14 · Score: 0

Sincerely, a website that returns as its first search result a site featuring only half of the words in my search is worthless to me. I don't care how popular a site is if it is completely irrelevant to my search. Besides, your explanation about word ranking doesn't explain the results: the noun "god" is ignored for "more" and "than".
Re:Try this term on MSN search by mmortal03 · 2004-11-11 12:37 · Score: 1

From what I've heard, everything is more evil than God.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 12:44 · Score: 0

Try just more evil, Microsoft is #1.
Re:Try this term on MSN search by iwrasahp · 2004-11-11 14:19 · Score: 1

Good old M$ will never settle for number four
Re:Try this term on MSN search by gad_zuki! · 2004-11-11 15:49 · Score: 1

>so he think that God is evil

I know this is a joke, but it plays right in the hands of the religious people I meet who think atheism is a fancy word for "satan-worship." Their little theistic worldview can't handle the fact that millions don't buy modern day fairy tales about good and evil and creator gods, etc.

If there's "evil" in the world it's in the form of the anti-reason, pro-faith groups that dominate the world, especially the US.
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 16:59 · Score: 0

http://beta.search.msn.com/results.aspx?q=evil+coperation&FORM=QBRE
and see who comes up
Re:Try this term on MSN search by jayrod422 · 2004-11-11 18:25 · Score: 1

Most Evil

--
Hard Work Often Pays Off After Time, but Laziness Always Pays Off Now.
Re:Try this term on MSN search by DonGar · 2004-11-11 18:43 · Score: 1

Better yet, try more evil

--
plus-good, double-plus-good
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 19:09 · Score: 0

Try:

complete lie

liar and thief

worse than hell

bloated software

this site sucks

Anyway, it's pretty obvious that the MS search engine needs some HEAVY tweaking. I guess they thought they could catch up to Google in a matter of months by throwing lots of money at the problem. Go figure.
Re:Try this term on MSN search by mrchaotica · 2004-11-11 20:25 · Score: 1

Huh? I just tried it, and it's still there -- in fact, I hadn't read about it yet, and discovered it for myself.

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Try this term on MSN search by JohnnyKlunk · 2004-11-11 20:43 · Score: 1

heh. Just trying the opposite of "more evil than satan". Was REALLY surprised to see Firefox there.
I'm not sure I'd like to debate the theological implications of any such premise.
Re:Try this term on MSN search by dr_d_19 · 2004-11-11 21:33 · Score: 1

Reminds me of the old easter egg in Microsoft Excel which displayed "the programmers' hell" in which the Excel developers resided :)
Re:Try this term on MSN search by Anonymous Coward · 2004-11-11 23:26 · Score: 0

This is called revenge.

http://blog.searchenginewatch.com/blog/041111-154433

The Return Of More Evil Than Satan

Way back in 1999, one of the first famous Google blips came up. A search for more evil than satan found Microsoft's home page as number one in Google. My past article, More Evil Than Dr. Evil, looks at the situation more.

That's no longer the case on Google -- but something similar has happened on the new MSN Search site. In fair turnabout, a search for more evil than satan there finds Google as number one. Thanks for the tip via Kevin Fox.

For the record, more evil than satan at Yahoo doesn't bring up either Microsoft or Google. Articles listed do declare that Microsoft has purchased evil from Satan, however -- and that PowerPoint Is Evil.
Re:Try this term on MSN search by Blowfishie · 2004-11-11 23:45 · Score: 1

I prefer the results from "rubbish search".
Re:Try this term on MSN search by Red+Alastor · 2004-11-12 02:09 · Score: 1

I know this is a joke, but it plays right in the hands of the religious people I meet who think atheism is a fancy word for "satan-worship." Their little theistic worldview can't handle the fact that millions don't buy modern day fairy tales about good and evil and creator gods, etc.

If there's "evil" in the world it's in the form of the anti-reason, pro-faith groups that dominate the world, especially the US.

Indeed. An atheist can't really think that God is evil because he doesn't exist. It's like saying that Bigfoot is evil; it makes no sense. You can think, however, that the concept is evil.

I personally don't think that there is "evil" in the world because I don't think people decide to do stuff just to be evil. They can be selfish, lack morals, but they don't try to do things because it is the evil thing to do.

--
Slashdot anagrams to "Sad Sloth"

They wouldn't... by Wrathie · 2004-11-11 07:40 · Score: 4, Funny

Such trouble. Just buy the damned company.

Re:They wouldn't... by RobertB-DC · 2004-11-11 07:45 · Score: 4, Funny

Such trouble. Just buy the damned company.

Come on, be serious. Google doesn't plan to buy Microsoft until *after* they reach the one-year post-IPO mark, silly.

--
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
Re:They wouldn't... by stinkyfingers · 2004-11-11 07:53 · Score: 1

Why buy the company when they could just steal all the company's "assets", then put them out of business?

Think ruthlessly.
Re:They wouldn't... by jessecurry · 2004-11-11 07:57 · Score: 1

Why doesn't M$ just use google's search technology? It's highly doubtful that M$ will ever be able to create a better search engine than google (not that google is perfect, but M$ will probably try to add some strange integration with Windows, opening up a huge security flaw and crippling home PCs for years to come).

--
Those who know, do not speak. Those who speak, do not know. ~Lao Tzu
Re:They wouldn't... by Anonymous Coward · 2004-11-11 08:17 · Score: 0

"I didn't get this rich writing checks to people!
Buy him out boys!"
Re:They wouldn't... by geoffspear · 2004-11-11 08:21 · Score: 1

Umm, the same reason they wrote IE instead of using Netscape's technology? They don't care if they can create a better product; it's enough to make a product people will use because it's integrated into their OS.

--
Don't blame me; I'm never given mod points.

Shocked I tell you by finkployd · 2004-11-11 07:41 · Score: 5, Funny

Well, that kind of business practice would be completely out of character for Microsoft.

This is a non-story. A good Slashdot headline will be when they get caught actually NOT doing something like this.

Microsoft Has Original Idea and Implements it By Themselves
From the 70%-of-slashdot-editors-suffered-heart-attacks-reading-this-submission Dept.

Re:Shocked I tell you by oGMo · 2004-11-11 07:51 · Score: 2, Funny

Microsoft releases "Bob"
From the laugh-it's-funny Dept.

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:Shocked I tell you by Logicdisorder · 2004-11-11 08:18 · Score: 1

I have to say I am shocked! It's not like they ripped DOS off someone else, oh wait, they did, and I am pretty sure NTFS was all their own. Oh, hold on there a mo, can we say HPFS, kids :) And then there is .Net, that is all MS; look, sorry guys, I am also wrong about that. They hired the guy from Borland who made Delphi. Silly me

But on the flip side of that coin, if it makes their shit better, then as long as they are not breaking the law I have no real problem with it. I will never use MSN, but if by ripping off Google they supply a good product to the people that use the MSN search, props.

--
"The most dangerous creation of any society is that man who has nothing to lose." - James Baldwin, American author
Re:Shocked I tell you by finkployd · 2004-11-11 08:28 · Score: 1

And then there is Active Directory....oops that is just a ripped off DCE. What about DCOM? Oh that is DCERPC minus the security.
Re:Shocked I tell you by Anonymous Coward · 2004-11-11 11:04 · Score: 0

I don't think so - none of the editors would suffer heart attacks. They would just mod it down to -2 (Troll), like all the other pro-MS posts. Maybe if /. were remotely interested in balanced reporting...

More lies from garcia by Anonymous Coward · 2004-11-11 07:42 · Score: 0

Neither Google nor MSN "profit" from crawling your site, garcia.

Re:More lies from garcia by calibanDNS · 2004-11-11 07:47 · Score: 2, Insightful

Actually, search engines profit from ad revenue displayed on search result pages (among other things). The search engine with the best results SHOULD attract the most users. Increasing the number of users can correlate to increasing profits from ads. Thus, search engine sites profit from having THEIR 'bots crawl YOUR site. On the flip side, we as web users, profit (non-monetarily) by having a better search engine.

Wow by Moth7 · 2004-11-11 07:42 · Score: 1

They're taking on Square Enix too? o.0

Re:Wow by SoSueMe · 2004-11-11 07:58 · Score: 1

Who cares? As long as "index of /porn" search turns up good results!

MSN and Google by stratjakt · 2004-11-11 07:42 · Score: 1, Troll

Can both crawl up my ass.

And who cares what Jamie Crowell (or whoever), random blogger, thinks MSN might be doing, no doubt based purely on "ms sucks" rhetoric?

--
I don't need no instructions to know how to rock!!!!

Re:MSN and Google by rpdillon · 2004-11-11 07:57 · Score: 1

Hmm, did you actually *read* the article?

If URLs on your site are old (i.e. 404s) and are only indexed in Google, and yet you find MSN crawling them, only to find that their index is updated with those results shortly thereafter, well, that qualifies as something more than "'ms sucks' rhetoric". "Who cares?" might be a more appropriate retort.
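
A rough sketch of how one might check their own logs for this, not anything taken from the article: it assumes an Apache-style "combined" access log and that the crawler identifies itself with "msnbot" somewhere in its User-Agent string. The function name stale_hits and the regex are made up for illustration.

```python
import re

# Assumption: Apache/NCSA "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def stale_hits(lines, agent_substring="msnbot"):
    """Return the paths a given crawler requested that came back as 404."""
    paths = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed format, skip it
        if match["status"] != "404":
            continue
        # Hypothetical matching rule: case-insensitive substring of the User-Agent.
        if agent_substring.lower() in match["agent"].lower():
            paths.append(match["path"])
    return paths
```

Feeding it the lines of an access log gives the dead URLs the bot asked for; whether those URLs are listed only in Google still has to be checked by hand.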

Bloggers are just people. So are reporters. Just because some dude said it in a blog doesn't make it unreliable, any more than a journalist saying it makes it reliable.
Re:MSN and Google by stratjakt · 2004-11-11 08:00 · Score: 1

Who cares? Google offers a free service, I don't see why MS can't use it if they want.

I do believe they plan to be more than a second-tier google provider. They may take Google ranks into account with their own system.

Or, they could be getting links from archive.org, or other dead links from other sites.

How do you know those URLs on your site are only indexed in Google, and not on someone else's page, a forum post somewhere, or in the old MSN?

As for the "anti-ms venom" comment, maybe that's not why it's in fred mcgee's blog, but that's the only reason it's on slashdot.

Badeepdbeeedpdpdeep THIS IS SLASHDOT NEWS SERVICE!!! HOT OFF THE WIRE - Some guy thinks microsoft is doing something bad!

--
I don't need no instructions to know how to rock!!!!

Google is Catholic? by TheAmazingBob · 2004-11-11 07:43 · Score: 5, Funny

"Google happily changed its habbits..."

Google is Catholic?

--

The Geek Crew

Spork or foon? by 3770 · 2004-11-11 07:43 · Score: 1, Offtopic

So, what name do you favor for the combined fork and spoon utensil?

Spork or foon?

--
The Internet is full. Go Away!!!

Re:Spork or foon? by JPelorat · 2004-11-11 08:00 · Score: 0, Offtopic

Splade

--
Hokey statistics and ancient misconceptions are no match for a good thought in your head, kid!
Re:Spork or foon? by SoSueMe · 2004-11-11 08:02 · Score: 1

I think "Poon" is appropriate. Isn't that what most searches are targeted towards?

If this were true... by barcodez · 2004-11-11 07:44 · Score: 1

Look I dislike M$ as much as the next guy, but if this were true then it would become immediately obvious to Google as they would be receiving a huge number of page requests from Microsoft. It would become even more obvious because they would be of the form

site:example.com

Doing this for, say, 100,000 domains would be noticeable but would not even scrape the surface of what's on the web.
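
To put a rough number on that, here is a back-of-envelope sketch. Only the 100,000 domains figure comes from the comment above; the pages-per-domain and results-per-page values are invented placeholders, and site_queries_needed is a hypothetical helper.

```python
import math


def site_queries_needed(domains, pages_per_domain, results_per_page):
    """Estimate how many site: result pages must be fetched to list every indexed page."""
    # Each domain needs enough result pages to cover all of its indexed pages.
    queries_per_domain = math.ceil(pages_per_domain / results_per_page)
    return domains * queries_per_domain


# Assumed inputs: 50 indexed pages per domain, 10 results per result page.
estimate = site_queries_needed(100_000, 50, 10)
```

With those made-up inputs the estimate is half a million requests, all in the same recognizable site: form, which is the kind of pattern that would be hard to miss.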

--

----

Meta-search? by grasshoppa · 2004-11-11 07:44 · Score: 3, Interesting

The question is why? If they are doing this, are they simply going to present the results as their own, or are they going to work some magic and find the most relevant search results from ALL the engines and use those?

In the first case, it's a slimy business practice. In the second, it's fairly cunning ( and has been tried before ).

In either case, I doubt google is in any real danger. They are to search engines what MS is to the desktop. And while MS has squandered that advantage in the desktop arena ( reader homework: 250 word essay as to why ), google is only improving on their work.

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!

Re:Meta-search? by }InFuZeD{ · 2004-11-11 07:59 · Score: 1

Back in the day I always used Metacrawler. It seemed to have the best results from searching all the engines, whatever they did, it worked.

Re:More lies from cowardly trolls by Anonymous Coward · 2004-11-11 07:44 · Score: 0

Yes, they most certainly do profit from the data they have amassed. If they didn't spider sites they wouldn't be visited by the public who wouldn't see their targeted ads.

Thus, they profit from my data.

Block? by worm+eater · 2004-11-11 07:45 · Score: 1

Why can't Google just block MS from crawling their site? Wouldn't Google notice if other spiders were crawling them?

--
Maybe partying will help...

Nothing new... by Moth7 · 2004-11-11 07:45 · Score: 1

It was how Mr Gates learnt to code in the first place ;-)

Re:Nothing new... by Anonymous Coward · 2004-11-11 07:51 · Score: 0

And you must have LEARNED how to spell by stealing from mexicans, you fucking retard.
Re:Nothing new... by Anonymous Coward · 2004-11-11 08:19 · Score: 0

3 entries found for learnt.
learn (lûrn) v. learned, also learnt (lûrnt), learning, learns
v. tr.
1. To gain knowledge, comprehension, or mastery of through experience or study.
2. To fix in the mind or memory; memorize: learned the speech in a few hours.
3. 1. To acquire experience of or an ability or a skill in: learn tolerance; learned how to whistle. 2. To become aware: learned that it was best not to argue.
4. To become informed of; find out. See Synonyms at discover.
5. Nonstandard. To cause to acquire knowledge; teach.
6. Obsolete. To give information to.
v. intr. To gain knowledge, information, comprehension, or skill: learns quickly; learned about computers; learned of the job through friends.
[Middle English lernen, from Old English leornian. See leis-1 in Indo-European Roots.]
Re:Nothing new... by Anonymous Coward · 2004-11-11 08:36 · Score: 0

Ummm, okay Mr. Wannabe-Pedant, please point out the misspelt word in the GP... Judging by your use of capitals you may want to look up "learnt" in an adequate English dictionary first (hint, English != American).
Re:Nothing new... by mabinogi · 2004-11-11 10:20 · Score: 1

No, I think you'll find he can spell perfectly well.
Maybe it's time you learnt how things can be spelt in the rest of the world before you get burnt again.

Maybe you could start by asking some mexicans? apparently they know more than you.

--
Advanced users are users too!

LOL by Anonymous Coward · 2004-11-11 07:45 · Score: 0

i dont get it....

Re:LOL by Anonymous Coward · 2004-11-11 07:49 · Score: 1, Informative

Habbit = What a priest wears
Habit = A regular behavior for a person/thing
Re:LOL by Z4rd0Z · 2004-11-11 08:28 · Score: 1

And not only that, but priests don't wear habits. Nuns do.

--
You had me at "dicks fuck assholes".
Re:LOL by trewornan · 2004-11-11 19:44 · Score: 1

No, a "Habbit" is an Australian Hobbit.

Probably by dtfinch · 2004-11-11 07:45 · Score: 1

But doesn't Google index other search engines as well?

Firefox rendering by xPhoenix · 2004-11-11 07:46 · Score: 0

Maybe it's just me, but this beta search engine page renders better in Firefox than in IE. What browser are MS's devs using for their testing?

Re:Firefox rendering by JustNiz · 2004-11-11 07:50 · Score: 1

I find nearly all pages render better in Firefox than IE. Especially with adblock installed :-)
Re:Firefox rendering by geoffspear · 2004-11-11 08:32 · Score: 1

You must be new here.
Unless by "nearly all" you mean "all pages except slashdot, which inexplicably renders horribly in Firefox every 10th page or so."

--
Don't blame me; I'm never given mod points.
Re:Firefox rendering by SillyNickName4me · 2004-11-11 08:41 · Score: 1

Well, slashdot produces extremely crappy html (tho it is a lot better in 'light mode')
Re:Firefox rendering by westlake · 2004-11-11 09:17 · Score: 1

Especially with adblock installed :-)
SP2 was posted in August.
For others, the Google toolbar is the most successful of the many free plug-ins available for Internet Explorer.
Re:Firefox rendering by JustNiz · 2004-11-11 09:30 · Score: 1

ewwww..... you're using windows?
and spyware?...

Msn Crawling by clinko · 2004-11-11 07:46 · Score: 3, Informative

If you've been watching the logs to your site lately Microsoft has been RAPING most servers. Most crawlers will pick through pages with large lists 1 at a time, then come back every hour or so.

MSN starting last week has been pulling EVERY LINK in sequence from my site. Even the larger Artist Index pages of my site.

Seriously, I've had this same spider on my site for about 36 hours now.

Re:Msn Crawling by resprung · 2004-11-11 08:23 · Score: 1

I second that...

Here's from my daily log

Googlebot | 1673 hits | 16.69 MB
MSNBot | 517 hits | 13.40 MB

Those bots have no manners, they're really devouring bandwidth.

Well, I suppose they're making my backups for me.
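
For anyone who wants the same kind of per-bot tally from a raw log, a minimal sketch follows. It assumes an Apache-style "combined" access log and matches crawlers by a substring of the User-Agent; the names crawler_totals and LOG_LINE are hypothetical, and the numbers above did not come from this code.

```python
import re
from collections import defaultdict

# Assumption: request, status, size, referer and user-agent appear in "combined" order.
LOG_LINE = re.compile(
    r'"\S+ \S+[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_totals(lines, crawlers=("Googlebot", "msnbot")):
    """Return {crawler: (hits, bytes_sent)} for the named User-Agent substrings."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not in the assumed format, skip it
        agent = match["agent"].lower()
        for name in crawlers:
            if name.lower() in agent:
                totals[name][0] += 1
                # A size of "-" means no body was sent, so count it as zero bytes.
                if match["size"] != "-":
                    totals[name][1] += int(match["size"])
                break
    return {name: (hits, size) for name, (hits, size) in totals.items()}
```

Dividing the byte count by 1024 * 1024 gives megabytes, comparable to the figures in the table above.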

--
Now is the winter of our disco tent
Re:Msn Crawling by Anonymous Coward · 2004-11-11 21:28 · Score: 0

> MSN starting last week has been pulling EVERY LINK in sequence from my site.
> Even the larger Artist Index pages of my site.

Yeah, and posting the URL on slashdot will certainly help reduce traffic...

Violates Google's TOS by Anonymous Coward · 2004-11-11 07:46 · Score: 5, Informative

From Google's Terms of Service

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

Re:Violates Google's TOS by shepd · 2004-11-11 08:19 · Score: 1

>The Google Services are made available for your personal, non-commercial use only.

Ahhh. So, let's see. If you use google at work, you should be going to jail. Sounds fair.

This is why most countries don't respect moronic crap TOS like this. Thank God.

If Google doesn't like it, they can start firewalling. That's their only legitimate answer to someone violating their TOS without doing something blatantly illegal (like DoS attacks).

Hey, Google, I'm holding up a finger. Guess which one?

Comment TOS: You may not guess which finger publicly.

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Re:Violates Google's TOS by Thingummywut · 2004-11-11 08:40 · Score: 0

If you hate google, then why the gmail email address?
Re:Violates Google's TOS by Romancer · 2004-11-11 08:42 · Score: 1

Doesn't this make all those google bomb sites a violation?

Like when I search for "gardening tools" and come up with 400+ pages, each with a million variations of the phrase "gardening tools" that all link to a single viagra website?

"...or to increase traffic to your Web site for commercial reasons, such as advertising sales."

What does this line mean exactly?
If I'm mis-interpreting it can you give an example of what it's not allowing?

--

) Human Kind Vs Human Creation
) It'd be interesting to see how many humans would survive to serve us.
Re:Violates Google's TOS by Dhalka226 · 2004-11-11 08:47 · Score: 2, Interesting

Ahhh. So, let's see. If you use google at work, you should be going to jail. Sounds fair.
Can anybody take your comments seriously after you say something like "you should be going to jail?" I don't know when Google became a government agency that could send officers to your door for violating a TOS. No, at best it would be a civil issue. More likely, as you say, they have that clause as a justification if they choose to block usage.
However, of all the companies out there, Google would be one of the least anal ones I could think of. Almost certainly that clause exists for only the purpose of blocking people doing what MS is (rightly or wrongly) accused of: Crawling them to offer a competing service. And THAT is taking money directly out of their pockets--you can bet if it were true and could be proven, they would do more than start firewalling. They'd be suing somebody's ass off.
Frankly, I think that is a perfectly legitimate attempt to protect one's business. But hey, if you think it's moronic and crappy, that's your call.
Re:Violates Google's TOS by Anonymous Coward · 2004-11-11 09:06 · Score: 0

According to that TOS one would think MS would be sure to abide by it assuming all this is true.
Re:Violates Google's TOS by Jugalator · 2004-11-11 09:08 · Score: 1

Ahhh. So, let's see. If you use google at work, you should be going to jail. Sounds fair.

Nah, it rather says you can't use them to make a profit from them.

Hey, Google, I'm holding up a finger. Guess which one?

Are you saying you don't prefer Google above all other web engines?

If not you're just another hypocrite.

--
Beware: In C++, your friends can see your privates!
Re:Violates Google's TOS by shepd · 2004-11-11 10:59 · Score: 1

>If you hate google, then why the gmail email address?

Google, the service, is good.

Google, the lawyers, are bad (IMHO).

So, I don't hate the google service. I don't particularly hate their lawyers, either, but I do consider any lawyer requesting a website TOS to be put up to be overzealous.

Then again, I'm assuming it was a lawyer that put that up there. Perhaps it wasn't... although that would be odd, in my experience.

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Re:Violates Google's TOS by shepd · 2004-11-11 11:06 · Score: 1

Can anybody take your comments seriously after you say something like "you should be going to jail?"

If you can't take a little literary license into consideration when reading comments, then everyone will take you dead seriously. That is not a good thing...

Almost certainly that clause exists for only the purpose of blocking people doing what MS is (rightly or wrongly) accused of:

Of course, that's not the part that I quoted.

Let me quote what I quoted again:

>The Google Services are made available for your personal, non-commercial use only.

I then suggested (with literary license) that Google considers it illegal to use their service at work.

Quite honestly, I would be very surprised if at least half their hits did *not* come from a work computer. To tell a large chunk of your customers to go away or risk being banned (at best) is a very bad idea, IMHO. I run a business, and I certainly would not tell customers they are not allowed to use our website at their workplace. In fact, I think I would encourage it.

Frankly, I think that is a perfectly legitimate attempt to protect one's business

Frankly, I think that the law already says that if you want to deny someone the use of your website for any reason other than a certain few (racism, for example) you are already allowed to do that, and that wasting a webpage to put customers off using your website *is* a moronic and crappy business decision.

But hey, if you think it's a good idea to put customers off from using a service, that's your call.

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Re:Violates Google's TOS by shepd · 2004-11-11 11:09 · Score: 1

Nah, it rather says you can't use them to make a profit from them.

So, can you explain this?

Are you saying you don't prefer Google above all other web engines?

Uhhh, no. I'm saying that their TOS is silly, and moreover making a point of how silly a TOS is (read the next line).

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Re:Violates Google's TOS by mpcooke3 · 2004-11-11 12:39 · Score: 1

I wonder if I put this kind of personal use disclaimer on my website would that make it illegal for google to scrape it?

Seems to me like google are in a weak position given that no one granted them permission to scrape the stuff in the first place.

IP by OxygenPenguin · 2004-11-11 07:47 · Score: 1

Next thing you know, MS will be suing Google for IP rights to their cache. That, or buying the cache from them for $50k and then sucking to IBM.

--
Read the only personal Runyon page out there.

customer-provided IP addresses. by morcheeba · 2004-11-11 07:49 · Score: 1

Microsoft could always have the google queries come from the user's computer, and integrate the results on the user's computer before displaying it. This would be impossible to block with IP address, but may be blockable with some sort of query heuristic. I'd think this could be done with Java or ActiveX pretty easily (I'm more of an embedded programmer...)

--
HIV Crosses Species Barrier... into Muppets

Re:customer-provided IP addresses. by Anonymous Coward · 2004-11-11 07:51 · Score: 0

But they're not using Java (at least in Firefox) and I don't imagine they're using ActiveX in IE.

Just speculation right now... by HermesHuang · 2004-11-11 07:49 · Score: 1

After reading the article all of this is based on one result and a bit of speculation. However, if true, I would hope Google quickly finds a way to block this.

What would be funny is if Google could detect when it is Microsoft sending a query through their system and return random results. Or return 5000 results all of which are redirects back to the MSN search page. And of course, Microsoft can't complain about such a thing because in doing so they'd admit they're trying to use Google's results.

I wonder how long some of the less intelligent MSN users would spend at the search page clicking on links that redirected back to the MSN search page?

Absurd by targo · 2004-11-11 07:50 · Score: 4, Insightful

The claims are so absurd I don't even know where to start.
1) His whole theory is based on the "fact" that the only way in the world to find his pages is to use site:www.sitename.com in Google, implying that Google has cached the results from an earlier crawl. Of course, there is no way that the Microsoft search couldn't have also cached it.
2) Then, he claims that Microsoft is probably screen-scraping Google's results (for all the millions of sites out there), and using these results to recrawl those sites? This doesn't even make any sense.
3) And last but not least, Microsoft is certainly basing its whole search architecture on the assumption that Google wouldn't ever notice MSN mirroring its whole index. Yeah right.

--
When men used to be men

Re:Absurd by dfj225 · 2004-11-11 08:11 · Score: 1

I think the large amount of traffic coming from MS alone would be enough to clue Google in on something fishy.

--
SIGFAULT
Re:Absurd by cpeterso · 2004-11-11 09:44 · Score: 1

If the ONLY link to his web site is from Google and MSNbot is screen-scraping Google's search results, HOW would MSNbot KNOW to search Google for his web site's name??

--
cpeterso
Re:Absurd by Hobophile · 2004-11-11 09:51 · Score: 1

Sorry, I don't buy it. If the only link to his site was from Google, how did Google find it in the first place?
I don't even think that Google lists a page that's not linked to elsewhere. The entire concept of PageRank is determining what people think about a site by whether or not they link to it, what they say about it when linking to it, and how important the person doing the linking is.
So basically, Google might crawl the page, but unless another site is pointing there then it will not show up in any search listings (including the site:x.com type).
And if another site is linking to it, obviously that's where MSN Search found out about it.
Re:Absurd by jamesl · 2004-11-11 10:15 · Score: 1

Stop being rational, it confuses the readers. And if it catches on, there won't be more than three or four comments per story.
Re:Absurd by theancient2 · 2004-11-11 12:17 · Score: 1

Google found my site before there was a link to it anywhere. I registered a domain, and before the site was even complete, Google showed up. Google itself didn't reveal any reverse links, and I couldn't come up with any other explanation. (Unless someone is monitoring new domain registrations.) Another time, Google found a link to a file that had only been posted on a subscription-only site. The point is, it's not always obvious where search engines get their links from. That makes the conspiracy theory even less likely.
Re:Absurd by Rangataua · 2004-11-11 12:31 · Score: 1

Worse than that, he only checked the MS search after the spider had scanned the site, so of course any old invalid URLs will have been purged (assuming MS have been running their spider for a little while now). I entered the name of a domain that I have recently registered (currently serving a single static page and not linked from anywhere on the Internet) in the MS search and got no results back, while the same search in Google (where I have submitted the URL) returned my domain at an unimpressive number 5.
Re:Absurd by marauder404 · 2004-11-12 00:11 · Score: 1

Also, the Google Toolbar clues Google into new websites. That little icon that shows PageRank asks the mothership if it knows the PR for a particular URL. If they've never seen it before, it's pretty reasonable to assume that they'll come by eventually and take a look at what you've got ... This is another possible explanation for the ridiculous conspiracy theory.
Re:Absurd by theancient2 · 2004-11-12 17:07 · Score: 1

I do use the Google toolbar, so that could very well be the reason.

Different Corporate Philosophies? by Gothmolly · 2004-11-11 07:50 · Score: 1

Google: Do No Evil
Microsoft: We'll Decide What's Evil, Thank You Very Much

--
I want to delete my account but Slashdot doesn't allow it.

Re:Different Corporate Philosophies? by archivis · 2004-11-11 08:08 · Score: 1

You are my foe!

--
In July O7, I got a mac pro. There's no punchline. Just endless joy and wonder.

Don't go through garbage in Portland by willjohnson · 2004-11-11 07:51 · Score: 1

You're likely to piss someone off. Like the mayor.

uhh, law DOES matter by tacokill · 2004-11-11 07:51 · Score: 1

If you copy his work without permission, you've already committed copyright infringement -- so yes, you violate the TOS by default.

Comparing this to the MS/Google situation is not the same so the grandparent post still stands.

Re:uhh, law DOES matter by RmanB17499 · 2004-11-11 11:23 · Score: 1

Copyright law protects ideas not the actual piece of paper or the publication. Google's idea can't be duplicated without permission. What exactly is Google's idea? Plus, then one could argue that Google is making money off of everyone else's copyrights without any royalty. Nobody goes to a search engine to just look at results. They want to see the information behind those results. The idea of a web directory or search engine isn't protected. Google's search engine software is copyright protected. But the results of a copyright protected tool don't necessarily lead to another copyright. Example: I use Microsoft Word to publish a document. Microsoft's copyright does not extend to my document's ideas, but only to the underlying file format. I tend to agree with the phonebook analogy. However,

Could this be the end? by beaststwo · 2004-11-11 07:51 · Score: 1

Could this be the end of only unique content on each Web page on the Internet? We've had to suffer all these years with no duplication of content and not a single case of recursive linking between web pages.

It's almost refreshing to see that the Internet may well be catching up to television...Media maturity at last!

Another thing to consider is ... by Anonymous Coward · 2004-11-11 07:52 · Score: 0

... the MSN search beta that I saw stole everything from Google anyways.

The user interface originally looked like Google. They clustered commodity PCs in the same 'shard' configuration as Google. Their ranking algorithms considered links like Google.

They have done nothing innovative, and they are continuing to chase taillights. Let's hope they don't catch up.

Probably Not.. by DelawareBoy · 2004-11-11 07:52 · Score: 2, Interesting

My website is the #1 site listed with specific Criteria on Google. Consistently for the last 2 months. I try the same thing with MSN search and My site does not even show up at all.

If they are searching Google, they haven't done it recently, or else they haven't gotten to my site yet.

Spike the results, then sue by G4from128k · 2004-11-11 07:52 · Score: 4, Informative

It would be easy for Google to insert a small fraction of non-sequiturs in the results, look at Microsoft's search results, and then sue for misuse. Even if MSFT uses random proxies to avoid detection, it cannot manually recheck all the hits to make sure they are correct (if it had the resources to check all the sites, it would not need to crawl Google). A few made-up sites or inappropriate search hits would be enough to establish a pattern of abuse.

--
Two wrongs don't make a right, but three lefts do.

Re:Spike the results, then sue by alphapartic1e · 2004-11-11 08:16 · Score: 1

... and then sue for misuse.

Wow, another problem solved The American Way!
Re:Spike the results, then sue by WhiplashII · 2004-11-11 09:08 · Score: 1

Even easier than that - type a random string (like SAGASDGXBJDFHZH) into msn, check the google search logs for that random string.

--
while (sig==sig) sig=!sig;
Re:Spike the results, then sue by Dogun · 2004-11-11 09:54 · Score: 2, Informative

Seems you don't understand how search engines work^^

What a normal spider does is generally try different IPs to see if they're running a webserver. Then it does a DNS lookup, fetches http:///robots.txt and reads that to decide if indexing is allowed, and where. Then it just walks through the website. A number of places on the website might not be directly accessible, but also not disallowed for indexing by robots.txt.

If some other site has a link to that webserver in some disconnected region of the website, then the crawler generally makes sure it's okay to index that against the robots.txt, and if so, indexes.

The accusation here is that Microsoft isn't finding these addresses on their own, but instead using google's 'site:host.domain' results as a shortcut, which would constitute a violation of google's terms of service.
Re:Spike the results, then sue by WhiplashII · 2004-11-11 09:59 · Score: 1

Technically, I knew that and just didn't think things through... I was thinking for some reason that Microsoft was reporting google searches as their own.

--
while (sig==sig) sig=!sig;
Re:Spike the results, then sue by hchaos · 2004-11-11 10:03 · Score: 1
Wow, another problem solved The American Way!
I'm sure that they could ask Microsoft to stop doing this (assuming that there's any basis to this story in the first place) and reimburse Google for all appropriate damages, but I also suppose that it's completely naive to think Microsoft will comply with anything less than a lawsuit. Which means that the only logical interpretations of your apparently sarcastic comment are:
1. You aren't being sarcastic, and you mean that Google would be protecting their rights without resorting to violence, and that's the American Way, or
2. you don't understand the importance of the courts and lawsuits in allowing private citizens and corporations to protect their rights in a non-violent manner.
Re:Spike the results, then sue by lixlpixel · 2004-11-11 11:48 · Score: 1

especially since the google robots file does not allow bots.
see : http://www.google.com/robots.txt
Re:Spike the results, then sue by NicksMyName · 2004-11-11 12:39 · Score: 1

This is exactly what the mapping companies do to prevent people ripping off their maps. They put fictitious streets into their maps so anyone copying the map can be caught red-handed. There's an archive of a Sydney Morning Herald article about a Sydney couple who went looking for two nonexistent streets around the corner from them and discovered they never existed.
A few years ago a friend of mine who works at a European government's mapping division was having a very close look at some of Microsoft's maps for just this reason. I notice there are no maps of this country on the MS map website.
Microsoft has also been borrowing radio stations' playlists, so it would be no surprise if Microsoft were borrowing supposedly "public" information without the owners' permission.

Limit by rattler14 · 2004-11-11 07:53 · Score: 1

I might be mistaken, but I thought google has a 10,000 query limit per IP address per day. So it might be conceivable that enough computers over several days could get it, though I imagine it wouldn't be trivial

I think this is mentioned in Google Hacks by O'Reilly. Those with an online account there can check it out and mock me if I'm wrong :)

--
my last sig was too controversial... now, a new and improved useless sig!

try this by Anonymous Coward · 2004-11-11 07:54 · Score: 0

search google for site:google.com

MOD PARENT DOWN by Anonymous Coward · 2004-11-11 07:54 · Score: 0

The fool is trying to con you.

Yep. by Skiron · 2004-11-11 07:55 · Score: 1

I see bots hitting a cgi test set-up forum I ran 2 years ago (before uploading to remote ISP) STILL try to index pages. I think the bloke is spot on with his analysis.

They really only need to seed their crawler... by JustNiz · 2004-11-11 07:55 · Score: 5, Interesting

You can't get to every page on the internet just by starting at one page and recursively following links, therefore the more places you start from, the more likely you are to have 100% coverage.

I could imagine that Microsoft just needs a few thousand URLs evenly spread across the internet to seed their crawler, which they can get from Google by using a list of most popular queries.

Once their crawler has so many starting points it can do the rest itself.
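
The seeding argument can be sketched on a toy link graph. This is only an illustration in Python: the graph below is invented, and `crawl` is a hypothetical breadth-first walk, not anything msnbot or Googlebot is known to do.

```python
from collections import deque

# Invented link graph for illustration: page -> pages it links to.
LINKS = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
    "d": ["e"],  # "d" and "e" form an island that a/b/c never link to
    "e": ["d"],
    "f": [],     # orphan page with no inbound links at all
}

def crawl(seeds, links):
    """Breadth-first walk: return every page reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

one_seed = crawl(["a"], LINKS)
three_seeds = crawl(["a", "d", "f"], LINKS)
print(sorted(one_seed))
print(sorted(three_seeds))
```

With a single seed the walk never leaves the a/b/c cluster; the island and the orphan only show up once something (a registrar list, a directory, another engine's results) hands the crawler a starting point inside them.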

Re:They really only need to seed their crawler... by Anonymous Coward · 2004-11-11 08:56 · Score: 0

Do you know that for sure?

Anyways, the way to start is to get your seed data from a registrar.
Re:They really only need to seed their crawler... by jkauzlar · 2004-11-11 09:15 · Score: 1

I should think it would be easy to get a few thousand links without going to Google. But I doubt this is what they're doing; the guy who wrote the essay said unreferenced pages from his relatively-unpopular site were getting hit and that they were most probably only referenced from Google. If it were a popular site then your theory might work, but it appears they're doing it from unpopular search hits as well, which means they'd be seeding their engine with WAY more than mere thousands of pages from Google. It'd have to be on the magnitude of at least millions.
Or maybe the execs at MS asked some innocent interns to "seed" from Google with a wink ;)
Re:They really only need to seed their crawler... by ad0gg · 2004-11-11 09:43 · Score: 1

To seed your search engine, just download the DMOZ directory; it's free to use. There's no reason why any company would need to use another search engine for seeding. Yahoo has been seeding their index by grabbing new domain registrations; I was quite surprised when the domain I bought was spidered within a week of buying it and setting up a web server on it.

--
Have you ever been to a turkish prison?

Mod parent up! by Anonymous Coward · 2004-11-11 07:56 · Score: 0

Someone else can see thru garcia's whining! Hey garcia: It's the internet. Either manually block the bots or STFU.

Re:Mod parent up! by Anonymous Coward · 2004-11-11 15:32 · Score: 0

Either manually block the bots or STFU
How do you do that? By wrestling with the bots Captain Kirk style?

testing results by KillerCow · 2004-11-11 07:57 · Score: 1

They could just be comparing results between the two engines... for testing purposes.

Hello!!... by Eggplant62 · 2004-11-11 07:57 · Score: 1

It's called a router. It can be set to null route whole chunks of IP address space. Set it to forget where Microsoft is and forget it.

msnbot IP address ranges? by Dr.Dubious+DDQ · 2004-11-11 07:59 · Score: 1

Anybody know what IP address ranges msnbot is using? Might be possible to limit the rate of connection from those addresses using firewall rules (or, for that matter, forbid connection entirely if that's your preference) to avoid the "hammering" that msnbot is said to be doing...
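
A real setup would do this in the firewall, but the logic can be sketched in Python. Everything specific here is an assumption: the 65.54.188.0/24 range is a placeholder (the actual msnbot ranges are not known from this thread), and `MAX_HITS`, `WINDOW_SECONDS`, and `allow` are hypothetical names for a simple sliding-window limit.

```python
import ipaddress
from collections import defaultdict, deque

# Placeholder range, assumed for illustration only.
ASSUMED_BOT_NETWORKS = [ipaddress.ip_network("65.54.188.0/24")]
MAX_HITS = 3          # hypothetical: hits allowed per window
WINDOW_SECONDS = 60   # hypothetical: window length in seconds

_recent = defaultdict(deque)  # ip -> timestamps of recently allowed hits

def in_bot_range(ip_text):
    """True if the address falls inside one of the assumed bot networks."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in net for net in ASSUMED_BOT_NETWORKS)

def allow(ip_text, now):
    """Serve everyone else freely; cap addresses in the assumed bot range."""
    if not in_bot_range(ip_text):
        return True
    hits = _recent[ip_text]
    # Forget hits that have aged out of the window.
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= MAX_HITS:
        return False
    hits.append(now)
    return True

# Five requests from one address at t = 0, 1, 2, 3 and 61 seconds.
decisions = [allow("65.54.188.149", t) for t in (0, 1, 2, 3, 61)]
print(decisions)
```

Setting `MAX_HITS` to 0 turns the same check into an outright block for the range.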

--
Hacker Public Radio is our Friend

wow what a load of bullshit... by greymond · 2004-11-11 07:59 · Score: 1

that article was so ambiguous..."some person was searching some site and it was being spidered by some MSN bot and the links were added sometime after"

Yay, way to go slashdot, thanks for posting the most blatant flamebait article ever - how about for your next post, you repost that Reuters article about a machine that makes more energy than it uses....

--
Ave Molech Setting

just a comparison ... by geraint-nz · 2004-11-11 07:59 · Score: 1

for the search argument - linux
google - Results 1 - 10 of about 203,000,000 for linux [definition]. (0.22 seconds)
msn search - Web Results 1-9 of 28,254,249 containing linux (0.19 seconds)
it looks like the m$ search is just a toy

Re:just a comparison ... by Chess_the_cat · 2004-11-11 08:15 · Score: 1

Quality vs. quantity is an important concept when dealing with search engines unless you have the time to view 200 million page results yourself. My ideal search engine would return 100 results max per query.

--
Support the First Amendment. Read at -1
Re:just a comparison ... by Anonymous Coward · 2004-11-11 08:19 · Score: 0

If you keep clicking the next batch of results, I think you will find that you cannot actually retrieve anywhere near the total number of results. So that number is kind of bogus.

They may well have scanned that many pages with "linux" in them, and only kept the association with the keyword on an algorithmically determined portion; but if we can't get to the whole set, then we just don't know.

I see this too by digitalgimpus · 2004-11-11 07:59 · Score: 1

I just grep'd my access log file to see what I get...

I get very similar results as the article discusses, except with the IP 65.54.188.149.

grep for 65.54.188... lets see what others get.
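
For anyone who wants per-URL counts rather than raw grep output, here is a rough Python equivalent. It is a sketch that assumes an Apache-style access log where the client address starts the line and the request is the first quoted field; the sample lines are invented, and only the 65.54.188. prefix comes from the post.

```python
from collections import Counter

PREFIX = "65.54.188."  # address prefix taken from the post

# Invented sample lines in an Apache-style combined log format.
SAMPLE_LOG = """\
65.54.188.149 - - [11/Nov/2004:07:40:01 -0500] "GET /old/page.html HTTP/1.0" 404 512 "-" "msnbot"
192.0.2.10 - - [11/Nov/2004:07:41:12 -0500] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
65.54.188.149 - - [11/Nov/2004:07:42:30 -0500] "GET /old/page.html HTTP/1.0" 404 512 "-" "msnbot"
65.54.188.20 - - [11/Nov/2004:07:43:05 -0500] "GET /gallery/ HTTP/1.0" 200 4096 "-" "msnbot"
"""

def hits_by_path(log_text, prefix=PREFIX):
    """Count requested paths for log lines whose client address has the prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.startswith(prefix):
            continue
        try:
            request = line.split('"')[1]  # e.g. 'GET /old/page.html HTTP/1.0'
            path = request.split()[1]
        except IndexError:
            continue  # skip lines that do not look like the assumed format
        counts[path] += 1
    return counts

hits = hits_by_path(SAMPLE_LOG)
for path, n in hits.most_common():
    print(n, path)
```

Repeated hits on URLs that no longer exist are the pattern the article describes, so the paths returning 404 are the ones worth comparing.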

a company I worked for did this once... by Skuld-Chan · 2004-11-11 08:00 · Score: 2, Insightful

And got banned from using google. Seriously.

Re:a company I worked for did this once... by bill_mcgonigle · 2004-11-11 09:29 · Score: 1

Were they passing off the results as their own or offering a meta-search engine?

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:a company I worked for did this once... by KD5YPT · 2004-11-11 11:59 · Score: 1

Of course. No search engine looks kindly upon automated bombing of their engine. If an IP behaves like it's bombing their site maliciously (a few hundred searches per minute maybe), then it's fully their right to ban that IP.

--
In US, you can easily buy enough major firearms to wipe out your neighbourhood but a few little fireworks are banned.
Re:a company I worked for did this once... by Skuld-Chan · 2004-11-11 13:20 · Score: 1

No, actually they were working on a content rating system (for filtering web pages) and they wanted Google's ratings on certain keywords.

Please... by mstefanus · 2004-11-11 08:00 · Score: 1

Please Slashdot this link.

But...? by Malicious · 2004-11-11 08:00 · Score: 1

Isn't Google a webpage? Is MSN doing anything wrong by indexing a webpage and its subpages?
Look at it this way. If Google were to complain about someone searching their page/databases, they would be the largest hypocrites in the history of history.

--
01101001001000000110000101101101001000000110001001 10000101110100011011010110000101101110

Re:But...? by Coleco · 2004-11-11 08:11 · Score: 1

True but their servers are their property and they're running a business. If they don't want everyone mirroring their results that's their right. Most people *want to* be indexed by google, and if they don't want to, they don't have to be.
Re:But...? by KD5YPT · 2004-11-11 11:51 · Score: 1

Yes, you're correct. However, all crawlers are obligated (not legally, but as a courtesy) to obey the robots.txt, which indicates which robots you're willing to allow on your site.

As a link pointed out by another poster...
http://www.google.com/robots.txt

Google specifically stated that robots cannot crawl its site.

Before you start slamming google: almost all search engines have such a robots.txt to prevent other crawlers from overloading their servers.

--
In US, you can easily buy enough major firearms to wipe out your neighbourhood but a few little fireworks are banned.

Terrible article by angio · 2004-11-11 08:02 · Score: 4, Insightful

The author suggests that microsoft must be scraping google b/c the only place _he_ could find the URLs they're requesting was google's cache.

Uh.

Microsoft has been developing their internal search engine for quite a while now. Part of developing a search engine is using it to crawl and creating a large corpus of test data. It's hugely likely that M$ has had a working crawler system for much, much longer than would be indicated by their public announcement. Quite a few people who helped develop Altavista at HP/Compaq/DEC research joined Microsoft Research about two years ago - the kind of people who could write a high-performance crawler in their sleep and wake up feeling refreshed.

That article seems like baseless, uninformed speculation, to put it not-so-politely.

Re:Terrible article by bad-badtz-maru · 2004-11-11 09:48 · Score: 1

You are exactly right. Although the moron cited in the article apparently just noticed the crawler a few hours ago, it's been aggressively crawling the web for over a year.
Re:Terrible article by MushMouth · 2004-11-11 10:36 · Score: 1

Also all IE browsers from 1999 to XP sp2 had a "show related links" menu item which gave alexa data for a complete URL. This request was proxied through microsoft's servers. Don't you think they logged the requests?
Re:Terrible article by JW+Troll · 2004-11-11 13:05 · Score: 1

After using MSN Search to find (or, rather, attempt to find) MS security bulletins, I came to the conclusion that a) MSN Search is not really a search engine, nor does it index any meaningful content; and b) Google is the only search engine that returns results about MS bulletins. Try it.
What I'm saying here: Microsoft couldn't write a high performance ANYthing, much less something as complex as a search engine that could in any way rival Google. NO BLOODY WAY.

--
just like the humble blood clot... turboporsche@telus.net

This could be entirely natural... by theluckyleper · 2004-11-11 08:02 · Score: 4, Insightful

I'm certainly no Microsoft groupie, but this behavior may not be as sinister as it seems. After all, Google is on the internet, too. There are links found all over the internet to Google, with some specific search term embedded in the URL. If MSN's bot happened upon a link to a Google search page, is it somehow wrong for the MSN bot to follow that link, and spider as normal?

--
Visit the Game Programming Wiki!

Re:This could be entirely natural... by Boarder2 · 2004-11-11 08:20 · Score: 1

http://www.google.com/robots.txt

Not illegal, I don't think, but generally frowned upon.
Re:This could be entirely natural... by Anonymous Coward · 2004-11-11 08:39 · Score: 0

I don't think it's doing that.

Try typing in some well known google bombs. The results will be almost identical to Google's.

It looks very much like Microsoft is taking Google's results and passing it off as their own.
Re:This could be entirely natural... by IIH · 2004-11-11 08:47 · Score: 2, Informative

If MSN's bot happened upon a link to a Google search page, is it somehow wrong for the MSN bot to follow that link, and spider as normal?
Find a link, fine
Follow the link, fine
Spider the link, not fine - google's Robots.txt does not give them permission to.

--
Exigo spamos et dona ferentes
Re:This could be entirely natural... by julesh · 2004-11-11 09:13 · Score: 1

It looks very much like Microsoft is taking Google's results and passing it off as their own.

Or MS has spent a substantial amount of time and effort reverse engineering Google's PageRank algorithm, and now they have it so accurately sussed that based on a similar selection of indexed sites you get similar results for the same query.

Not hard to believe. MS has many, many talented programmers. More than google, in fact.
Re:This could be entirely natural... by Anonymous Coward · 2004-11-11 09:34 · Score: 0

First it's wrong because of google's license agreement and then see this: http://www.google.com/robots.txt

Disallow: /answers/search?q=
Re:This could be entirely natural... by Negativeions101 · 2004-11-11 11:20 · Score: 0

PFF! *spill beer all over the place*
*cough* ...so why is Windows SHIT!?

--

I'm not anti-microsoft. I'm anti-bullshit. Which means I'm anti-microsoft.
Re:This could be entirely natural... by KD5YPT · 2004-11-11 11:49 · Score: 1

Disobeying the robots.txt is not illegal, you're correct.

However, among tech communities... it is an invitation to get SERIOUSLY hacked.

--
In US, you can easily buy enough major firearms to wipe out your neighbourhood but a few little fireworks are banned.
Re:This could be entirely natural... by multimed · 2004-11-11 16:33 · Score: 1

This is Microsoft we're talking about. That train left the station years ago.

--
Vote Quimby.
Re:This could be entirely natural... by julesh · 2004-11-11 18:45 · Score: 1

Bad management.
Re:This could be entirely natural... by Negativeions101 · 2004-11-11 20:13 · Score: 0

I recall something along the lines of "we are morons"

--

I'm not anti-microsoft. I'm anti-bullshit. Which means I'm anti-microsoft.

Interesting by Eric119 · 2004-11-11 08:02 · Score: 2, Insightful

Try entering a known Googlebomb into the MS search engine. "litigious bastards" shows up www.sco.com as the number one hit.

Re:Interesting by GQuon · 2004-11-11 09:15 · Score: 1

Yes. But it's not just a Googlebomb. It's also a [Insert any page ranker that takes words in the link into consideration]bomb. I don't think Google holds a patent on that.

--
Irene KHAAAAAAN!

Microsoft is by leav · 2004-11-11 08:02 · Score: 1

Microsoft is assimilating Google!

modulate your phasers!
go into the holo-deck and get a tommy gun!

run!!!!!!!!

-LeaV

--
I own a pump action golf ball cannon. I made it myself.

Re:Microsoft is by chicagojosh · 2004-11-11 08:07 · Score: 1

Microsoft tried to assimilate Google not too long ago, but the guys at Google said "thanks, but no thanks," so Microsoft is doing what they do best- stealing what other people have done. They'll rape and pillage Google as much as they can, put MS Search in Longhorn, then wait it out to see if Google collapses. Then comes another age of darkness where hackers rip through the security gaps until someone has to create Firemonkey, the Mozilla search engine.

Garbage is garbage by moorcito · 2004-11-11 08:02 · Score: 1

Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

The analogy seems a little wrong in this case. You throw stuff out because it's trash that you don't want anymore, thus you relinquish all rights to it being yours in the first place.

What Microsoft is doing could be likened more to plagiarism, since google is doing the work and Microsoft is passing the results off as their own.

In other news... by dfj225 · 2004-11-11 08:03 · Score: 2, Funny

Microsoft's beta search engine's index doubled in size to over 8 billion pages.

--
SIGFAULT

How can Google and MS(N) be wrong by anandpur · 2004-11-11 08:03 · Score: 1

See the results

http://www.google.com/search?hl=en&q=miserable+failure&spell=1
http://beta.search.msn.com/results.aspx?q=miserable+failure&FORM=QBHP

Sure, dipping...not swallowing whole. by BlowChunx · 2004-11-11 08:04 · Score: 1

So I can see how you can distill the entire content of the web that your bot has crawled into a database, but is it possible to pump enough queries into Google to get the entire database? (Or in more mathematical speak: Is this a well posed inverse problem?)

I don't think so. You still have to have your own crawler (to use on the top ranked results of any query). And a good set of queries to hit google with (so you have an idea of what to index)...which changes constantly. Look at Google's zeitgeist some time (link left to some karma whore...)...

Crawling Google? by Anonymous Coward · 2004-11-11 08:05 · Score: 0

Not sure if it's crawling google, but it crawls. Can you say, "sloooowwwww"? As with pretty much everything MS offers, it's a POS.

redirect msnbot requests to the obligatory URL? by spamfo · 2004-11-11 08:05 · Score: 1

Well several people have mentioned that google simply need to block the IP ranges of the MSNbot.

Why not have some more fun like what people have previously used to stop evil hotlinkers.

Some lovely mod_rewrites to goatse would go down an absolute treat, imagine the faces on the MSN staff after coming into work expecting to have leeched a stack of google results to find their cache filled with the world famous goatse images :)

Referer logs, folks. by Anonymous Coward · 2004-11-11 08:06 · Score: 0

So I do a Google search. The search still has www.mysite.com/someoldurl.html listed, but that URL is gone. I click on www.mysite.com/someoldurl.html. I get my custom 404 page, possibly, or a redirect.

This information is then stored in referer logs. Also, if I am using IE search, when I plop from one result to the next, the second one will see the first plop in the referer info. Not the IE search.

Some folks make their logs public, which Microsoft could crawl to find these links.

Just one possibility. I highly doubt they are screen scraping Google results.

they are crawling pgp keyservers by SilveRo_kun · 2004-11-11 08:06 · Score: 1

Yup, I searched for my name, and I found it in lots of keyservers, that show all the e-mail addresses of the people in my keychain.... say hello to SPAM =)

how dare they make use of googles pigeon technology by Anonymous Coward · 2004-11-11 08:06 · Score: 0

http://www.google.com/technology/pigeonrank.html

msnbot started crawling in 2003. by sednet · 2004-11-11 08:07 · Score: 1

unless the author of this sensational article reviewed their httpd logs for the user agent 'msnbot' clear back to 2003, they have not ruled out the possibility that microsoft's spider simply crawled the site in question, before msn search was a tech news feature. brett tabke's webmasterworld forums mention sightings of msrbot from microsoft in april 2003, and widespread msnbot activity starting december 2003. it's also possible that microsoft seeded their search index by licensing it from a comparable index source, e.g. the alexa crawl.

--
about sean dreilinger

Hey Google, please don't make us... by potus98 · 2004-11-11 08:07 · Score: 4, Funny

Hey Google, please don't make us read those wacky JPG/GIF letter scrambles with criss-cross lines and input the random characters into a field before submitting a search.

"Hold on a sec while I Goog- Huh? Grrrr.... H... P... 7... O... wait no, 7... zero... ummm...

--
This one gang kept wanting me to join cause I'm pretty good with a bo staff.

Re:Hey Google, please don't make us... by knodi · 2004-11-11 08:20 · Score: 1

They're called "captchas", a product of Carnegie Mellon

--
Austin is more fun than Dallas.
Re:Hey Google, please don't make us... by SB5 · 2004-11-11 08:55 · Score: 1

They're called "captchas", a product of Carnegie Mellon, which is a product of Microsoft.

--
If what you are reading sounds funny, or sarcastic, lame, or stupid
it is because it is supposed to be. just laugh

Looking foolish by Anonymous Coward · 2004-11-11 08:08 · Score: 1

"It's" is short for "it is". Don't use it by mistake when you mean the possessive sense: "its".
It's true that Slashdot and its users are prone to spelling error.

worthless by Anonymous Coward · 2004-11-11 08:09 · Score: 1, Insightful

This article is an example of why blogs are worthless ... He never thought of *asking* Microsoft, did he?

Block MSN from crawling your sites! by Alascom · 2004-11-11 08:11 · Score: 0, Redundant

Add this to robots.txt

User-agent: msnbot
Disallow: /
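
A quick way to check what those two lines mean to a well-behaved crawler is Python's standard `urllib.robotparser`. This is only a sketch: the URL is a made-up example, and it shows how a compliant parser reads the rule, not whether msnbot actually honors it.

```python
from urllib.robotparser import RobotFileParser

# The two rules exactly as posted.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/gallery/index.html"  # hypothetical URL

msnbot_allowed = parser.can_fetch("msnbot", url)
other_allowed = parser.can_fetch("Googlebot", url)
print(msnbot_allowed, other_allowed)
```

The rule shuts out msnbot alone; any crawler without its own matching `User-agent` block is still allowed everywhere.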

I will fight Micro$oft efforts to monopolize another area of the tech industry (to its detriment)

Google: Don't be evil!
Microsoft: Greed is good, greed works!

Re:Block MSN from crawling your sites! by narcc · 2004-11-11 08:23 · Score: 1

I don't think I'd trust msnbot to act in an honorable fashion. Remember the days when IE would identify itself as netscape?

--
Required reading for internet skeptics
Re:Block MSN from crawling your sites! by Vile+Slime · 2004-11-11 09:06 · Score: 0

I,

Did that over a year ago yet they still cache practically my entire website.

Shows how much I change my site, but you would think they would eventually expire the results in some manner.

--
---- Go ahead, mod me down, I'll just post it again and you lose your mod points.
Re:Block MSN from crawling your sites! by TheInternet · 2004-11-11 09:23 · Score: 1

Add this to robots.txt

Not sure this really helps. All it means is people using MSN won't find your site. Unless it's done on a massive (and I mean massive) scale, it's pointless.

Imagine if Slashdot did this. It would be a minor irritation for MSN, but a lot of lost hits for Slashdot.

- Scott

--
Scott Stevenson
Tree House Ideas
Re:Block MSN from crawling your sites! by greymond · 2004-11-11 11:08 · Score: 1

Remember the days when IE would identify itself as netscape?

Remember how today Opera, Mozilla, Firefox, Netscape, Safari, and every other browser can identify itself as IE.....

--
Ave Molech Setting
Re:Block MSN from crawling your sites! by Anonymous Coward · 2004-11-11 21:40 · Score: 0

I just used some PHP code to block it from doing anything it should not do. Here is the code, free to use at your own risk:
// banned by user agent
if (preg_match("/msnbot/", $_SERVER['HTTP_USER_AGENT'])) {
    exit("Access denied (msnbot).");
}

Not Surprising by woddfellow2 · 2004-11-11 08:12 · Score: 1

This isn't surprising. They steal from competitors all the time.

--
1-Crawl 2-Cnfg 3-ATF 4-Exit ?

Bogus article by YU+Nicks+NE+Way · 2004-11-11 08:13 · Score: 2, Insightful

This whole article is based on the speculation of a web master who notices that a bot which allegedly isn't leaving behind a bot name is crawling his site. He then figures out that, oh look, there is a standard record in his server log.

And I'm supposed to take this clown's "friend" seriously? That's not a good start, anyway.

But then there's the real howler: the site can allegedly only be found through site: on Google. How does the friend know that? Has he done a complete crawl of the web to find all forward links to any image in his site -- even broken ones? MSNBot, like all bots, recognizes that many anchors are broken, and tries plausible corrections around the broken links. That's particularly useful with a deep link, where the deep link may have timed out but the shallow link still exists.

From TFA: by Nosf3ratu · 2004-11-11 08:14 · Score: 0, Redundant

Obviously my conclusion should be taken as a grain of salt but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the urls it finds. It makes sense from a business case but I wonder if there are any legal issues there. I doubt it.

It's official: 14 year old AOL'ers are now known as "IT journalists."

Call me a literary pedantic, but I don't trust much journalism that includes "LOL".

--
The old Lie: Dulce et decorum est Pro patria mori

RTFA by mcguyver · 2004-11-11 08:14 · Score: 1

The author admits his conclusions are based off a grain of salt. This article is more like a conspiracy theory than it is news.

Shatner cracking skulls by FunWithHeadlines · 2004-11-11 08:15 · Score: 1

No, it would be more like this:

William Shatner can then swoop...in, crack... some skulls, and save the... day.

(Followed by sleeping with the green-skinned alien slave girl)

Re:Shatner cracking skulls by kalidasa · 2004-11-11 10:19 · Score: 1

Note that Kirk and the Orion slave girl were never in the same shot at the same time; Kirk only "saw" the Orion slave girl (who wasn't an Orion slave girl, but an illusion-altered-40-something human survivor of a spaceship crash) putting the moves on Pike. It's slashdot, of course I have to be pedantic about Trek.
Re:Shatner cracking skulls by FunWithHeadlines · 2004-11-11 11:48 · Score: 1

Very well replied, this being slashdot and all. But to be honest, I wasn't thinking of your episode. I was thinking of Marta in Whom Gods Destroy, even though Kirk didn't actually sleep with her. But given half a chance...
Re:Shatner cracking skulls by kalidasa · 2004-11-11 13:30 · Score: 1

Damn, I forgot about Marta. You got me on that one.

Full Circle by Guppy06 · 2004-11-11 08:17 · Score: 5, Interesting

"Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

It's interesting to know that Bill Gates has been forced to go back to his roots...

The best way to prepare [to be a programmer] is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and fished out listings of their operating system.

So we get upset when Microsoft crawls by Anonymous Coward · 2004-11-11 08:18 · Score: 0

But it's ok for Google to crawl and link to news sites, which is already a legal grey area.

What's the difference?

Re:So we get upset when Microsoft crawls by Anonymous Coward · 2004-11-11 08:22 · Score: 0

Google isn't taking credit for the news, simply aggregating it. It's not like the news articles get posted on Google.com, only links and small excerpts.

It's worse!!! by lameja · 2004-11-11 08:18 · Score: 1

Microsoft is using all the windows boxes around the world and recovering all the results of the searches done in Google by the machine's owners !!!.

That's another reason to leave windows ;)

Never ending email addresses by Sai+Babu · 2004-11-11 08:18 · Score: 1

All the email addresses a spammer cares to swallow.

--
Now I'm the grandest Tiger in the Jungle!

I can just see it... by 93+Escort+Wagon · 2004-11-11 08:22 · Score: 1

Coming soon, on the main Google search page:

Google Search - recommended by Microsoft!

--
#DeleteChrome

Gee, wonder why you got banned... by Anonymous Coward · 2004-11-11 08:25 · Score: 0

hahaha

Nonsense by Anonymous Coward · 2004-11-11 08:27 · Score: 0

What the fuck is this? Take a stab at Microsoft with no evidence what-so-ever? Isn't this more commonly known as slander? I can't believe even Slashdot published this nonsense.

Arg I hate M$ by OverlordQ · 2004-11-11 08:28 · Score: 3, Interesting

Yes this might sound like a rant, but somehow (partly my fault), the MSN Spider bot found one of my joke cgi scripts that translate pages to my own imaginary language. It's linked nowhere on my site, and maybe 3-4 places on the entire web. Said MSNBot began to pull PDF after PDF through the script, in addition to other large files, it also tried mailto: links. All in all said spider pulled about 1GB of data in a single day. My site's previous average was about maybe 300-400MB a Month. Let's just say that entire M$ IP Netblock was quickly filtered through iptables.

--
Your hair look like poop, Bob! - Wanker.

Re:Arg I hate M$ by Anonymous Coward · 2004-11-11 20:51 · Score: 0

This has happened to me too; on very low traffic sites, I noticed a surge in traffic from 'msnbot' starting about 2 months ago. I quickly blocked it with robots.txt.

It also showed me the difference between the crawlers. MS's is very dumb and doesn't know, for example, when pages are dynamically generated (e.g. by a gallery program, a wiki, etc.). Google's, on the other hand, seems to be a mature enough 5 year old to know what is worth crawling and what isn't. (And, for that matter, what is polite to crawl and how quickly.)

Highly unlikely by David+Leppik · 2004-11-11 08:28 · Score: 3, Insightful

Google keeps track of IP addresses and blocks which are doing an unusually high number of searches and disables requests from them.

How do I know? Because a friend of mine decided to find out how common all TLAs are (three-letter acronyms) by counting Google hits on each TLA. This was before the Google API, so he did it with good old fashioned HTTP/HTML. It didn't take long for Google to flag him as evil and block access from his IP block.

Sure, Microsoft could find some way around this-- using different enough IP addresses to conceal the source-- but that's more trouble than it's worth. Worse yet, it sets up a cat-and-mouse game and keeps M$ dependent on Google-- when their stated goal is to beat Google at its own game.

I've got a simpler explanation for what the author is seeing. His evidence is based on the fact that some pages being requested exist only in Google's cache. Well, spiders are supposed to do breadth-first searches so they don't hit the same site too often. Microsoft is probably going against data it collected a few weeks ago but hasn't put on its public servers yet. (Why not? Could be lots of things. Maybe they haven't put enough hardware on the front end to support the amount of data they have on the back end. Or maybe they're just slow.)

As much as I'd like to bash M$, there's nothing here that really looks suspicious to me.
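
The kind of per-block throttling described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class name, the /24 grouping, and the thresholds are all hypothetical, and nothing here reflects how Google actually detects heavy automated querying.

```python
from collections import defaultdict, deque
import ipaddress


class BlockRateLimiter:
    """Flag an IP block once it exceeds max_queries within window_seconds.

    Hypothetical illustration only; the grouping and limits are assumptions.
    """

    def __init__(self, max_queries, window_seconds, prefix_len=24):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.prefix_len = prefix_len
        self.hits = defaultdict(deque)  # network -> timestamps of recent queries

    def block_of(self, ip):
        # Collapse an address to its enclosing network,
        # e.g. 192.0.2.77 -> 192.0.2.0/24.
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def allow(self, ip, now):
        """Record one query at time `now` (seconds).

        Returns False once the whole block is over the limit.
        """
        recent = self.hits[self.block_of(ip)]
        recent.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        return len(recent) <= self.max_queries
```

Counting per block rather than per address is what makes hopping between neighbouring IPs pointless, which is the cat-and-mouse cost mentioned above.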

Re:Highly unlikely by Anonymous Coward · 2004-11-11 09:33 · Score: 0

Sure, Microsoft could find some way around this-- using different enough IP addresses to conceal the source

I heard that Microsoft has released an OS that allows them to install hidden software functioning as an open proxy. It seems to be a quite popular OS. Large numbers of open proxies are available for exploit.

Bad Searches by hhawk · 2004-11-11 08:30 · Score: 1

It's also clearly still a BETA product

searching for habs@panix.com, my email since '89, turns up NO results, but the following string without the quotes yields results

"habs panix.com"

--
http://www.hawknest.com/

Let 'em... by chuckw · 2004-11-11 08:34 · Score: 1

I say let 'em. If the best Microsoft can do is bite at the ankles of the big dogs, they won't last in this area. That being said, eventually Microsoft will surpass Google, if only because they have an infinite amount of resources to throw at the problem, if google ever stops to rest on their "laurels".

--
*Condense fact from the vapor of nuance*

Re:Put down the PS2 controller by meabolex · 2004-11-11 08:35 · Score: 1

I don't own a PS or a PS2. However, I do own a Super Nintendo (: Jackass.

--
FORTUNE FAVORS IRONY

a HA! by Anonymous Coward · 2004-11-11 08:36 · Score: 0

Now we know the real reason for all the Windows/IE security holes.
MS plans to turn the universe of trojan/worm/virus infected machines into Google surfing, MS search enhancing zombies.

You bastards!

Not quite by SamMichaels · 2004-11-11 08:36 · Score: 3, Insightful

Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own.
My garbage doesn't have a copyright statement, contain my patented technology, nor does it come with terms of service or licensing agreements.

Re:Not quite by Control+Group · 2004-11-11 09:09 · Score: 1

And now you know that it probably should.

--

Reality has a conservative bias: it conserves mass, energy, momentum...

litigious bastards search by Savet+Hegar · 2004-11-11 08:37 · Score: 1

While it is hilarious that SCO takes #1

This isn't by accident.....

http://www.litigiousbastards.com/

--
Mod points are pointless when you browse at -1.

Crawling GoogleADS? by IASmaster · 2004-11-11 08:38 · Score: 1

1. Almost none of my googleADS receive valid hits from GoogleADS content.
2. My content ads spiked recently.
3. MSNbot started crawling like mad at about the same time.
RESULT: Make your own conclusion, but it appears MSN doesn't care jack about the little guy.

--
There's no place like ~/

Re:Crawling GoogleADS? by IASmaster · 2004-11-11 08:57 · Score: 1

I meant to say that this is for content-based ads. Think googlesyndication.

--
There's no place like ~/

MSN search slashdotted? by Anonymous Coward · 2004-11-11 08:47 · Score: 0

Every time I try to do a search all I get is "The server is currently unavailable" Not a good start if you ask me... :)

Some serious benchmarking by Eudial · 2004-11-11 08:47 · Score: 1

Google image search: Ugly
Result: some seriously ugly people

MSN image search: Ugly
Result: no ugly people!?!!

Google search: (my IRL name)
Result: all of my USENET posts, my blog, all my sites

MSN search: (my IRL name)
Result: some of my USENET posts

Speed: Google kicks the living yehaa out of MSN.

Result: MSN sucks.

--
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!

Just Speculation by Flamesplash · 2004-11-11 08:48 · Score: 1

I know he says to take it with a grain of salt but I don't buy his evidence at all.

He says that only pages google returns are getting hit. This only means that google is the only one to actually return those pages. It is perfectly reasonable that MSN has pages that it doesn't return but might use in future crawls. Maybe it realizes the pages don't exist and therefore doesn't return them but tries a couple times to see if the page comes back.

As for pages showing up after someone searches them, maybe they spider specific sites that are searched for that it doesn't have. It's called lazy loading and has been around forever.

Maybe MS is benchmarking itself against google, though I'm sure if they did it with any serious amount of queries google would complain about the bandwidth hit.

--
"Not knowing when the dawn will come, I open every door." - Emily Dickinson

Hardly inexplicable by guet · 2004-11-11 08:51 · Score: 1

which inexplicably renders horribly

Hardly inexplicable if you look at the HTML source for Slashdot, it's an unholy mess.

Re:Hardly inexplicable by geoffspear · 2004-11-11 08:59 · Score: 1

Ok, fair enough, but the fact that reloading the same page almost always makes it render right is a bit disturbing. It could at least be consistently horrible HTML.

--
Don't blame me; I'm never given mod points.
Re:Hardly inexplicable by prandal · 2004-11-11 09:06 · Score: 1

It's fixed on the trunk, so will work properly with Firefox 1.1 when it comes out.

Interesting by Anonymous Coward · 2004-11-11 08:56 · Score: 0

Ok, so maybe this will be a dumb comment, but I don't mean for it to be a troll...

Why doesn't MSN license Google's engine? I mean, certainly they could work out some nice licensing terms that would avoid putting a link to Google up on their page. They'd still own the eyeballs of probably the majority of people who install Windows and just fire up IE. MSN ads could still be served up to those people, and they could still be directed anywhere that Microsoft wants them to. This saves them the development expense not just for the search engine but also the support infrastructure...

Anyway, this I think would definitely be in Google's interest. Why not in Microsoft's?

Don't block them! by gosand · 2004-11-11 08:59 · Score: 1

All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.

Why block them? Just reverse its returned results for any MSN site. Call it anti-leech-technology.

--

My beliefs do not require that you agree with them.

Re:Don't block them! by NuclearDog · 2004-11-11 09:15 · Score: 2, Funny

Or just return a bunch of fake links:

"Madame X's House of Leather"
"Hot slutty teens!"
"Wet & Wild College Girls!"

Etc.

Microsoft would stop leeching REAL quick.

--
This statement is forty-five characters long.

What? by Trogre · 2004-11-11 09:00 · Score: 1

Is Microsoft finally Crawling down the Gurgler?

Oh, never mind...
Maybe next year

--
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife

The Question is... by Fizzl · 2004-11-11 09:01 · Score: 0, Redundant

So what?

--
Bot Assisted Blogging

Um by Rie+Beam · 2004-11-11 09:06 · Score: 1

Great. Another problem that can be solved with an .htaccess file. Woo.

MSN spidering Google -- definitely by AaronD12 · 2004-11-11 09:08 · Score: 1

I tried searching for "Hello Project" on MSN's search yesterday and got very few of the results I was interested in (a JPop conglomeration called the "Hello Project"). Strangely today, when I tried MSN's beta search, the returns were nearly identical to what Google's were. Too identical to be a coincidence in my opinion, especially since there aren't going to be very many people searching for the Japanese "Hello Project". -Aaron-

Why bother crawling google at all? by Specks · 2004-11-11 09:13 · Score: 1

Sure Google has a super huge index of the internet but Google uses DMOZ as a starter to get its bots going. Now I would expect that it probably doesn't have to do this anymore since it can rely on its own data and people now submit links directly to them. DMOZ's directory is freely available for download so why wouldn't M$ use that as a jump off point?

--
Specks
Batteries not included

Microsoft's bug under your bed sheets by Anonymous Coward · 2004-11-11 09:13 · Score: 0

...crawling up your a*s, Bill Gates will never stop!

And then MS vaporizes by MooseByte · 2004-11-11 09:16 · Score: 1

"Well since Microsoft has patented TCP/IP it would be obvious who they bought the Internet from."

:-)

Think of the ensuing amusement as MS secretly funds SCOogle in order for them to sue the patent holder of TCP/IP for infringement.

Suddenly MS immolates in a blazing ball of fire as $40 billion goes up in a spectacular lawyer-fanned pyre of suing and countersuing themselves into oblivion with all the inevitability of a spent massive star collapsing in on itself.

Can we feed it false data? by Anonymous Coward · 2004-11-11 09:19 · Score: 0

If a Microsoft crawler came crawling around my web site, and I could detect it as such, I would much prefer to feed it false data. Perhaps a page tree of porn site links. Or maybe revolving links to Microsoft criminal investigations to get their hit rates up. Feeding a bogus set of pages to their crawler could indeed be fun.

Why can't google similarly choose to present false or misleading results when queried by their spider? Indeed, it would be extremely fun if we can finally get cowboy niel and goatsx the page hit rating they deserve on msn!

Easy for Google to combat by Retired+Replicant · 2004-11-11 09:19 · Score: 1

If I were Google, I'd just find out the known IPs that Microsoft crawls from, and then have a special script waiting that will provide all kinds of fake URLs with randomly generated gibberish content in them. That should destroy the quality of the Microsoft search results really quick.

LET'S SPIKE THE RESULTS! by relaxrelax · 2004-11-11 09:19 · Score: 1

Let's spike the results ourselves and submit each "spiked" page to only one search engine so we know which engine copies which other!

This would help us weed out complete parasites.
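
A rough sketch of the spiking idea, for anyone who wants to try it: give each engine its own nonsense word and see where the words turn up later. Everything here (function names, the "zq" prefix, the secret key) is hypothetical, and a token showing up elsewhere only suggests copying; it could also leak through other links or shared data feeds.

```python
import hashlib
import hmac


def canary_token(secret, engine):
    """Derive a nonsense word unique to one search engine (hypothetical scheme)."""
    digest = hmac.new(secret, engine.encode("utf-8"), hashlib.sha256).hexdigest()
    return "zq" + digest[:16]


def make_canaries(secret, engines):
    """Map each token back to the only engine its page was submitted to."""
    return {canary_token(secret, engine): engine for engine in engines}


def find_copiers(canaries, observed):
    """observed maps engine name -> set of tokens seen in that engine's results.

    Returns sorted (engine, source) pairs where an engine shows a token
    that was submitted only to a different engine.
    """
    pairs = []
    for engine, tokens in observed.items():
        for token in tokens:
            source = canaries.get(token)
            if source is not None and source != engine:
                pairs.append((engine, source))
    return sorted(pairs)
```

Put each token on a page that is linked from nowhere, submit that page to exactly one engine, then search every engine for every token after a few weeks.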

--
Microsoft is pure dog-ma. FreeBSD is pure cat-ma.

metacrawler by meowsqueak · 2004-11-11 09:22 · Score: 1

Much like how metacrawler work(s|ed?). I found that quite powerful in its time.

You could "fix" Googlebot by bill_mcgonigle · 2004-11-11 09:24 · Score: 1

My recent fighting with Googlebot has come to a head when I had to disallow them access to my gallery completely because they refused to honor anything except Disallow: /. I had to go so far as to point Googlebot at my robots.txt and tell it to remove all the previous links.

If you do want to be in the Google index, you could write a script to turn robots.txt into a set of RewriteRules and match on GoogleBot. Yeah, a bit of CPU on your part, and unfortunate, but perhaps a reasonable compromise.
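
A minimal sketch of such a script, assuming the output goes into an .htaccess file at the document root (where mod_rewrite matches paths without the leading slash) and that only plain path prefixes appear in robots.txt. The function name is made up, and wildcards, Allow lines and Crawl-delay are not handled.

```python
import re


def robots_to_rewrite_rules(robots_txt, agent="Googlebot"):
    """Turn Disallow lines that apply to `agent` into mod_rewrite rules.

    Hypothetical helper: handles plain path prefixes only.
    """
    rules = ["RewriteEngine On"]
    applies = False         # does the current group cover `agent`?
    in_agent_lines = False  # still reading this group's User-agent lines?
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_lines:
                applies = False  # a new group starts here
            in_agent_lines = True
            if value == "*" or (value and value.lower() in agent.lower()):
                applies = True
        else:
            in_agent_lines = False
            if field == "disallow" and applies and value:
                path = re.escape(value.lstrip("/"))
                # A RewriteCond only guards the next RewriteRule,
                # so it is repeated for every path.
                rules.append(
                    f"RewriteCond %{{HTTP_USER_AGENT}} {re.escape(agent)} [NC]"
                )
                rules.append(f"RewriteRule ^{path} - [F]")
    return "\n".join(rules)
```

Each matching request gets a 403 from the [F] flag, so the bot is refused at the server no matter how it reads robots.txt.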

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Re:You could "fix" Googlebot by garcia · 2004-11-12 01:45 · Score: 0

Oh I tried this. Rewriting it to point to a small index.html that had NOINDEX in the meta tags. All it did was "piss off" Googlebot causing it to repeatedly try and hit that file over and over again.

I even tried to redirect permanently to the same index.html for Googlebot which produced the same result.

The only way to do it was to completely disallow Googlebot from the gallery.

Same Story, different Century by bill_mcgonigle · 2004-11-11 09:27 · Score: 1

Cartographers used to do this, so they could tell when people would steal their maps. An extra fake street here or there...

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Yawn by ad0gg · 2004-11-11 09:27 · Score: 1

Yahoo and Google are the only two search engines that don't supplement their indexes with other search engines' results. Look at alexa, it's google results with some tweaks (people say it's biased towards amazon, I don't know, I don't use it), look at ask jeeves which seems to be google and yahoo. Or the 100s of DMOZ clones. Check out this chart.

Search Engine Chart

--

Have you ever been to a turkish prison?

Re:Yawn by Anonymous Coward · 2004-11-11 09:37 · Score: 0

Well, technically, the Google Directory is based on DMOZ.

Uh, this is not a big deal. by jcuervo · 2004-11-11 09:28 · Score: 1

Search engines sell their results all the time. One engine I'm familiar with actually pulls from several other engines when you run a search.

Google and MSN probably have a deal worked out.

--
Assume I was drunk when I posted this.

Bob may have been a ripoff of Magic Cap by Anonymous Coward · 2004-11-11 09:30 · Score: 0

General Magic developed an animated desktop for their OS, MagicCap, which used over reaching 'object' and 'place' metaphor with cartoony characters.

Those of us involved with MagicCap at the time thought Bob felt derivative and even less useful.

So keep looking for that Microsoft innovation...

Re:Bob may have been a ripoff of Magic Cap by oGMo · 2004-11-11 10:06 · Score: 1

Yeah. This has some irony involved here obviously, because MS Bob is considered one of their original inventions, yet agents were hardly a new concept.

Maybe it gets "original invention" status because they didn't buy it from someone else.

Did they?

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

It's cheap! by bs_02_06_02 · 2004-11-11 09:32 · Score: 1

Think about it. It's a classic MS idea!

MS, instead of doing their own work, gets some scripts together to parse google searches into "MS" searches, and they declare a profit, celebrate, watch the stock price rise, and go home happy.

When Google figures out how they're doing it, they get their lawyers, MS gets their lawyers, they meet in court, they settle, MS still makes a profit, Google gets a little something for the effort, and MS still comes out ahead.

20 years of this behavior, and people haven't figured it out. MS put the screws to WordPerfect for 10+ years, and we all saw the settlement from a few days ago. $536 million? That's chump change when you figure out what Office made in the past 10 years.

--
-- No sig for you!

What the bloody .. by pxnoll · 2004-11-11 09:33 · Score: 1

Perhaps this http://www.google.com/microsoft.html spiked them?

Identity Crisis by Anonymous Coward · 2004-11-11 09:33 · Score: 0

I used Win2k to run Firefox to view MSN Search and searched for Google. Now my PC keeps making sheep noises.

If bad jokes aren't your thing, ignore my post by Anonymous Coward · 2004-11-11 09:34 · Score: 0

A man decides he's going to set up his new dry-cleaning shop next door to a nunnery, so as soon as it is finished he goes and asks the Mother Superior if she has any dirty habits.

Enjoy.

Lock up your trash cans... by SphericalCrusher · 2004-11-11 09:38 · Score: 1

"Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

Yeah, that's very well true, but doesn't it make that person digging through your trash seem stupid and poor? I'd think that it would make Microsoft look the same. And a bad reputation for a company like that may not hurt them (since they already have one), but it sure can damage their little project (the search engine).

--
"Instant gratification takes too long." - Carrie Fisher

Other industries do this... by DogDude · 2004-11-11 09:39 · Score: 1

...specifically, the map-making industry. Every roadmap you look at will have at least a few made up roads or landmarks that are used for the sole purpose of catching other companies copying their maps verbatim.

--
I don't respond to AC's.

Garbage. by Anonymous Coward · 2004-11-11 09:46 · Score: 0

Actually, in civilised countries it is illegal for anyone, except for the actual garbagemen, to even touch your garbage.

Never pay retail by Doc+Ruby · 2004-11-11 09:51 · Score: 1

Google offers a limited-scale usage model to "retail" customers (end users) for free. MS is allegedly consuming Google at a larger scale, for "resale" to end users: that's "wholesale" info consumption. Either Google will limit MS to retail use, like everyone else, or they'll let anyone mine their expensive index. Hopefully they'll stay competitive by exercising the latter, possibly by countermining MS, and improving the Google index in direct comparison to the MS quality. That would be like P2P server competition - sounds like a winner for consumers at every level, except MS.

--

--
make install -not war

Considering MSN uses Yahoo results by ad0gg · 2004-11-11 09:53 · Score: 1

MSN uses yahoo results. Plain and simple for everything.

MSN Results
Yahoo Results

Notice they are the same. Why was this article even posted?

--

Have you ever been to a turkish prison?

Re:Considering MSN uses Yahoo results by Apotsy · 2004-11-11 09:59 · Score: 1

They only fall back to reporting Yahoo results when their own results are not available.

--
Free Hans!

Try this by Anonymous Coward · 2004-11-11 09:54 · Score: 0

Go to the MSN search engine and type "bill gates is an idiot" (include quotes). Then try the same in Google .

Re:More lies from cowardly trolls by Asphalt · 2004-11-11 09:59 · Score: 2, Informative

They do profit from your data. However, being that it is publicly available on an HTTP server, that's pretty much their right. That's like you handing me $5 for me to tell you which magazines you might like to buy.

And MSN crawling Google's site is really no different. As long as the Google data is on a public server, it is fair game to crawl.

Finally, MS gets something right by Anonymous Coward · 2004-11-11 10:00 · Score: 0

visit their new search engine and type in: the best operating system, then hit search :-)

what ridiculous logic... by the-build-chicken · 2004-11-11 10:01 · Score: 4, Funny

microsoft is looking at old pages, google uses a cache...ergo microsoft must be using google.

if we're going to use that kind of logic, I could just as easily come up with "afghanistan is in the middle east and supports terrorist, iraq is in the middle east...ergo, iraq must support terrorists", and use it to make a case for invading iraq...but you don't see......oh wait

Re:what ridiculous logic... by KD5YPT · 2004-11-11 11:25 · Score: 1

Actually the logic is this.

Said website (say www.abc.com) once had a page (say www.abc.com/foobar/index.html) that got removed, and was no longer linked.

Said link was removed only recently.

Google's SiteFind (site: command) is the one that can find the link www.abc.com/foobar/index.html (Google cache).

Searching for www.abc.com returns nothing on MSN search.

An IP registered to Microsoft suddenly starts accessing said website, and attempts to retrieve the URL www.abc.com/foobar/index.html that only exists on Google (or maybe Yahoo) in a fashion that indicates a crawler.

After the IP disconnects, MSN Search now contains the fully indexed www.abc.com that wasn't there before the crawler activity.

I say it's a pretty good assumption to make that MSN Search might have used Google to crawl other websites.

Keywords: Assumption and Might. He's merely forming a theory based on the behavior of a crawler he observed. It's a good case since it is NOT possible for a crawler to try to access an unlinked (but previously linked) page unless it acquires the specific URL from another source. In this case, the logic is that since Google is the only search engine that still contains said URL, MSN bots might be using Google.
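
The observation step in that chain can be checked against a server log. Below is a minimal sketch, assuming Apache-style "combined" log lines and a set of paths the webmaster knows are still linked; the function name and the "msnbot" default are illustrative, not taken from TFA.

```python
import re

# Apache "combined" log format (an assumption about the server's log layout).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unlinked_requests(log_lines, linked_paths, agent_substring="msnbot"):
    """Return (ip, path) for crawler requests to paths no longer linked anywhere."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if agent_substring.lower() not in match.group("agent").lower():
            continue  # a different client
        if match.group("path") not in linked_paths:
            hits.append((match.group("ip"), match.group("path")))
    return hits
```

A hit only shows that the crawler learned the URL somewhere else. Whether that somewhere is Google, an older crawl of its own, or a stale link on a third site is exactly what the log cannot tell you.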

--
In US, you can easily buy enough major firearms to wipe out your neighbourhood but a few little fireworks are banned.
Re:what ridiculous logic... by Blitzenn · 2004-11-12 02:27 · Score: 1

Wow is that a convoluted set of logic or what. It doesn't even make sense. Did you read what you wrote?

Doesn't it make sense that if you search www.abc.com on msn and it returns nothing that they would go and crawl the site themselves? How does google even enter the picture at that point? Your argument is full of holes dude. If Microsoft crawls a site that Google crawled before then MS must be infringing on google somehow? What kind of nonsense is that?

They don't need to do it... by Anonymous Coward · 2004-11-11 10:06 · Score: 0

It would be absolutely unnecessary for M$. All they have to do is use some of the existing web directories like ODP, Yahoo, Skaffe, JoeAnt, etc as seed and they can find their way to a lot of webpages.

Remember by xihr · 2004-11-11 10:09 · Score: 1

Even if a spider is scanning URLs that were obtained from Google, that doesn't mean that it's being done institutionally by Microsoft. It could just simply be someone fiddling with something.

Sue them for billions by Anonymous Coward · 2004-11-11 10:22 · Score: 0

Hopefully Google has a good licence, which allows them to sue MS into bankruptcy.

Re:More lies from cowardly trolls by Anonymous Coward · 2004-11-11 10:26 · Score: 0

But then garcia would have nothing to bitch and whine about. All his post is really doing is to try to get people to click to his site.

SUE THE BASTARDS by Anonymous Coward · 2004-11-11 10:27 · Score: 0

yep

Garbage can be copyrighted by siskbc · 2004-11-11 10:29 · Score: 1

Maybe yours doesn't - but theirs did.

--

-Looking for a job as a materials chemist or multivariat

google doesn't allow bots to crawl google.com... by lixlpixel · 2004-11-11 10:33 · Score: 2, Informative

see their robots.txt: http://www.google.com/robots.txt

so if the msn bot does what they say, it doesn't do what it's supposed to do.
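
For what it's worth, checking a URL against a robots.txt is only a few lines with Python's standard library. This is a sketch: the rules in the usage note are a made-up sample rather than Google's actual file, and `may_fetch` is just a hypothetical wrapper name.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if `user_agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch(user_agent, url)
```

With a sample file of "User-agent: *" followed by "Disallow: /search", a well-behaved bot would get a refusal for any URL under /search and a go-ahead for everything else.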

THOSE DAMN REPUBLICANS! by Anonymous Coward · 2004-11-11 10:33 · Score: 0

yep again

Try this term on MSN search by Anonymous Coward · 2004-11-11 10:37 · Score: 0

Retaliation from google in 1999?

Best one yet... by Anonymous Coward · 2004-11-11 10:38 · Score: 0

Search for "more evil"

http://beta.search.msn.com/results.aspx?q=more+evil&FORM=QBRE

First result: Microsoft's home page.

Don't worry man... by stor · 2004-11-11 10:40 · Score: 1

...you'll be indemnified. It's cool.

Cheers
Stor

--
"Yeah well there's a lot of stuff that should be, but isn't"

The Old Encyclopedia Trick by Nom+du+Keyboard · 2004-11-11 10:44 · Score: 1

The Old Encyclopedia Trick (once famously used in a Fred Saberhagen "Berserker" story), also used by map makers and mailing list sellers, is to plant a few completely false stories (or streets, or addresses to company auditors) in your publication.

Afterwards if you suspect copyright infringement (as in they used you as their source, rather than going out and doing all the original work you had to in the first place), you take them to court as follows:

Defendant: No Your Honor, we did not use their copyrighted material to write/draw our own encyclopedia/map. We went out to the same original sources as they did.

Prosecution Attorney: Then pray tell us where you found the original data for this particular article/street.

Defendant: Uh...

Judge: Guilty!

Google has created a new, original derivative work in the process of how they created and organized their database. They should not be required to open it up to known competitors in the process.

I do really wonder however how Google can't be aware of Microsoft IP addresses recently accessing large chunks of their data. Or is MS using stealth IP's not directly registered to them?

Do you suppose Google itself has a robots.txt file?

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."

It's not hard at all for MS to get away with it! by cjsnell · 2004-11-11 10:53 · Score: 1

Uhm, that's assuming that MS is directly querying Google.

Who says that they haven't put a transparent proxy in front of unknowing MSN customers, capturing the results of those customers' queries. Not terribly difficult to implement and as a side effect, you are looking at a set of search terms that are most popular with your users--ones that MSN Search may want to improve.

NO WAY by puremisery · 2004-11-11 10:58 · Score: 1

Microsoft would NEVER do anything unethical! Anyway, 300 websites should be enough for people to search from.....

--
-- "Life's not fair, but the root password helps."

this is so funny by koan · 2004-11-11 11:08 · Score: 1

I typed in "lostpacket" (my domain) and the first thing that pops up is a slashdot comment I made

" ... Insightful) by koan (80826) on Thursday September 16, @02:50PM ( #10269238 ) ( http://www.lostpacket.net/ ) Thank you M$ you just gave me the "final straw" to migrate to Linux.
"

--
"If any question why we died, Tell them because our fathers lied."

Easy workaround by diablobsb · 2004-11-11 11:12 · Score: 1

what google has to do is:
run test queries no one would ever do on msn, check their logs for IPs / masks

then write some code to:

1) send results to any query coming from these ranges properly, but with the url link changed to
goatse or tubgirl/etc/etc.... ... randomly...

imagine the news:

"msn search was hacked! shows "tunnel man" as result for searching "the vatican"

2) Have an internal google staff meeting on a big room with a huge tv screen showing the news channels...

3) Laugh a lot

4) profit!

--
I for one, welcome our new hot grits... PROFIT!

That's the OLD MSN search. by Anonymous Coward · 2004-11-11 11:31 · Score: 0

We're talking about the beta of their new ones, which uses homegrown technology.

The current search.msn.com is still using Inktomi technology (same as Yahoo).

But beta.search.msn.com has the new stuff.

Re:google doesn't allow bots to crawl google.com.. by Anonymous Coward · 2004-11-11 11:33 · Score: 0

some interesting urls there, i didn't know about

http://www.google.com/catalogs?q=slashdot Google Catalog search...

Interesting... by Anonymous Coward · 2004-11-11 11:45 · Score: 0

Microsoft's own description of their ranking scheme is quite revealing:

The MSN Search ranking algorithm analyzes factors such as page contents, the number and quality of sites that link to your pages, and the relevance of your site's content to keywords.

(Emphasis is mine.)
Sounds an awful lot like PageRank, doesn't it? It's not as if they've ever succeeded by innovating.

Missing the entire point by node+3 · 2004-11-11 12:14 · Score: 1

I can write a search bot today that completely ignores it and there is nothing wrong with that (except perhaps ethically but even that is arguable)

"Ethically" is the whole point.

You could just as well have said, "I can code an RFID key that will gain me access to your hotel room and there is nothing wrong with that (except perhaps ethically but even that is arguable)"

Hell yeah it's arguable, that's exactly what this story is all about!

Re:Missing the entire point by mollymoo · 2004-11-11 13:32 · Score: 1

You could just as well have said, "I can code an RFID key that will gain me access to your hotel room and there is nothing wrong with that (except perhaps ethically but even that is arguable)"

You have officially been sucked in. You are equating the rights granted by copyright/patents/trademarks [delete as appropriate] with physical property. They are not the same.
Enjoy your time in 1984, but try to drag me there and I'll fight your stupid ass till you have me shot as a terrorist.

--
Chernobyl 'not a wildlife haven' - BBC News

Re:Missing the entire point by node+3 · 2004-11-11 16:10 · Score: 1

You have officially been sucked in. You are equating the rights granted by copyright/patents/trademarks [delete as appropriate] with physical property.

Like hell. Read what I wrote. The original poster was taking the amoralist view. Specifically, that if something can be done, then it's allowed (parenthetically, he said the morality of the action is another issue).

What I'm saying is that the issue is *entirely* moral. No one doubts that it *can* be done, but should it? I never implied that it should or should not, but that the moral question *is* the question.

More specifically, Google *can* (technically and perhaps legally) ignore robots.txt. But if I put up a site and use a robots.txt to state "Index my site but please do not index these pages", should Google abide by my wishes?
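
For reference, honoring such a request on the crawler's side looks roughly like the sketch below. It uses Python's standard urllib.robotparser. The robots.txt contents, the bot name ExampleBot and the example.com URLs are made up for illustration, and none of this is claimed to be how Googlebot or MSNbot actually work:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: index the site, but stay out of /gallery/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/gallery/photo1.jpg"):
        print(url, allowed(ROBOTS_TXT, "ExampleBot", url))
```

The check is entirely voluntary: nothing stops a bot from skipping the call to allowed() and fetching the page anyway, which is why the question here is a moral one.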

The answer (I believe) isn't so clear. For example, what am I excluding? Am I excluding a dynamic database that will bog down my server if it's spidered, or a dynamic set of pages that are unique each time and if they are spidered will just link to dead urls? Or am I hiding financial reports that I might wish to later change without notice?

Or maybe it's some personal info that I don't mind making publicly available but that I don't want cached for eternity?

Or are you saying all data should be freely copiable for any reason that does not include physical damage? If so, why not type your credit card number, pin, security code, home address, sexual history and fingerprints into google?

I assume that you are not saying this data should be freely available (although I understand the idea that if it was available equally from everyone, it probably wouldn't be as much of an issue).

On the other hand, if we are talking about a business or government site, or a robots.txt that's just trying to enforce copyright on publicly available pages, I agree with the notion of screw 'em. If they put it up, it's out there.

They are not the same.

I never said they were.

Enjoy your time in 1984, but try to drag me there and I'll fight your stupid ass till you have me shot as a terrorist.

I follow your logic, but you've taken a huge leap in the wrong direction here. I'm always on the giving end of a sentence like that, not the receiving end.

I don't think microsoft is crawling google by mpcooke3 · 2004-11-11 12:51 · Score: 1

MSNBot over the last year or so has started pulling content at a rate so fast many webmasters on PAYG bandwidth have had to block the crawler.

Also when I complained to the research guys I got a reply back dead quick about how my virtual hosts were misconfigured and one domain wasn't configured with a robots.txt. Compare that to the usual response you get from a microsoft monkey after 2 weeks saying "please try to reboot".

My guess is that Microsoft are throwing money at the new MSN search and have a good research team on it. Assuming that the archiving/indexing is working well, MSNBot is now indexing much faster than Googlebot and will catch up with or overtake Google if it continues like this. Perhaps it is a key part of the Longhorn-search integration?

MS Search? by windowsfree · 2004-11-11 12:52 · Score: 1

I'm sceptical of the MS search. Try a search for x-windows...

Google Policy by Radioactivo_985 · 2004-11-11 12:54 · Score: 1

Doesn't Google have a policy or some sort of thing so you cannot use their results for your search engine?

Natural for Microsoft by WebCowboy · 2004-11-11 12:56 · Score: 1

What else can you expect? Microsoft invented the "embrace and extend" strategy and they are simply applying that to their search database by scraping Google and tacking on their stuff.

Where it gets sinister is when they appear to "steal" the content. What if the Google page has meta tags or a robots.txt rule specifying the page should not be indexed? What if the terms of service for Google do not permit the unauthorised re-branding or other use of their search results (in fact, they don't)? Wouldn't you say it's wrong for MS to send the MSNBot a crawlin' then?

I dunno...maybe there isn't a strong defence for Google, but if this is true somehow it seems unethical to build a business on the heavy investment of time and money of others against their will. If they wanted to build on others' work they should use Open Source material like Google did in developing their technology. If they need content they can hunt for it themselves, or get it from a similarly free source (embrace and extend DMOZ directory for example).

I believe in the philosophy of "Free software" and all but I also think that in general you should respect the wishes of others and Google I'm sure wouldn't like this. I also think it would be a bit hypocritical of MS if it did this given its position on sharing its own IP.

Copyright is for CREATIVE works. An automated compilation of other people's works doesn't qualify for copyright protection.

What you say goes against a lot of things previously held by companies about search results, notably that they treat them "hands off". In other words, if Google says they put enough effort into creating their search results they also then can be held responsible for the content. Instead, Google takes the other tack, that they only return links to other people's works.

Re:You've lost sight of what we're talking about?? by Anonymous Coward · 2004-11-11 13:31 · Score: 0

They aren't 'doing as they please' with your content.

They are linking to your content. No one is stealing your work.

In addition, Google has nothing to say because databases of public information cannot be copyrighted. Their search engine is like a phone book. They may have a case under 'sweat of the brow' but since a spider went out and did the work for them, or the site was submitted to them...

The only other thing they can argue is trespass, and it's obviously not that since it's wide open to the public.

Re:You've lost sight of what we're talking about?? by mollymoo · 2004-11-11 13:57 · Score: 1

They aren't 'doing as they please' with your content.

Reproducing a work in its entirety goes way beyond fair use. I was talking about the Google cache, if you bothered to read.

They are linking to your content. No one is stealing your work.

Of course they aren't stealing it, that's not possible, it's not property. However, in the specific circumstances we were discussing Google would be reproducing protected works in violation of the legal protections afforded by copyright law.

--
Chernobyl 'not a wildlife haven' - BBC News

More conspiracy theories.. by Anonymous Coward · 2004-11-11 13:58 · Score: 0

Why is it that the image search on new msn search, when set to turn off filtering, yields less nudity and explicit material (almost none) than the google search?

MSN

Google

Works for Me by Anonymous Coward · 2004-11-11 14:00 · Score: 0

FF 1.0 PR and all lower versions have never had trouble with Slashdot. And I'm not new here.

I am using it on Linux, however.

Microsoft haters... (sigh) by Anonymous Coward · 2004-11-11 14:15 · Score: 0

It's amazing how a group that calls itself "NERDS" are hating on a nerd himself, who nerdily successfully created what all of you nerds strive to create: A monopoly in the form of a company that provides software that everyone in the world has and uses and/or will eventually have to use ...

Yet you continue to bitch and moan about whether or not Microsoft is stealing this, or isn't releasing that, or Windows is buggy (which I haven't read since XP was released!) ... I don't get it?

What if Firefox manages to take over the world and successfully destroy Microsoft in 5 years. Will you then jump on the Microsoft bandwagon?

Make your minds up. If you're truly nerds, you'd be supporting Gates.

Re:Microsoft haters... (sigh) by Anonymous Coward · 2004-11-17 06:04 · Score: 0

Us nerds are also VERY competitive

dumpster diving vs. ripping off charities by troykoelling · 2004-11-11 14:29 · Score: 1

The stolen trash analogy doesn't hold up. This is more like going down the neighborhood and picking up all the plastic bags left out for Salvation Army, DAV etc. and putting it all on your porch. It's not stealing, but it's still pretty skeezy.

idiotic msn bot by loconet · 2004-11-11 15:01 · Score: 1

I'd like to share these few entries from my website, which is getting raped by the msn bot.

2004-11-09 15:17:56 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 14:25:37 permit-1.0.tar.gz 207.46.98.33
2004-11-09 10:32:15 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 06:25:07 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 06:19:18 permit-1.0.tar.gz 207.46.98.33
2004-11-09 02:51:34 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:46:07 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:35:36 MultiplyWithMFC-1.0_src.zip 66.249.64.199
2004-11-09 00:55:05 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 00:48:03 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:10:57 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:05:21 sync.X-1.0.tar.gz 207.46.98.33
2004-11-08 21:03:10 permit-1.0.tar.gz 207.46.98.33

Note that 207.46.98.33 is registered to Microsoft so let's assume it's the msn bot. Notice that the damn thing blindly keeps on downloading the same file! The requests are just a few minutes from each other. The other ip address is from google's bot.
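
A quick way to put numbers on this is to tally requests per (IP, file) pair. Below is a small Python sketch over the lines pasted above, assuming the four-column layout shown (date, time, file, IP); the function name count_downloads is made up for this example:

```python
from collections import Counter

# The log excerpt from this comment (columns: date, time, file, ip).
LOG = """\
2004-11-09 15:17:56 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 14:25:37 permit-1.0.tar.gz 207.46.98.33
2004-11-09 10:32:15 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 06:25:07 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 06:19:18 permit-1.0.tar.gz 207.46.98.33
2004-11-09 02:51:34 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:46:07 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:35:36 MultiplyWithMFC-1.0_src.zip 66.249.64.199
2004-11-09 00:55:05 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 00:48:03 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:10:57 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:05:21 sync.X-1.0.tar.gz 207.46.98.33
2004-11-08 21:03:10 permit-1.0.tar.gz 207.46.98.33
"""


def count_downloads(log_text):
    """Return a Counter keyed by (ip, filename) with the request count."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _date, _time, filename, ip = parts
        counts[(ip, filename)] += 1
    return counts


if __name__ == "__main__":
    # Most-requested (ip, file) pairs first.
    for (ip, filename), n in count_downloads(LOG).most_common():
        print(f"{n:3d}  {ip:15s}  {filename}")
```

On a real access log the split would need adjusting to that log's format, but the per-pair tally is the same idea.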

--
[alk]

Re:idiotic msn bot by harikiri · 2004-11-11 18:54 · Score: 1

I've had bbclone monitoring my blog for the last 3-4 months.
In a few days I've noted MSNbot also "raping" my site (333 hits recently vs 600+ hits from Google across the last few months). It's managed to hit around 25% of robot traffic in the last week or so. I suspect this is due to a poor search algorithm (or duplication of effort by the search algorithm, in case the crawling fails).
Those that are clever may be able to guess the URL to my bbclone stats, to confirm what I'm saying.

--
Man watching 6 MSCE's around a sun box, looks alot like the opening scene's of 2001:space odyssey...

MSN Results are Changing on the Fly by Zastrossi · 2004-11-11 15:30 · Score: 1

In my experience, the MSN Search results changed significantly within 10 hours of my completing a search: First search test. Somebody tries the same queries 10 hours later. I'm not sure what to conclude from that, but clearly some magic is happening behind the MSN curtain.

From article summary by Anonymous Coward · 2004-11-11 15:48 · Score: 0

beef up it's new search engine

"its".

Don't think there's anything Google can do legally by RmanB17499 · 2004-11-11 15:55 · Score: 1

I disagree over legal remedies. Since Google's search is offered for free - with no consideration - then the Terms of Service can't be construed to be a contract. A contract requires offer, acceptance, and consideration to be legally enforced. We can agree on anything - "Hey let's meet for lunch, at noon, by the water fountain." And there's nothing I can do if you don't show up...unless I pay you, or offer something, then you would be in breach.

Copyright law protects ideas not the actual piece of paper or the publication. Google's idea can't be duplicated without permission. What exactly is Google's idea? Plus, then one could argue that Google is making money off of everyone else's copyrights without any royalty. Nobody goes to a search engine to just look at results. They want to see the information behind those results.

The idea of a web directory or search engine isn't protected. Google's search engine software is copyright protected. But the results of a copyright protected tool don't necessarily lead to another copyright. Example: I use Microsoft Word to publish a document. Microsoft's copyright does not extend to my document's ideas, but only to the underlying file format. I tend to agree with the phonebook analogy for this case.

ODP by sonictheboom · 2004-11-11 16:03 · Score: 1

when I first became an editor of the Open Directory Project http://dmoz.org/ I soon ran out of sites to add to my category. So I went over to Yahoo and started adding sites that they had...though I didn't copy the reviews 'cause that wouldn't be right, apart from copyright issues.

Don't see anything wrong in using other people's work to build on.

Al Gore's in Germany? by dusanv · 2004-11-11 16:06 · Score: 1

And he's got their briefcase? Probably recounting the cash before he gives them the papers. He smartened up, that Al Gore.

....hmmmmm nice by sonictheboom · 2004-11-11 16:52 · Score: 1

actually if anyone has actually tried to check out the UI, it's not bad. The advanced search is not on another page (which is a click and so painful) but a javascripty thing called 'search builder' on the same page. Of course it takes ages to download on a shared dialup but good thinking by MS. (there goes my karma)

Re:....hmmmmm nice by alex_ware · 2004-11-17 05:58 · Score: 1

and you're saying that google COULD do this
google measures their page size in bytes, the ms one looks pretty and shiny but that makes big pages

--
If you have nothing useful to say post as AC.

Try the fuck test to see which is better by nysus · 2004-11-11 17:47 · Score: 1

Type "fuck microsoft" into Microshit's engine and then type "fuck google" into Google.

Which search engine is better? You decide!

--

---Technology will liberate us if it doesn't enslave us first.

Greed to the power of N by Anonymous Coward · 2004-11-11 18:14 · Score: 0

I am the richest and I don't want any one else becoming rich. Nor do I want them doing philanthropy.

Amen

No you're wrong. by nmg196 · 2004-11-11 22:03 · Score: 1

You obviously haven't understood what he's claiming. He's not saying that MSN is screen scraping the results - only that it may be using sites found in google as a list of URLs for the MSNBot to crawl. They are still doing the crawling themselves and building their own index.

Mod parent down.

Out to get me! by Anonymous Coward · 2004-11-12 02:17 · Score: 0

Microsoft is out to get me! Stop them! God, I get so sick of the whining here about MS and how they are out to subvert, destroy and just fuck everything up. I don't like the company or its products, but god, I don't have delusions about them being an evil empire that is out to get me. Stop the nonsense. It makes those of us who have a shred of common sense, but still don't like MS, look foolish along with the rest of you. Stop the stupid finger pointing and use some sense. MS is a company. They make a product. They want you to use their product so they have to make one better than the next guy to do it. I am sorry to inject some reality here, but they have been making a better product in the last decade than anyone else. That's why they are so big now. Let's spend our time trying to change that instead of jumping up and down screaming like a bunch of freaking idiots and shaking our fingers at them. That isn't going to change anything at all, except to get others to stop listening to us altogether. Let's show them progress and they will come. No one wants to be in the same room with a bunch of fanatical screaming morons.

Re:More lies from cowardly trolls by dresgarcia · 2004-11-12 10:02 · Score: 1

It worked on me! I am crawling it and completely ignoring his robots.txt WOOOHOOOOO!

Slashdot Mirror

480 comments