Is Microsoft Crawling Google?

Posted by CmdrTaco on Thursday November 11, 2004 @07:36AM from the put-on-your-foil-hat dept.

triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks they might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

15 of 480 comments

Difficult to do if Google doesn't want them to by Anonymous Coward · 2004-11-11 07:37 · Score: 5, Insightful

All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.
1. Re:Difficult to do if Google doesn't want them to by blamanj · 2004-11-11 08:09 · Score: 5, Interesting
  
  Yes, and don't think Google wouldn't notice. My company had a summer intern that once wrote a program that started sucking a lot of information out of Google. They blocked our entire site for about three days until everything got straightened out.
Yea, and by BrianGa · 2004-11-11 07:38 · Score: 5, Funny

The new search engine's name will be Mooglesoft.
But will this mean Google can crawl back? by biffnix · 2004-11-11 07:39 · Score: 5, Funny

Couldn't Google just crawl Microsoft in return? Then they'd be stuck in an endless loop, and William Shatner can then swoop in, crack some skulls, and save the day.

Or something like that.

biffnix

--
Don't Die Wondering
They been crawling like mad lately by mpost4 · 2004-11-11 07:40 · Score: 5, Interesting

I can say that they've been crawling like mad lately: Google, Yahoo, and MSN. I say this because my site has had a lot of traffic from all three, and it is not a popular or even an important one. Not just once or a few times a week, but every day. There are big updates coming. I was not surprised to see the article about Google doubling its index; I knew something was coming from the way they are crawling unimportant/unpopular sites.
Try this term on MSN search by bbzzdd · 2004-11-11 07:40 · Score: 5, Funny

more evil than satan

ROOFLES!
1. Re:Try this term on MSN search by JohnnyKlunk · 2004-11-11 07:47 · Score: 5, Funny
  
  OK. This is really freaky. Try
  
  more evil than god and you get FIREFOX as the first result (then google, of course)
Shocked I tell you by finkployd · 2004-11-11 07:41 · Score: 5, Funny

Well, that kind of business practice would be completely out of character for Microsoft.

This is a non-story. A good Slashdot headline will be when they get caught actually NOT doing something like this.

Microsoft Has Original Idea and Implements it By Themselves
From the 70%-of-slashdot-editors-suffered-heart-attacks-reading-this-submission Dept.
Google is Catholic? by TheAmazingBob · 2004-11-11 07:43 · Score: 5, Funny

"Google happily changed its habbits..."

Google is Catholic?

--

The Geek Crew
Violates Google's TOS by Anonymous Coward · 2004-11-11 07:46 · Score: 5, Informative

From Google's Terms of Service

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.
They really only need to seed their crawler... by JustNiz · 2004-11-11 07:55 · Score: 5, Interesting

You can't get to every page on the internet just by starting at one page and recursively following links, so the more places you start from, the closer you are likely to get to 100% coverage.

I could imagine that Microsoft just needs a few thousand URLs evenly spread across the internet to seed their crawler, which they can get from Google by using a list of the most popular queries.

Once their crawler has so many starting points it can do the rest itself.
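
To make the seeding argument concrete, here is a minimal sketch of a crawler's reachability on a toy link graph. It assumes the web can be modeled as a dictionary from page to outgoing links; the `reachable` function, the `TOY_WEB` data, and the example hostnames are all hypothetical and say nothing about how any real search engine works.

```python
from collections import deque


def reachable(links, seeds):
    """Return every page reached by following links breadth-first from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy web: two clusters of pages with no links between them.
TOY_WEB = {
    "a.example/": ["a.example/1", "a.example/2"],
    "a.example/1": ["a.example/"],
    "a.example/2": [],
    "b.example/": ["b.example/1"],
    "b.example/1": ["b.example/"],
}

one_seed = reachable(TOY_WEB, ["a.example/"])
two_seeds = reachable(TOY_WEB, ["a.example/", "b.example/"])
print(len(one_seed), len(two_seeds))
```

With a single seed the crawl never leaves the first cluster; adding one seed in the second cluster covers the whole toy graph, which is the point about needing well-spread starting URLs.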
Re:Don't concern yourself with this crap... by mollymoo · 2004-11-11 07:59 · Score: 5, Interesting

No offense dude, but you are the one who put the site out there publicly. Now if they are DoSing you then you have a valid complaint, but robots.txt is just there as a friendly suggestion.

There's more to it than that. Google caches your pages and makes that cache of your copyright material available. Arguably if you have used your robots.txt file to tell it not to index (and therefore cache) your pages and it still does they are breaching copyright. OK, the Google cache is the world's largest breach of copyright anyway, but if you have told its spider not to index and it does regardless, that's a different ballgame.
Putting it out there on the web does not give anyone the right to do with it as they please.

--
Chernobyl 'not a wildlife haven' - BBC News
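
For readers unfamiliar with how robots.txt rules are read, the following is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` and `OtherBot` user agents, and the example.com URLs are made up for illustration. Note that this only shows what a well-behaved crawler would conclude; nothing in the file technically prevents a crawler from ignoring it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)
for agent, url in [
    ("ExampleBot", "http://www.example.com/index.html"),
    ("OtherBot", "http://www.example.com/index.html"),
    ("OtherBot", "http://www.example.com/private/notes.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

A polite crawler would run a check like `can_fetch` before requesting each URL and skip anything that comes back disallowed.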
Full Circle by Guppy06 · 2004-11-11 08:17 · Score: 5, Interesting

"Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

It's interesting to know that Bill Gates has been forced to go back to his roots...
The best way to prepare [to be a programmer] is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and fished out listings of their operating system.
Re:Does it violate Google's Terms of Service by nick13245 · 2004-11-11 08:22 · Score: 5, Informative

Yes it does.
From Google's Privacy Center (http://www.google.com/terms_of_service.html):

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.
Re:Don't concern yourself with this crap... by ad0gg · 2004-11-11 09:33 · Score: 5, Informative

If you don't want your site indexed or cached by Google, go here and follow the directions.
Remove yourself from Google
"Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code."

--
Have you ever been to a turkish prison?