Slashdot Mirror


Is Microsoft Crawling Google?

triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks it might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

8 of 480 comments

  1. Msn Crawling by clinko · · Score: 3, Informative

    If you've been watching your site's logs lately, you'll have seen that Microsoft has been hammering most servers. Most crawlers pick through pages with large lists one at a time, then come back every hour or so.

    Starting last week, MSN has been pulling EVERY LINK in sequence from my site, even the larger Artist Index pages.

    Seriously, I've had this same spider on my site for about 36 hours now.

  2. Violates Google's TOS by Anonymous Coward · · Score: 5, Informative
    From Google's Terms of Service
    Personal Use Only

    The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.
  3. Spike the results, then sue by G4from128k · · Score: 4, Informative

    It would be easy for Google to insert a small fraction of non-sequiturs in the results, look at Microsoft's search results, and then sue for misuse. Even if MSFT uses random proxies to avoid detection, it cannot manually recheck all the hits to make sure they are correct (if it had the resources to check all the sites, it would not need to crawl Google). A few made-up sites or inappropriate search hits would be enough to establish a pattern of abuse.
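
The check described above could be sketched roughly as follows. This is only an illustration: the decoy URLs, the rival result list, and the function name `leaked_canaries` are all made up for the example, and nothing here reflects how Google or Microsoft actually operate.

```python
def leaked_canaries(planted, rival_results):
    """Return the planted decoy URLs that also show up in a rival's results."""
    # Normalize lightly so trivial case/whitespace differences do not hide a match.
    planted_set = {url.strip().lower() for url in planted}
    rival_set = {url.strip().lower() for url in rival_results}
    return sorted(planted_set & rival_set)


# Hypothetical decoys seeded into one engine's results (assumption).
planted = ["http://decoy-one.example/", "http://decoy-two.example/"]

# Hypothetical results observed on the other engine (assumption).
rival = ["http://real-site.example/", "http://decoy-two.example/"]

hits = leaked_canaries(planted, rival)
print(hits)
```

Since the decoys exist nowhere else, any overlap at all is hard to explain by independent crawling, which is the point of the suggestion.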

    --
    Two wrongs don't make a right, but three lefts do.
  4. Re:Try this term on MSN search by Garion+Maki · · Score: 3, Informative

    pretty funny :)

    but it seems like google started it several years ago.

    http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/
    and
    http://searchenginewatch.com/sereport/article.php/2167621
    btw, it doesn't seem to work on google anymore...

    --
    All indicators show that the human race is selectively breeding itself for stupidity.
  5. Re:Does it violate Google's Terms of Service by nick13245 · · Score: 5, Informative

    Yes it does.
    From Google's Privacy Center (http://www.google.com/terms_of_service.html):

    Personal Use Only

    The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

  6. Re:Don't concern yourself with this crap... by ad0gg · · Score: 5, Informative
    If you don't want your site indexed or cached by Google, go here and follow the directions.

    Remove yourself from google

    "Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code. "

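The quoted note does not say which meta tags are meant. As a rough sketch, assuming the usual robots directives such as `noindex` and `noarchive` (an assumption, not something stated in the thread), a page can be checked for them with Python's standard `html.parser`. The class and function names are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, noarchive".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page used only for illustration.
page = (
    '<html><head><meta name="robots" content="noindex, noarchive"></head>'
    "<body>hello</body></html>"
)
directives = robots_directives(page)
print(sorted(directives))
```

Here `noindex` would ask crawlers not to index the page and `noarchive` not to keep a cached copy; whether a given crawler honors them is up to that crawler.
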
    --

    Have you ever been to a turkish prison?

  7. Re:Try this term on MSN search by StikyPad · · Score: 3, Informative

    His thought process probably started here

  8. Re:Don't concern yourself with this crap... by djcapelis · · Score: 3, Informative

    To remove all the images on your site from our index, place the following robots.txt file in your server root:
    User-agent: Googlebot-Image
    Disallow: /

    That should work? No?
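
One way to sanity-check the rules quoted above is to feed them to Python's standard `urllib.robotparser` and ask what each crawler may fetch. This is a minimal sketch: the example.com URL and the "SomeOtherBot" agent are placeholders, and it only shows how a standards-following parser reads the file, not what any real crawler does.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the comment above.
rules = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URL (assumption); any path on the site gives the same answer.
url = "http://example.com/images/photo.jpg"

image_allowed = parser.can_fetch("Googlebot-Image", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Googlebot-Image allowed:", image_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The rule only names Googlebot-Image, so other user agents are unaffected; blocking them would need their own `User-agent` entries or a `User-agent: *` block.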

    --
    I touch computers in naughty places