Slashdot Mirror


NCSA Issues Disclaimer on Google/Yahoo Study

Jean Veronis writes "NCSA has issued a strong disclaimer on the study announced recently on Slashdot that seemed to contradict the claim that Yahoo's index is bigger than Google's: 'Staff at the NCSA noted several issues with the study'. This study, conducted by students, is 'not an NCSA publication and was not conducted as part of any NCSA project or under the supervision of NCSA'."

6 of 118 comments

  1. Disclaimer Text by Stanistani · · Score: 5, Interesting

    From http://vburton.ncsa.uiuc.edu/indexsize.html:
    "The following study was completed by two of Professor Vernon Burton's students at the University of Illinois. Though one of the students previously worked with Professor Burton at the National Center for Supercomputing Applications (NCSA), the study was done outside the scope of any NCSA core projects. When first published online, staff at the NCSA noted several issues with the study, and some revisions have been made to the document to reflect several of these concerns. Changes are detailed at the bottom of the following page.

    Please note again that this study is not an NCSA publication and was not conducted as part of any NCSA project or under the supervision of NCSA.

    A Comparison of the Size of the Yahoo and Google Indices"

  2. A crucial issue... by d3m057h3n35 · · Score: 5, Funny

    Also pertinent was the discovery that Yahoo's claims of increased index size were based on the hope that buying products from companies which advertise "longer, thicker index size in two weeks, money-back guarantee, all-natural supplements" would yield actual results.

  3. Wait... by lbmouse · · Score: 5, Funny

    I thought that size didn't matter.

  4. Re:/. 503 error by Anonymous Coward · · Score: 5, Funny

    "I've been getting 500 errors the whole morning while trying to reach /. But not 503 ones. After one or two page refreshes, it starts working!"

    The trick is to refresh as fast as you can, until the bad 500 errors go away.

  5. The dark web by SpinyNorman · · Score: 5, Insightful

    The Yahoo vs Google methodology of counting the pages returned for various high-response queries seems to completely ignore the fact that Yahoo *might be* picking up some of the less heavily linked "dark web" that Google's PageRank algorithm will rate low, and which its crawler may be ignoring.

    This is the portion of the web that I'd like to see - not the commercial portion but the hobbyist and enthusiast sites that may be out there without lots of incoming links that would make them more highly rated and/or visible to Google.

    What would therefore be relevant and interesting to know isn't how many hundreds of pages Google vs Yahoo get for "my job sucks", but rather how many they get for "my weevil collection".
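    To illustrate why a sparsely linked hobbyist page tends to rank low under link-based scoring, here is a toy PageRank computed by power iteration. It is the textbook formulation (damping factor 0.85, dangling pages spreading their rank evenly), not Google's actual production ranking, and the six-page graph with its page names ("portal", "weevil-collection", and so on) is entirely hypothetical.

    ```python
    def pagerank(links, damping=0.85, iterations=100):
        """Textbook PageRank by power iteration (illustrative only).

        links maps each page to the list of pages it links to. Pages with no
        outgoing links ("dangling" pages) spread their rank evenly over all pages,
        so the ranks always sum to 1.
        """
        # Collect every page that appears as a source or a target.
        pages = sorted(set(links) | {t for targets in links.values() for t in targets})
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # Every page gets the "random jump" share regardless of links.
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page in pages:
                targets = links.get(page, [])
                if targets:
                    # Split this page's rank evenly among the pages it links to.
                    share = damping * rank[page] / len(targets)
                    for t in targets:
                        new_rank[t] += share
                else:
                    # Dangling page: distribute its rank over all pages.
                    share = damping * rank[page] / n
                    for t in pages:
                        new_rank[t] += share
            rank = new_rank
        return rank


    # Hypothetical graph: a heavily linked "portal" and a hobbyist page that
    # only one other page links to.
    EXAMPLE_WEB = {
        "news": ["portal"],
        "blog": ["portal"],
        "shop": ["portal"],
        "forum": ["portal", "weevil-collection"],
        "portal": ["news", "blog", "shop", "forum"],
        "weevil-collection": [],
    }

    if __name__ == "__main__":
        for page, score in sorted(pagerank(EXAMPLE_WEB).items(), key=lambda kv: -kv[1]):
            print(f"{page:18} {score:.3f}")
    ```

    In this sketch "weevil-collection" ends up with the lowest score of all six pages, even though its content is no worse than anyone else's; a crawler that prioritizes by such a score would reach it last, if at all.
    
    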

  6. Accuracy of Google counts? by xiaomonkey · · Score: 5, Interesting

    Try the following sets of keywords on Google:

    This trend appears to continue: repeating the "lawyer" keyword 10 times results in Google estimating that there are 389,000,000 hits in its index.

    On Yahoo, this sort of thing doesn't seem to happen as much, but it still happens. For example, searching for "lawyer" returns 124,000,000 results, and searching for "lawyer lawyer" or "lawyer lawyer" returns 125,000,000 results.

    So it probably doesn't make sense to judge the relative size of either index based on the estimated number of hits for any given set of keywords. Right now, Google's numbers look a little more suspect, since they seem to vary so greatly based merely on the repetition of a keyword. However, the stability of Yahoo's numbers doesn't necessarily mean that they're correct either.
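    The argument rests on a simple invariant: if a multi-word query is treated as an AND of its terms, repeating a word cannot change which documents match, so the hit estimate should not change either. The sketch below checks a set of reported estimates against that invariant. The AND-semantics assumption and the helper names (`normalize`, `inconsistent_estimates`) are hypothetical and not part of either engine's documented behavior; the sample data uses the Yahoo figures quoted above, since the comment's Google baseline for a single "lawyer" is not given.

    ```python
    from collections import defaultdict


    def normalize(query):
        """Distinct terms of a query, assuming AND semantics (repeats add nothing)."""
        return frozenset(query.lower().split())


    def inconsistent_estimates(estimates):
        """Return groups of queries that should match the same documents but
        were reported with different hit-count estimates.

        estimates maps a query string to the engine's estimated number of hits.
        """
        groups = defaultdict(dict)
        for query, count in estimates.items():
            groups[normalize(query)][query] = count
        # Keep only groups whose equivalent queries disagree on the count.
        return {
            terms: counts
            for terms, counts in groups.items()
            if len(set(counts.values())) > 1
        }


    if __name__ == "__main__":
        yahoo = {"lawyer": 124_000_000, "lawyer lawyer": 125_000_000}
        print(inconsistent_estimates(yahoo))
    ```

    Both Yahoo queries normalize to the same term set, so they are flagged together; a perfectly consistent engine would produce an empty result. The check only detects disagreement between equivalent queries: it says nothing about whether any single estimate reflects the true index size.
    
    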