Slashdot Mirror


NCSA Issues Disclaimer on Google/Yahoo Study

Jean Veronis writes "NCSA has issued a strong disclaimer on the study, announced recently on Slashdot, that seemed to contradict the claim that Yahoo's index is bigger than Google's: 'Staff at the NCSA noted several issues with the study'. This study, conducted by students, is 'not an NCSA publication and was not conducted as part of any NCSA project or under the supervision of NCSA'."

7 of 118 comments

  1. But why publish it? by ChrisF79 · · Score: 2, Insightful

    Although they don't say it in the disclaimer, the act of posting a disclaimer after posting the article screams that they realize the article is flawed. If that's the case, why publish it in the first place? Shouldn't they have had some foresight and left this one on the cutting room floor? Maybe finance is different, but I remember it being very difficult to get an article published unless it was groundbreaking and free of even minor flaws.

    --
    Finance tutorials and more! Understandfinance
    1. Re:But why publish it? by 'nother+poster · · Score: 3, Insightful

      From the disclaimer, I would say that the report was not a university-sanctioned project but a fun project for a couple of students. They then published it in a manner that implied it was official work of the university, or at least sanctioned by the professor. Now, whether the study turns out right or wrong in peer review, the university wants it known that it wasn't their project. A peer-reviewed research project is very different from throwing together a bad stats class midterm and putting the results on a university server.

  2. Filtering by Spazmania · · Score: 4, Insightful

    Readers can consult the list of search terms provided by the authors, and can see for themselves that, in the vast majority of cases retained (i.e. those with fewer than 1000 results), the results in question are lists and spam.

    I don't know which disturbs me more: The possibility that this is the correct explanation for the discrepancy or the possibility that it isn't.

    It seems to me that the correct solution to filtering results would be to put the "undesirable" results at the bottom of the list, not get rid of them entirely. One man's trash is another man's treasure after all.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  3. Covering One's Rear by gkozlyk · · Score: 3, Insightful

    Ah, the good old disclaimer added to cover one's rear. With litigation flying as freely as newspaper in the wind, one can't be too careful these days.

  4. The dark web by SpinyNorman · · Score: 5, Insightful

    The Yahoo vs. Google page-count methodology, which counts the pages returned for various high-response queries, seems to completely ignore the fact that Yahoo *might be* picking up some of the less heavily linked "dark web" that Google's PageRank algorithm will rate low, and that Google's crawler may be ignoring.

    This is the portion of the web that I'd like to see: not the commercial portion, but the hobbyist and enthusiast sites that may be out there without lots of incoming links that would make them more highly rated and/or visible to Google.

    What would therefore be relevant and interesting to know isn't how many hundreds of pages Google and Yahoo each return for "my job sucks", but rather how many they return for "my weevil collection".

    1. Re:The dark web by RAMMS+EIN · · Score: 2, Insightful

      ``This is the portion of the web that I'd like to see - not the commercial portion but the hobbyist and enthusiast sites that may be out there without lots of incoming links that would make them more highly rated and/or visible to Google.''

      I personally don't think Google is _excluding_ pages that somehow don't get enough links to them. Typically, good resources will get linked to, and thus taking into account the number of links to a page seems sensible.

      From personal experience, I can't say I have anything to complain about with Google. When I post a new page on my site that includes some word that previously had few hits on Google, it gets to the top of the results within a few days. So, even without many links, the system works. When I search for words that do return many hits, the results I get first are usually the most relevant (provided that I have entered enough words to place everything in proper context; searching for "festival" wouldn't give me the speech synthesis software unless I also included "speech").

      If you are specifically looking for pages that have few links to them, another search engine might be better for you. Or maybe not. Maybe you would be best served by using Google and looking at the last rather than first results. Perhaps it would be a good idea for Google to include an option to invert the ranking?

      --
      Please correct me if I got my facts wrong.
  5. It was not "published" by kaan · · Score: 4, Insightful

    why publish it in the first place?

    Dude, it was never published; it was posted on one web server that is part of the ncsa.uiuc.edu sub-domain (specifically, vburton.ncsa.uiuc.edu). There are probably hundreds of machines on this network, and posting something on a web server running there does not equate to NCSA formally publishing an article. What we're talking about here is a web page written by two students: they worked on a project and wanted to post it for other people to see. So that's what they did, period.

    Stupidly, everyone is claiming that NCSA backed this whole thing, as if NCSA were on some crusade to compare Yahoo and Google. But this must be taken for what it is: a project by two students. NCSA's disclaimer is just trying to make this clear for the idiots out there who think that every little thing a student says or does must have been funded, supported, or backed by NCSA.