Slashdot Mirror


Nutch: An Open Source Search Engine

Anonymous Coward writes "Someone forwarded me this site working to create an open source search engine called Nutch. In the age of weighted rankings on search engines for profits, there's an obvious need for an unbiased search engine. After all, isn't a search engine supposed to be for finding relevant data, not as an indirect and sometimes slimy method of advertising? Nutch is clearly in its initial stages, but it would certainly get my vote." You can find the project on SF.net, and also read the Business 2.0 article on it.

3 of 291 comments

  1. Patents. by Christopher Thomas · · Score: 5, Interesting

    I hope the authors of this project do their homework. My impression is that most of the good search and indexing schemes have already been patented, which will make it difficult to release such a project without stepping on someone's toes.

  2. A Tough Challenge by Cloudmark · · Score: 5, Interesting

    One of the biggest issues with running a search engine, open-source or otherwise, is that you can't eliminate bias in the results. No matter what scheme you put in place to handle rankings, someone will find a way to take advantage of it. It's a fact of any major system - there's always a way to twist it. Part of the challenge that Google and similar sites face is that they have to work constantly to protect themselves from systems designed to take advantage of their algorithm. While a completely unbiased search service would be nice, I think it would require the impossible. It would require that no one out there took advantage of it to further their own interests, be they political, commercial, or otherwise. That's fairly unlikely.

    Most of the major engines today, Google included, make an effort to prevent horribly unbalanced results (witness the recent controversy over blogs outweighing professional sites in the rankings because of linking and other factors). Some even admit (again, Google does) to manually adjusting the rankings a little. If you search for suicide methods, they will bend the engine to make sure you get reasons why you shouldn't commit suicide before you get the how-to. That's in their own public docs. It's also discussed in Wired.

    I honestly don't know if open-source could do a better job. The algorithm might be better (likely, given the manpower), but would it really be that much fairer?

    --
    "Be proud to be a fighter" - Martial Arts Adage

  3. Search Engine Monoculture by peachawat · · Score: 5, Interesting

    Why is it that when it comes to operating systems, everyone bitches and screams about how bad the monoculture created by Microsoft Windows is, yet feels warm and fuzzy about Google and swears to god it is and always will be the only search engine they use?

    The point is, are you really comfortable having one, and only one, effective search engine? No matter how well it searches?

    O'Reilly put it best:

    Actually, Nutch has no ambitions to dethrone Google. It's just trying to provide an open source reference implementation of search to help keep Google and other search engines honest, by letting people compare the results of an engine whose algorithms and methodologies are transparent and accessible. It also aims to give a platform for people outside of the search heavyweights to research new search algorithms.