Slashdot Mirror


Nutch: An Open Source Search Engine

Anonymous Coward writes "Someone forwarded me this site working to create an open source search engine called Nutch. In the age of weighted rankings on search engines for profits, there's an obvious need for an unbiased search engine. After all, isn't a search engine supposed to be for finding relevant data, not as an indirect and sometimes slimy method of advertising? Nutch is clearly in its initial stages, but it would certainly get my vote." You can find the project on SF.net, and also read the Business 2.0 article on it.

6 of 291 comments

  1. Patents. by Christopher Thomas · · Score: 5, Interesting

    I hope the authors of this project do their homework. My impression is that most of the good search and indexing schemes have already been patented, which will make it difficult to release such a project without stepping on someone's toes.

    1. Re:Patents. by AstroDrabb · · Score: 5, Insightful

      Does it matter? There are no innovations. ALL knowledge is based on prior knowledge. Look in any field of study and you will soon learn that advancement is not possible without prior knowledge. What we know about computer science today is thanks to the knowledge gained by those before us. It is this way in EVERY field: Astronomy, Medical Science, Mathematics, etc. Humankind does not grow by leaps and bounds; we grow by incremental improvements. I have not heard of ONE discovery/innovation in which the discoverer/innovator was not educated in prior knowledge. Now the question we need to ask ourselves, and especially the government, is: do we really want the advancement of our society to be hindered by the monetary interests of the greedy?

      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
  2. Google? by devphaeton · · Score: 5, Informative

    Last I heard, Google still doesn't accept bribes for page ranking.

    Unobtrusive adverts in the right-hand column notwithstanding.

    --


    do() || do_not(); // try();
  3. A Tough Challenge by Cloudmark · · Score: 5, Interesting

    One of the biggest issues with running a search-engine, open-source or otherwise, is that you can't eliminate bias in the results. No matter what scheme you put in place to handle rankings, someone will find a way to take advantage of it. It's a fact of any major system - there's always a way to twist it. Part of the challenge that Google and similar sites face is that they have to work constantly to protect themselves from systems designed to take advantage of their algorithm. While a completely unbiased search service would be nice, I think it would require the impossible. It would require that no one out here took advantage of it to further their own interests, be they political, commercial, or otherwise. That's fairly unlikely.

    Most of the major engines today, including Google, make an effort to prevent horribly unbalanced results (see the recent controversy over blogs outweighing professional sites in the rankings due to linking and other factors). Some even admit (again, Google does) to manually messing with the rankings a little. If you search for suicide methods, they will bend the engine to make sure you get reasons why you shouldn't commit suicide before you get the how-to. That's in their own public docs. It's also discussed in Wired.

    I honestly don't know if open-source could do a better job. The algorithm might be better (likely, given the manpower), but would it really be that much fairer?

    --
    "Be proud to be a fighter" - Martial Arts Adage
  4. Re:Slimey adverts? by Anonymous Coward · · Score: 5, Insightful

    This project is the SOFTWARE to run a search engine. Not a corporation that needs to generate income to justify the resources required to run the search engine.

    Anyone could take this source code and, with enough money, challenge Google.com as the top search engine.

    I see this project as a competitor to shrink-wrapped search engines, e.g. the Google appliance or maybe even Folio-based products. Typically corporations have many documents that need to be indexed and searchable to their needs.

    I haven't seen it on the homepage: there is no list of what content it can index. I hope it can at least index PDFs and popular Office documents. Maybe even media files? And which XML fields get indexed? Or external metadata?

  5. Search Engine Monoculture by peachawat · · Score: 5, Interesting

    Why is it that when it comes to OSes, everyone is bitching and screaming about how bad the monoculture created by Microsoft Windows is, but otherwise feels warm and fuzzy and swears to god that Google is and always will be the only search engine they use?

    The point is, are you really comfortable having one, and only one, effective search engine? No matter how well it searches?

    O'Reilly put it best:

    Actually, Nutch has no ambitions to dethrone Google. It's just trying to provide an open source reference implementation of search to help keep Google and other search engines honest, by letting people compare the results of an engine whose algorithms and methodologies are transparent and accessible. It also aims to give a platform for people outside of the search heavyweights to research new search algorithms.