Slashdot Mirror


Building a Bigger Search Engine

skreuzer writes "Wired is running a story about a distributed web crawler called Grub. People who choose to download and run the client will assist in building the Web's largest, most accurate database of URLs. This database will be used to improve existing search engines' results by increasing the frequency at which sites are crawled and indexed. Conceivably, Grub's distributed network could enable state information to be gathered on every document on the Internet, each and every day."

6 of 278 comments

  1. Will Grub take off or be smashed? by Blaine Hilton · · Score: 4, Insightful
    I started to use Grub, but then questions started cropping up. First, we are using this to further a commercial organization. This is not research such as SETI@home or Folding@home; this is doing the dirty work of a large commercial search engine. There is not even any potential reward such as with distributed.net.

    Also, the Grub engine crawls everything, including adult content and other questionable content. There is a setting to turn that off, but it does not actually block such content. With international law on accessing illegal websites currently in question, this could have major consequences for the average user.

    So for the time being I have stopped using the Grub client until some serious questions are answered. It's an interesting concept, and if it were being used in more of an academic setting it could be worthwhile. However, I believe that search engines like Google are doing pretty well on their own.

    Go calculate something

    1. Re:Will Grub take off or be smashed? by kaden · · Score: 5, Insightful

      Um, I think you're missing the point. This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server. I would quite possibly be arrested and charged, and while I wouldn't be convicted, it's quite an ordeal, and there's an ugly social stigma to even being charged with kiddy porn or conspiring with a terrorist. So that's a serious question that's posed by running Grub.

  2. Great idea, but will it pan out? by dtolton · · Score: 5, Insightful

    LookSmart hopes to tap the altruistic nature of many Internet users.

    That unfortunately seems like a naively optimistic hope. While the
    vast majority of people may be altruistic, it only takes a few
    unscrupulous individuals to completely undermine a fair result.

    It's interesting that this idea is an extension of Google's model in
    many ways. Essentially Google is able to index so much of the
    internet by having 50,000+ servers. I don't think that's what makes
    Google such a useful search tool; rather I think it's its accuracy and
    relevancy. If my search results started getting polluted with bogus
    hits, I would stop using it almost immediately.

    Unfortunately, by letting people run the client on their machine and
    having it send the results back to the server, I think spoofed
    results are inevitable. I don't think it will be possible to
    safeguard the results either; it will be interesting to see how well
    this project survives *when* people start spoofing results. It's
    been a problem for SETI@home, and it's something that undermined some
    people's faith in the project as a whole. If the spoofed results are
    more widespread and have a larger impact, as they would in a system
    like this, it may ultimately prove fatal to the project.

    One factor that has been absolutely critical to Google's success has
    been their ability to remain resistant to spoofing attempts. It's
    still a question mark how well Grub will perform in that context.

    --

    Doug Tolton

    "The destruction of a value which is, will not bring value to that which isn't." -John Galt
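
A rough sketch of one common way a coordinating server can cross-check crawl reports: hand the same URL to several independent clients and only accept a content digest that a quorum of them agrees on. Nothing in the article says Grub works this way; the function names and the `quorum` parameter are made up for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a fetched page so reports can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def accept_by_quorum(reports, quorum=2):
    """Return the digest to trust, or None if the clients disagree too much.

    `reports` maps a (hypothetical) client id to the digest that client
    reported for one URL. A digest is accepted only if at least `quorum`
    clients reported it and they form a strict majority of all reports.
    """
    if not reports:
        return None
    counts = Counter(reports.values())
    winner, votes = counts.most_common(1)[0]
    if votes >= quorum and votes > len(reports) / 2:
        return winner
    return None


# Example: two honest clients outvote one spoofer.
honest = digest(b"<html>real page</html>")
fake = digest(b"<html>spam</html>")
trusted = accept_by_quorum({"client-a": honest, "client-b": honest, "client-c": fake})
```

This only raises the cost of spoofing. Colluding clients can still outvote honest ones, and pages that legitimately change between fetches will fail the comparison, so the concern above still stands.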
  3. Firewalls? by adam_megacz · · Score: 5, Insightful

    So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?
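
For what it's worth, here is a minimal sketch of the kind of check a crawler client could run before fetching a URL: resolve the host and refuse anything that lands on a private, loopback, or link-local address. This is an assumption about how such a guard might look, not a description of what the Grub client does; `safe_to_crawl` and its `resolver` parameter are hypothetical names.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_address(addr: str) -> bool:
    """True for addresses that normally sit behind a firewall."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(addr.split("%")[0])
    return ip.is_private or ip.is_loopback or ip.is_link_local


def resolve(host: str):
    """Resolve a host name to the list of addresses it maps to."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def safe_to_crawl(url: str, resolver=resolve) -> bool:
    """Allow a URL only if every address it resolves to is public."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addrs = resolver(host)
    except OSError:
        # Unresolvable hosts are skipped rather than guessed at.
        return False
    return bool(addrs) and not any(is_internal_address(a) for a in addrs)
```

A check like this does not cover a public-looking host name that is only reachable from inside the network, so it is a partial answer at best.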

  4. A better use for my screensaver time by Call Me Black Cloud · · Score: 5, Insightful

    I prefer grid.org to grub.org. There the cycles are going to cancer or smallpox research. Currently over 2 million machines are participating.

    Altruism has its place, but since I'm more likely to die of cancer than of not having the complete WWW indexed, I think I'll be selfish and work towards a cure for something that may affect me.

  5. Re:Altruistic? by eversunsoft · · Score: 4, Insightful
    Well, because web searching has, to this day, been a free service. Supposing that the index is built as the result of donated searches, it would be ethically in very bad taste to act against this trend.

    Of course, I am the first one to question this trend. Has anyone else considered the possibility that one day we'll wake up and notice that Google is charging for access to its basic searching services?

    I, for one, would probably pay. I have become so dependent on it. What price? That's a good question...