Slashdot Mirror


Building a Bigger Search Engine

skreuzer writes "Wired is running a story about a distributed web crawler called Grub. People who choose to download and run the client will assist in building the Web's largest, most accurate database of URLs. This database will be used to improve existing search engines' results by increasing the frequency at which sites are crawled and indexed. Conceivably, Grub's distributed network could enable state information to be gathered on every document on the Internet, each and every day."

28 of 278 comments

  1. Will Grub take off or be smashed? by Blaine+Hilton · · Score: 4, Insightful
    I started to use grub, but then questions started cropping up. First we are using this to further a commercial organization. This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine. There is not even any potential reward such as with distributed.net.

    Also the grub engine crawls everything, including adult content and other questionable content. They have a setting to turn it off, but it does not block it. With the current questioning of international law relating to accessing illegal websites this could have major consequences for the average user.

    So for the time being I have stopped using the grub client until some serious questions are answered. It's an interesting concept and if it was being used in more of an academic setting it could be interesting. However I believe that search engines like Google are doing pretty good themselves.

    Go calculate something

    1. Re:Will Grub take off or be smashed? by kaden · · Score: 5, Insightful

      Um, I think you're missing the point. This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server. I would quite possibly be arrested and charged, and while I wouldn't be convicted, it's quite an ordeal, and there's an ugly social stigma to even being charged with Kiddy Porn or conspiring with a terrorist. So that's a serious question that's posed by running Grub.

    2. Re:Will Grub take off or be smashed? by bcrowell · · Score: 4, Informative
      Do you have any references? Please back up your claims.
      here, and here

      Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.

    3. Re:Will Grub take off or be smashed? by dtfinch · · Score: 5, Interesting

      There are many ways to look at this. The idea is to install the client, set Opera to use the same useragent string, visit some of those sites, then blame it on Grub if the FBI comes busting through your door.

      If you're a criminal, installing the Grub client might be a great idea.

  2. Great idea, but will it pan out? by dtolton · · Score: 5, Insightful

    LookSmart hopes to tap the altruistic nature of many Internet users.

    That unfortunately seems like a naively optimistic hope. While the
    vast majority of people may be altruistic, it only takes a few
    unscrupulous individuals to completely undermine a fair result.

    It's interesting that this idea is an extension to Google's model in
    many ways. Essentially Google is able to index so much of the
    Internet by having 50,000+ servers. I don't think that's what makes
    Google such a useful search tool, rather I think it's accuracy and
    relevancy. If my search results started getting polluted with bogus
    hits, I would stop using it almost immediately.

    Unfortunately, by letting people run the client on their machine and
    having it send the results back to the server, I think spoofed
    results are inevitable. I don't think it will be possible to
    safeguard the results either. It will be interesting to see how well
    this project survives *when* people start spoofing results. It's
    been a problem for SETI@home, and it's something that undermined some
    people's faith in the project as a whole. If the spoofed results are
    more widespread and have a larger impact as they would in a system
    like this, it may ultimately prove fatal to the project.

    One factor that has been absolutely critical to Google's success has
    been their ability to remain resistant to spoofing attempts. It's
    still a question mark how well grub will perform in that context.

    --

    Doug Tolton

    "The destruction of a value which is, will not bring value to that which isn't." -John Galt
    1. Re:Great idea, but will it pan out? by Nickilo · · Score: 5, Interesting

      "The General's Dilemma" would solve this problem. The story goes something like this: The general needs to get urgent information to one of his officers; however, he suspects saboteurs are present among his messengers. In order to ensure the information gets through accurately, he sends the same message with several men. The officer on the other end collects all the messages and goes with the majority. (And, presumably, kills the others.)
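
      A minimal sketch of that "go with the majority" idea, applied to crawl reports. Everything here is an assumption for illustration: the function name majority_result, the idea that each client reports a content hash per URL, and the strict-majority threshold. Nothing in the article says Grub works this way.

```python
from collections import Counter


def majority_result(reports, quorum=0.5):
    """Return the value reported by more than `quorum` of the clients.

    `reports` is a list of values (for example content hashes) sent back
    by different clients for the same URL. Returns None when the list is
    empty or when no single value clears the threshold.
    """
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    return value if count / len(reports) > quorum else None


# Three hypothetical clients crawl the same URL; one of them lies.
reports = ["a1b2", "a1b2", "ffff"]
print(majority_result(reports))
```

      This only helps while honest clients outnumber the spoofers for a given URL, and it multiplies the bandwidth cost by the number of messengers sent.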

  3. Biiig questions to answer by andy@petdance.com · · Score: 5, Interesting
    So Grub goes out, uses bandwidth, and then returns some results to the home base. It's really distributed bandwidth more than distributed computation.

    I bet one of the big successes in Folding and distributed.net is that many people run the clients on work boxes, knowing that there's little actual overhead incurred to their work. How different that is for a URL sucker.

    I wonder what broadband ISPs think of Grub.

    1. Re:Biiig questions to answer by friedegg · · Score: 4, Interesting

      I wonder what broadband ISPs think of Grub.

      If it becomes a problem, I imagine ISPs will declare it a commercial bandwidth usage, and order users to stop or move to a business class plan for more money.

      --
      Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
  4. Haiku :-) by Ignorant+Aardvark · · Score: 4, Funny

    Grub searches the web
    Sniffing out all the good porn
    Not just bootloader

    I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!

    Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called "E-Coli" yet? No? I can just imagine my mom ...

    "Agh! You have E-Coli on your computer!"

    1. Re:Haiku :-) by Anonymous Coward · · Score: 4, Funny
      Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
      I'm wondering if the Grub bootloader developers will throw a tantrum and flood the Grub crawler developers' e-mail addresses, claiming that this will confuse people and harm the bootloader project.

      Hee hee.
    2. Re:Haiku :-) by Unoriginal+Nick · · Score: 5, Funny
      Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...

      How about Firebird? I'm sure that won't cause any problems :-)

    3. Re:Haiku :-) by Chester+K · · Score: 4, Funny

      As time approaches infinity, the number of software projects named Firebird also approaches infinity.

      It's ok though because they'll all still be different projects, so nobody will get confused.

      --

      NO CARRIER
  5. If previous results are any guide by carl67lp · · Score: 5, Funny

    1. Tech-savvy people will install this.
    2. Tech-savvy people tend to be loners.
    3. Loners most often search for porn.

    C1. Tech-savvy people search for porn.

    4. Items searched for most often reach the top of the list.
    5. Porn is searched for often by tech-savvy people.

    C2. Porn will be easier to find with this new search engine.

    Count me in!

    1. Re:If previous results are any guide by anon*127.0.0.1 · · Score: 4, Funny

      You're having trouble finding porn now?

      --
      I am NOT a man!
      I am a free number!
  6. Firewalls? by adam_megacz · · Score: 5, Insightful

    So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?
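
    One way a crawler client could guard against this, sketched under assumptions: resolve each hostname and refuse anything that lands on a private, loopback, or link-local address. The names is_internal_address and safe_to_crawl are made up for this example, and there is no claim here that the Grub client does anything like it.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_address(addr):
    """True for private (RFC 1918 etc.), loopback, or link-local addresses."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def safe_to_crawl(url, resolve=socket.gethostbyname):
    """Return False when the URL's host resolves to an internal address.

    `resolve` is injectable so the check can be exercised without DNS.
    Unresolvable or malformed hosts are treated as unsafe.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return not is_internal_address(resolve(host))
    except (OSError, ValueError):
        return False


# Hypothetical intranet host that resolves to a 10.x address.
print(safe_to_crawl("http://intranet.example/", resolve=lambda h: "10.0.0.5"))
```

    This does not cover content on a public address that only answers requests coming from inside the firewall, so it is a partial answer at best.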

  7. Google Toolbar by petree · · Score: 5, Interesting

    Couldn't Google do this anyway with the Google toolbar? With the advanced features version, it tracks every page you visit. If they offered some incentive to install the toolbar, Google could just beat them at this game. I actually use the Google toolbar already by choice (it makes my web searching more productive) every day. All they have to do is get lots of people using it, and wouldn't that work just as well or better?

    1. Re:Google Toolbar by Kelerain · · Score: 5, Interesting

      This tracking is actually how a lot of important information leaks out. Security through obscurity has always been a poor man's system, and this busts it wide open. I won't post them here, but there are several interesting searches you can do that give personal results for things that REALLY have NO place on a publicly accessible page. On a more positive note, Google already uses distributed computing through their toolbar: http://toolbar.google.com/dc/offerdc.html However, they donate the cycles to various worthy causes like Folding at Home (currently their only beneficiary), but it is conceivable that if they came up with some secure and useful search-related thing to do with the cycles, they could put it to use almost instantaneously. I think that there aren't significant benefits (plenty of discussion elsewhere here) for them to want to use it, however.

  8. Google's technology is superior... by eidechse · · Score: 4, Funny

    ...those pigeons can't be beat.

  9. Re:Not news for us webmasters by Redwing · · Score: 5, Interesting

    Here is what slashdotters were saying about grub almost 2 years ago.

    --
    Raisinettes are my raison d'etre
  10. They realize they aren't the REAL GRUB by anagama · · Score: 5, Informative

    From the readme in the Linux version - no idea what the other readmes might say. However, it appears that they are sensitive to the fact that bootloader grub pre-existed their program. They are requesting catchy names. Here is an excerpt:

    Notice
    ======
    The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

    --
    What changed under Obama? Nothing Good
  11. A better use for my screensaver time by Call+Me+Black+Cloud · · Score: 5, Insightful

    I prefer grid.org to grub.org. There the cycles are going to cancer or smallpox research. Currently over 2 million machines are participating.

    Altruism has its place, but since I'm more likely to die of cancer than of not having the complete www indexed I think I'll be selfish and work towards a cure for something that may affect me.

  12. Indexor or Search Engine? by digitect · · Score: 4, Interesting

    I expected some way to search... this looks more like a project to index the web rather than make the results available for public use via web interface. Did it strike anyone else odd that there was no web form on the home page with which to search?!

    It seems like a good concept, but the availability of the information collected needs to be accessible without installing the client. I'm not game to install distributed computing apps without some freely available benefit. The "for the good of the world" motivation went out the window for me about a day after my first Seti At Home experience. (But now BitTorrent, there was appreciable benefit. I had RedHat 9 isos within 8 hours of their initial release!)

    --
    There is no need to use a SlashDot sig for SEO...
  13. Re:search.msn.com is the future by shibbydude · · Score: 5, Interesting
    In particular, the company has its own team of editors that monitors the most popular searches being performed and then hand-picks sites that are believed to be the most relevant.

    You have to be kidding or working for Microsoft, or both! Have you ever searched for Linux on MSN? Try it - here.

    Notice the third result? "Learn about the Microsoft alternatives and how to move to them from open source products." I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

    --
    We're only gonna die from our own arrogance, that's why we might as well take our time...
  14. Altruistic? by sulli · · Score: 5, Funny
    That's the dumbest thing I've heard in ages. Why should I help out a for-profit company for free?

    (Oh, I can't remember. Have I MetaModerated Recently?)

    --

    sulli
    RTFJ.
    1. Re:Altruistic? by eversunsoft · · Score: 4, Insightful
      Well, because web searching, up to this day and age, has been a free service. Supposing that the index is built as the result of donated searches, it would be ethically in very bad taste to act against this trend.

      Of course, I am the first one to question this trend. Has anyone else considered the possibility that one day we'll wake up and notice that Google is charging for access to its basic searching services?

      I for one, would probably pay. I have become so dependent on it. What price? That's a good question...

    2. Re:Altruistic? by R0 · · Score: 5, Funny

      Notice
      ======
      The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.


      I nominate "parasite".

  15. Re:You can run both by rabidcow · · Score: 5, Funny

    Grub is mainly interested in your excess bandwidth.

    Unfortunately, so is my ISP. In fact, they've already sold it to other customers.

  16. Legalities? by cheshiremackat · · Score: 4, Interesting

    Alright, I have 3 major problems with this...

    1) How different is this than the Princeton kiddies system? I don't know about you, but I don't want a 95 billion dollar bill arriving in the mail...

    2) What if your local (cache?) contains a few links to kiddie porn? Not your fault, right? Software does its own thing, you cannot control it, BUT what will the FBI think? The FBI, Scotland Yard, and the RCMP are currently heavily investigating Kiddie Porn cases (good work IMHO), but what if you're the unlucky sap who gets stuck with a few sketchy URLs? Or worse yet, what if this GRUB keeps a cache of the website like Google does? Then what?

    3) What about material that is legal locally, but illegal somewhere else... e.g. Nazi stuff in Germany, Falun Gong in China, etc... The last thing I want is to be refused a travel visa cuz my PC has an illegal cache...

    Good idea in principle, but with sketchy content on the web, I don't think I will be the one keeping track of it all. If there is a way to filter out the questionable stuff then maybe, but since the purpose is to be as inclusive as possible, it seems incompatible.

    _CMK

    --
    Bad spellers of the world untie!