Slashdot Mirror


Building a Bigger Search Engine

skreuzer writes "Wired is running a story about a distributed web crawler called Grub. People who choose to download and run the client will assist in building the Web's largest, most accurate database of URLs. This database will be used to improve existing search engines' results by increasing the frequency at which sites are crawled and indexed. Conceivably, Grub's distributed network could enable state information to be gathered on every document on the Internet, each and every day."

278 comments

  1. Will Grub take off or be smashed? by Blaine+Hilton · · Score: 4, Insightful
    I started to use grub, but then questions started cropping up. First we are using this to further a commercial organization. This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine. There is not even any potential reward such as with distributed.net.

    Also the grub engine crawls everything, including adult content and other questionable content. They have a setting to turn it off, but it does not block it. With the current questioning of international law relating to accessing illegal websites this could have major consequences for the average user.

    So for the time being I have stopped using the grub client until some serious questions are answered. It's an interesting concept and if it was being used in more of an academic setting it could be interesting. However I believe that search engines like Google are doing pretty good themselves.

    Go calculate something

    1. Re:Will Grub take off or be smashed? by dubiousmike · · Score: 1

      one thing might be that a site doesn't have to wait 6 weeks to get listed...

      is that good or bad?

    2. Re:Will Grub take off or be smashed? by Threni · · Score: 1, Insightful

      "Also the grub engine crawls everything, including adult content and other questionable content."

      Adult content isn't questionable. You either look at it, or you don't. Don't tell me that stuff about children being harmed by looking at photographs of the naked body has got to you?

      Also, the legal problems exist mainly in your head. No user will be prosecuted for supplying a URL of a website to a third party who then makes it available to people using their search engine, as it simply isn't illegal.

      Unlike SETI, this thing isn't a complete and utter waste of time, although I agree with you about the folding thing.

      "So for the time being I have stopped using the grub client until some serious questions are answered."

      No serious questions have been posed at this time.

    3. Re:Will Grub take off or be smashed? by bcrowell · · Score: 3, Insightful

      This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine.
      Actually, if I had a gun to my head, I'd choose to run Grub, because the client is open-source. I used to run SETI@home, but then the news came out that they'd been sitting on a potential root vulnerability for a long time. That really brought home to me the risks of running someone else's closed-source app on my box.

    4. Re:Will Grub take off or be smashed? by kaden · · Score: 5, Insightful

      Um, I think you're missing the point. This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server. I would quite possibly be arrested and charged, and while I wouldn't be convicted, it's quite an ordeal, and there's an ugly social stigma to even being charged with Kiddy Porn or conspiring with a terrorist. So that's a serious question that's posed by running Grub.

    5. Re:Will Grub take off or be smashed? by Feztaa · · Score: 1

      the news came out that they'd been sitting on a potential root vulnerability for a long time

      Do you have any references? Please back up your claims.

      I like the anecdote, "Gee, this closed source thing turned out to be a huge risk! I'll stay open source, thanks.", but I'd like some proof :)

    6. Re:Will Grub take off or be smashed? by bcrowell · · Score: 4, Informative
      Do you have any references? Please back up your claims.
      here, and here

      Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.

    7. Re:Will Grub take off or be smashed? by dtfinch · · Score: 5, Interesting

      There are many ways to look at this. The idea is to install the client, set Opera to use the same useragent string, visit some of those sites, then blame it on Grub if the FBI comes busting through your door.

      If you're a criminal, installing the Grub client might be a great idea.

    8. Re:Will Grub take off or be smashed? by Moonwick · · Score: 2, Insightful

      Yeah, god forbid you help a commercial organization, especially when the results could stand to benefit you.

      God knows that Google, by virtue of being a commercial entity, has absolutely nothing to offer you.

      Anti-capitalist fucktard.

      --
      Only on slashdot can a posting be rated "Score -1, Insightful".
    9. Re:Will Grub take off or be smashed? by Logopop · · Score: 1

      Some good, valid concerns there. My first concern was of a more practical nature - will the servers take the load when there is a Slashdot jump in the number of clients? My newly downloaded client is already spending a lot of time trying to deliver results.
      I like the concept nevertheless. My perspective on things has become quite 'googlified' lately, I must admit. So I will be using the web-based search client for an alternative view on my searches. However, I am still unsure how much I will be using the client. There's nothing wrong in contributing to a commercial venture, as long as I am (in this case) allowed to use the service for free. But, as already mentioned, there may be legal questions that need addressing.

    10. Re:Will Grub take off or be smashed? by Jugalator · · Score: 2, Interesting

      There is not even any potential reward such as with distributed.net.

      How about improving existing search engines with more accurate databases? Commercial organizations like Google might be involved and that's another matter. There might still be a reward to the public.

      --
      Beware: In C++, your friends can see your privates!
    11. Re:Will Grub take off or be smashed? by Anonymous Coward · · Score: 0
      A static user-agent. Very funny.

      Grub appends a unique ID to the user-agent based on the data they gave your client to fetch. You can't use the same one twice.

    12. Re:Will Grub take off or be smashed? by wirde · · Score: 1
      Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.

      Technically you are right. But:

      1. On many Windows installations, it's more or less equivalent.

      2. Under *nix, running arbitrary code as a user is a good first step to escalating to root.

      --
      in GNUin GNUin GNUin GNUin GNUin GNUin GNUin GNUSegmentation fault
    13. Re:Will Grub take off or be smashed? by Kragg · · Score: 0, Troll

      the client is open-source

      Really? My god, how exploitable is that. Give me a week and any searches for sex, news, books, shopping, flatulence, football or art will all lead directly to my ad-filled spamsite.
      Where's my IDE..?

      --
      If you can't see this, click here to enable sigs.
    14. Re:Will Grub take off or be smashed? by Negatyfus · · Score: 1

      On the other hand, if you're innocent and Grub accessed some of that illegal content, try to convince the jury that you didn't abuse Grub to cover up some of your illegal activities like this or that terrorist turned out to have done.

    15. Re:Will Grub take off or be smashed? by stinky+wizzleteats · · Score: 1

      If you're a criminal, installing the Grub client might be a great idea.


      This is exactly the kind of "barrel full of wine, spoonful of sewage" argument that is going to get the Internet itself banned before too long.


      With things like Freenet running around, and now this (what will happen if these guys get together), the argument will be "Information terrorists have made it impossible to control the Internet. It must, for the sake of the children, therefore be banned."


      Tinfoil hat karma whoring? You be the judge. I do a lot of expert witness testimony and general defense consultation on criminal cases involving information technology. I have sat across the table from types who would make Agent Smith look like Barney Fife. I promise you, when this stuff gets on their radar, it will be in the next Patriot Act.

    16. Re:Will Grub take off or be smashed? by Anonymous Coward · · Score: 0

      But really, grub client doesn't fetch anything but HTML. No images. If there's any log entries for images, like Opera will fetch, they'll know it's faked.

    17. Re:Will Grub take off or be smashed? by joshdaymont · · Score: 1

      You raise some good questions, but there are even more. What about the security concerns? Commercial firms are famous for writing bad code. Also, there are clear privacy dangers here. I for one would never run this on my desktop.

      Josh Daymont, MobileSecure, Inc. http://www.mobile-secure.com/

    18. Re:Will Grub take off or be smashed? by Beliskner · · Score: 1
      This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server.
      What the fuck happened to the First Amendment? Rights that you aren't willing to die for will disappear. If you expect the Feds to censor the Internet and track your URLs then they will.

      It's ironic that we attack Al-Qaeda's tactics when our constitution itself demands that we be willing to die for our Rights under the Constitution, unless the Constitution disappeared overnight and I missed a memo.

      --
      A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
    19. Re:Will Grub take off or be smashed? by smagruder · · Score: 1

      Just download version 3.08 to fix it.

      --
      Steve Magruder, Metro Foodist
    20. Re:Will Grub take off or be smashed? by Anonymous Coward · · Score: 0

      it surely is a step into the right direction, as it is questionable if it's good when every websearch is made through google, but it's only a step after all.
      I don't understand why a central server is needed when searching is already distributed. I think many people won't like this as it's nothing really new, it just moves the workload to the user

    21. Re:Will Grub take off or be smashed? by Kevin+Stevens · · Score: 1

      I hear this argument a lot, but it makes me think- Do you really plan on analyzing all the code on your computer that is open source? Do you even rely on the fact that someone else will? If someone opened the source to Windows tomorrow, could you really count on people to scrutinize all 10 million lines of code? Even if someone does, you have to just rely on their expertise in finding any potential bugs in it. This auditing process is to me not a whole lot better or more thorough than what goes on inside MS's offices, or SETI's labs.

    22. Re:Will Grub take off or be smashed? by Anonymous Coward · · Score: 0

      With things like Freenet running around, and now this (what will happen if these guys get together), the argument will be "Information terrorists have made it impossible to control the Internet. It must, for the sake of the children, therefore be banned."

      It doesn't matter. The internet won't be going anywhere - too much money is being made from it. People have been putting that argument across for years - it was child pornography before the 7/11 thing. Most people aren't that stupid, and the laws would only apply in America anyway, and who cares about that?

    23. Re:Will Grub take off or be smashed? by iamhassi · · Score: 1
      "...then blame it on Grub if the FBI comes busting through your door."

      Great idea! So after you're arrested for kiddy porn and your picture is on the front page of the local paper and the local nightly news, your friends and family disown you, and after a year in jail you finally go to trial to be found not guilty because your $10,000 lawyer argues it was really Grub and the FBI *finally* releases.

      Swell plan you have there.

      --
      my karma will be here long after I'm gone
    24. Re:Will Grub take off or be smashed? by PhilHibbs · · Score: 1

      Yes, because 3.08 fixed the last bug.

    25. Re:Will Grub take off or be smashed? by hermes4293 · · Score: 1

      there are illegal websites?
      tell me one!

  2. Great idea, but will it pan out? by dtolton · · Score: 5, Insightful

    LookSmart hopes to tap the altruistic nature of many Internet users.

    That unfortunately seems like a naively optimistic hope. While the
    vast majority of people may be altruistic, it only takes a few
    unscrupulous individuals to completely undermine a fair result.

    It's interesting that this idea is an extension to Google's model in
    many ways. Essentially Google is able to index so much of the
    Internet by having 50,000+ servers. I don't think that's what makes
    Google such a useful search tool, rather I think its accuracy and
    relevancy. If my search results started getting polluted with bogus
    hits, I would stop using it almost immediately.

    Unfortunately, by letting people run the client on their machine and
    having it send the results back to the server, I think spoofed
    results are inevitable. I don't think it will be possible to
    safeguard the results either, it will be interesting to see how well
    this project survives *when* people start spoofing results. It's
    been a problem for SETI@home, and it's something that undermined some
    people's faith in the project as a whole. If the spoofed results are
    more widespread and have a larger impact as they would in a system
    like this, it may ultimately prove fatal to the project.

    One factor that has been absolutely critical to Google's success has
    been their ability to remain resistant to spoofing attempts. It's
    still a question mark how well grub will perform in that context.

    --

    Doug Tolton

    "The destruction of a value which is, will not bring value to that which isn't." -John Galt
    1. Re:Great idea, but will it pan out? by Anonymous Coward · · Score: 0

      Google is subject to abuse easily. Look up "tastylog". A bunch of people on the shack created those links, and Google ate it right up.

    2. Re:Great idea, but will it pan out? by Nickilo · · Score: 5, Interesting

      "The General's Dilemma" would solve this problem. The story goes something like this: The general needs to get urgent information to one of his officers, however, he suspects saboteurs are present among his messengers. In order to ensure the information gets through accurately, he sends the same message with several men. The officer on the other end collects all the messages and goes with the majority. (And, presumably, kills the others.)
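
      The majority rule described above can be sketched in a few lines. This is only an illustration, not how Grub's server works: the function names, the use of SHA-256 digests, and the strict-majority threshold are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def page_digest(body: bytes) -> str:
    """Fingerprint a fetched page so reports can be compared cheaply (assumed scheme)."""
    return hashlib.sha256(body).hexdigest()


def majority_result(reports):
    """Return the digest reported by a strict majority of clients, or None.

    `reports` is a list of digests, one per client that fetched the same URL.
    """
    if not reports:
        return None
    digest, count = Counter(reports).most_common(1)[0]
    # Accept only if more than half of the messengers agree.
    if count > len(reports) / 2:
        return digest
    return None


# Two honest clients and one spoofer reporting on the same page.
honest = page_digest(b"<html>real page</html>")
spoofed = page_digest(b"<html>buy my stuff</html>")
accepted = majority_result([honest, spoofed, honest])
```

      If every client reports a different digest, as can happen with dynamic content, the function returns None, so a real system would need some other tie-breaking policy.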

    3. Re:Great idea, but will it pan out? by npongratz · · Score: 1

      Possibly not. The officer would probably have trouble unless the messengers come to him with a verifiably accurate timestamp of the message they're delivering (ie, the Grub server instructs n clients to fetch a page at the exact same time and return the results with the timestamp).

      Why? Well, given the dynamic nature of the Internet, pages change through the course of time (the General updates his messages often). So even a difference of one second can change the results of the fetching of a given page. Thus, we get the illusion of saboteurs in our camp (along with the nasty requisite beheadings) even though the messengers probably are legitimate (ie, no conveniently "touched up" results are returned, yet the returned pages have changed from one unit of time to the next).

      Of course, adding a timestamp alone wouldn't solve the problem, either. There'd be issues with time synchronization due to network latency, timestamp spoofing, etc. I would guess a well-thought-out public key infrastructure would have to be implemented (for secure retransmission of the timestamp), which opens another can of worms.

    4. Re:Great idea, but will it pan out? by aminorex · · Score: 1

      I agree that it's not an unsolvable problem, however,
      it's a bit more complex than you paint it: Dynamic
      content can provide different results on every access.
      What does the officer do if every messenger gives
      a different result?

      --
      -I like my women like I like my tea: green-
    5. Re:Great idea, but will it pan out? by Anonymous Coward · · Score: 0

      Not an anonymous coward, but the system is refusing to log me in again... I was in for a minute there, but now? Anyway, want to add two obvious comments I haven't seen yet.

      One is that competition is a good thing, and Google has already started abusing their success. The other aspect is that helping Grub is a kind of charity, because all of us benefit from better search engines. Returning to Google again, they are making profits ONLY because so many people are using them--and they are managing to resell our eyeballs to advertisers. I actually like Grub's economic model better.

  3. Biiig questions to answer by andy@petdance.com · · Score: 5, Interesting
    So Grub goes out, uses bandwidth, and then returns some results to the home base. It's really distributed bandwidth more than distributed computation.

    I bet one of the big successes in Folding and distributed.net is that many people run the clients on work boxes, knowing that there's little actual overhead incurred to their work. How different that is for a URL sucker.

    I wonder what broadband ISPs think of Grub.

    1. Re:Biiig questions to answer by fatalist23 · · Score: 1

      Well, as a college student on a line with a bandwidth quota (per week capped, not too bad) I can say that I'm not too enthusiastic about donating my bandwidth. The application itself probably wouldn't be too traffic intensive, but given my bandwidth usage habits, I know I run quite close to the caps (which could cause me to get kicked off the network) quite often. Just my .02

    2. Re:Biiig questions to answer by friedegg · · Score: 4, Interesting

      I wonder what broadband ISPs think of Grub.

      If it becomes a problem, I imagine ISPs will declare it a commercial bandwidth usage, and order users to stop or move to a business class plan for more money.

      --
      Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
    3. Re:Biiig questions to answer by Zork+the+Almighty · · Score: 1

      As a college student myself, I'm more concerned about redirecting bandwidth AWAY from destroying the RIAA.

      --

      In Soviet America the banks rob you!
    4. Re:Biiig questions to answer by einer · · Score: 1

      Which in my mind is just another reason that someone should take this idea, and implement an open source version.

      How hard could it be? :)

  4. Haiku :-) by Ignorant+Aardvark · · Score: 4, Funny

    Grub searches the web
    Sniffing out all the good porn
    Not just bootloader

    I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!

    Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called "E-Coli" yet? No? I can just imagine my mom ...

    "Agh! You have E-Coli on your computer!"

    1. Re:Haiku :-) by Anonymous Coward · · Score: 3, Funny

      How about 'SARS'? Four letters, indicates something that spreads quickly...

    2. Re:Haiku :-) by Anonymous Coward · · Score: 4, Funny
      Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
      I'm wondering if the Grub bootloader developers will throw a tantrum and flood the Grub crawler developers' e-mail addresses, claiming that this will confuse people and harm the bootloader project.

      Hee hee.
    3. Re:Haiku :-) by Unoriginal+Nick · · Score: 5, Funny
      Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...

      How about Firebird? I'm sure that won't cause any problems :-)

    4. Re:Haiku :-) by Anonymous Coward · · Score: 1, Funny

      Hmmm yes. maybe they should have used A SEARCH ENGINE before deciding on Grub. Currently the GNU GRUB is the first result on google.

    5. Re:Haiku :-) by Chester+K · · Score: 4, Funny

      As time approaches infinity, the number of software projects named Firebird also approaches infinity.

      It's ok though because they'll all still be different projects, so nobody will get confused.

      --

      NO CARRIER
    6. Re:Haiku :-) by Anonymous Coward · · Score: 0

      Debian package description says:

      Please don't confuse this package with the bootloader with the same name. It has nothing to do with it besides the name. The project is currently
      searching for a better name.

    7. Re:Haiku :-) by certron · · Score: 1

      "Seriously though, shouldn't they change the name? GRUB is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called E-Coli yet?"

      I think, if anything, they should call it Grubi or Grubbi. On one hand, it could be cute, and could probably have a good mascot and backronym for it, and on the other, it indexes anything it can get its grubby little hands/ fingers/ tentacles/ protuberances on. Sounds like a good name to me. :-)

      I'm sure some bio person will tell you all about e-coli and how usually it isn't harmful. Or something. I'll let them tell about it, even if it is unrelated to a name.

      --

      fair.org counterpunch.com truthout.com indymedia.org salon.com
      eff.org guerrilla.net debian.org gentoo.org
    8. Re:Haiku :-) by Anonymous Coward · · Score: 1, Insightful
      I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!


      OK. 15 minutes are up, and we are STILL waiting for your "Good" joke.

    9. Re:Haiku :-) by iomud · · Score: 1

      Now that was funny.

    10. Re:Haiku :-) by rowanxmas · · Score: 1

      I am going to stand by my choice of LILO for the new name of this software.

    11. Re:Haiku :-) by Anonymous Coward · · Score: 0
      "Taco" also meets that definition.

      ~~~

  5. Business Plan? by Anonymous Coward · · Score: 2, Insightful

    What are sensible business plans for this type of endeavour?

    Should we expect to see many commercial efforts focussed on providing similar "crawl" or "index" capabilities, but each honed to a specific niche market? A scientific crawler? A retail links database?

    One could argue that similar efforts targeting music resources have resorted to less automated techniques, i.e. human-driven sharing.

    Thoughts?

    1. Re:Business Plan? by ddimas · · Score: 1

      First explain to me why I should donate my resources to your profit?

      I think that they're just trying to avoid paying for hardware. No thanks, they can make money without my stuff.

  6. Hrmm, I wonder how long... by bergeron76 · · Score: 3, Insightful

    until someone figures out a way to compromise their local client's results and "escalate" their fave URLs.

    It still sounds like a really cool idea though.

    --
    Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
    1. Re:Hrmm, I wonder how long... by CaptainMunchies · · Score: 3, Insightful

      Grub's clients don't come up with a ranking for each website they crawl; rather, they check to see if this website has changed since the last time it was crawled. For any website that has changed, the client notifies the server. The search engine asks the server which sites in its index need to be updated, and the server gleefully replies.

      Clients artificially increasing their ranking isn't an issue, since the client has nothing to do with a site's ranking.

      --
      Spam removed for the Internet's pleasure ...
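
      The "has this page changed since the last crawl?" check described above could look roughly like the sketch below. The thread does not say how Grub actually detects changes; comparing a SHA-256 digest against the previously stored one is an assumption, and `has_changed` and `last_seen` are hypothetical names.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Digest of the fetched page body (assumed change-detection scheme)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, body: bytes, last_seen: dict) -> bool:
    """Return True if `body` differs from what was recorded for `url`.

    `last_seen` maps URL -> digest from the previous crawl and is updated
    in place. A URL that was never crawled counts as changed.
    """
    new_digest = content_digest(body)
    old_digest = last_seen.get(url)
    last_seen[url] = new_digest
    return old_digest != new_digest


seen = {}
first = has_changed("http://example.com/", b"<html>v1</html>", seen)
second = has_changed("http://example.com/", b"<html>v1</html>", seen)
third = has_changed("http://example.com/", b"<html>v2</html>", seen)
```

      Only a changed/unchanged flag per URL goes back to the server in this sketch, which matches the point that the client has no say in ranking.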
  7. grub is already taken by stock · · Score: 2, Insightful
    Grub is the GRand Unified Bootloader, a GNU project, so the name is already taken.

    Hmm searchengine eh? Why don't you call it grab ?

    Robert

    1. Re:grub is already taken by Concerned+Onlooker · · Score: 1

      So is Grab. It's a screen capture app that comes with OS X. Maybe their lawyers wouldn't mind sharing....

      --
      http://www.rootstrikers.org/
    2. Re:grub is already taken by mackstann · · Score: 1
      What's the deal with names lately? Who cares!

      I don't see Phoenix being used for BIOS and a browser as a problem, I don't see Firebird being used for a database and a browser as a problem, and I don't see grub the bootloader and grub the web spider conflicting. They're entirely different products, and there are only so many words out there. Here is one of a million examples of a name that is taken by tons of different companies.

    3. Re:grub is already taken by knowledgepeacewi · · Score: 1

      I don't see Phoenix being used for BIOS and a browser as a problem
      Yeah, but the legal system might. No one is as anal as a lawyer is about words and wording. And since Judges are lawyers...

    4. Re:grub is already taken by stesch · · Score: 1
      Grub is the GRand Unified Bootloader, a GNU project, so the name is already taken.

      Does anybody see the humor in this? They haven't used a search engine to check the name ...

  8. If previous results are any guide by carl67lp · · Score: 5, Funny

    1. Tech-savvy people will install this.
    2. Tech-savvy people tend to be loners.
    3. Loners most often search for porn.

    C1. Tech-savvy people search for porn.

    4. Items searched for most often reach the top of the list.
    5. Porn is searched for often by tech-savvy people.

    C2. Porn will be easier to find with this new search engine.

    Count me in!

    1. Re:If previous results are any guide by KoolDude · · Score: 1


      1. Tech-savvy people will install this.
      2. Tech-savvy people tend to be loners.
      3. Loners most often search for porn.

      C1. Tech-savvy people search for porn.

      4. Items searched for most often reach the top of the list.
      5. Porn is searched for often by tech-savvy people.

      C2. Porn will be easier to find with this new search engine.


      6. pr0nit !?!

      --
      getSexySig(); /* returns sexy signature */
    2. Re:If previous results are any guide by anon*127.0.0.1 · · Score: 4, Funny

      You're having trouble finding porn now?

      --
      I am NOT a man!
      I am a free number!
    3. Re:If previous results are any guide by Anonymous Coward · · Score: 0

      Who needs some big fancy distributed search engine just for pr0n? Use autopr0n and get it for free.

    4. Re:If previous results are any guide by Saeger · · Score: 1
      People still search for porn on the IntarWeb instead of p2p? Amazing.

      --
      Power to the Peaceful
  9. great news! API? by The-Perl-CD-Bookshel · · Score: 2, Interesting

    This is going to challenge Google's search, which will entice them to cut loose some of those really cool google labs concepts. Froogle, Google News, and all of the other cool things that they are working on are great services and are going to be the focus of innovation over at Google.

    Also, Looksmart needs to develop and release an API for this system. You can only use the google api for 2,000 searches per day. If they allowed unlimited usage, it would get a lot of developer backing.

    --
    I don't keep a lid on my coffee so when I walk around I look busy -me
  10. Not news for us webmasters by Gothmolly · · Score: 1, Insightful

    grub has been crawling my site for weeks if not months now. How is this news? Because someone at Wired wrote about it? Geesh.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Not news for us webmasters by Redwing · · Score: 5, Interesting

      Here is what slashdotters were saying about grub almost 2 years ago.

      --
      Raisinettes are my raison d'etre
    2. Re:Not news for us webmasters by commodoresloat · · Score: 1
      How is this news? Because someone at Wired wrote about it?

      No; because someone at Wired News wrote about it.

    3. Re:Not news for us webmasters by hswerdfe · · Score: 2, Insightful

      dude, get over yourself....

      I never heard tell of Grub.org before.

      I found it interesting....

      not every link on slashdot is going to directly relate to you....

      --
      --meh--
  11. Grub by squiggleslash · · Score: 3, Funny
    Ok, so how are they going to store this giant search engine in the boot sector of an ordinary hard drive?

    Oh wait, you mean it's not related to GRUB, the Linux/etc boot loader. *slaps forehead* But I guess this solves everything - we can call Phoenix "Grub" too, and just treat it as the generic name to call everything we're having problems thinking up a name for...

    --
    You are not alone. This is not normal. None of this is normal.
  12. Firewalls? by adam_megacz · · Score: 5, Insightful

    So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?

    1. Re:Firewalls? by friedegg · · Score: 3, Informative

      You can always put an entry in your robots.txt to block it.

      Actually, the robots.txt issue is one they're still working on. Right now it doesn't check the file very often, which upsets some webmasters.

      They're open to suggestions, so maybe you could suggest a list of blacklisted IP's/hostnames. I suggested they look into supporting gzip compressed web pages, and they said they'd look into it.

      --
      Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
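
      For readers unfamiliar with robots.txt, here is a small sketch of how a crawler can honor it, using Python's standard `urllib.robotparser`. The user-agent token `grub-client` and the example rules are assumptions for illustration; check your own server logs for the token the crawler really sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler entirely,
# and keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


grub_ok = allowed(ROBOTS_TXT, "grub-client", "http://example.com/index.html")
other_ok = allowed(ROBOTS_TXT, "otherbot", "http://example.com/index.html")
other_private = allowed(ROBOTS_TXT, "otherbot", "http://example.com/private/x.html")
```

      A crawler that re-reads the file only rarely will keep applying stale rules, which is the complaint mentioned above.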
    2. Re:Firewalls? by GigsVT · · Score: 2, Interesting

      If you knowingly run a program that openly spies on every page you go to, you get what you deserve.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    3. Re:Firewalls? by adam_megacz · · Score: 1

      I don't run the webserver in question.

      Also, what if the inept secretary down the hall (who has no idea what robots.txt is) decides to run this thing?

    4. Re:Firewalls? by friedegg · · Score: 2, Informative

      Well, if you're getting into "What if"'s, she could also email someone outside the company anything from inside the firewall. Or set up a file sharing client like Kazaa and share things on local and network drives.

      If you wanted to forbid the client from working, network admins could block port 3136 (I think it is), which would prohibit communication with the central server.

      My understanding is that grub does not just crawl away randomly, rather it's given a list of things to crawl by the central server. So, assuming it hasn't crawled your intranet before, and you don't give it a local site to crawl, it shouldn't normally find them. But, like I said, they're open to suggestions, so if you have some, offer them.

      --
      Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
    5. Re:Firewalls? by CableModemSniper · · Score: 1

      well since you don't know that robots.txt is on the webserver anyway, I'm sure it won't be a problem that the secretary doesn't know this ;)

      --
      Why not fork?
    6. Re:Firewalls? by Anonymous Coward · · Score: 0

      because that would be stupid?

    7. Re:Firewalls? by YoungHack · · Score: 1
      So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?

      You don't, and the spider regularly indexes things on 127.0.0.1. There are an awful lot of domains out there that resolve to that. That's why I don't run the spider.
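
      One way a crawler client could guard against this, as a rough Python sketch: look at the addresses a hostname resolves to and skip anything loopback, private, or link-local before fetching. The helper names are hypothetical and nothing here is taken from the Grub client itself.

```python
import ipaddress
import socket


def is_internal_address(ip_text: str) -> bool:
    """Return True for addresses a public crawler should never fetch."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%")[0])
    return ip.is_loopback or ip.is_private or ip.is_link_local


def should_crawl(hostname: str) -> bool:
    """Resolve hostname and refuse it if any resolved address is internal.

    Hypothetical helper: a real client would also need to re-check after
    redirects and cope with DNS answers changing between check and fetch.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are simply skipped
    addresses = {info[4][0] for info in infos}
    return not any(is_internal_address(address) for address in addresses)
```

      With a check like this, a public domain that resolves to 127.0.0.1 or to a 10.x.x.x host would be dropped before any request is made.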

    8. Re:Firewalls? by Anonymous Coward · · Score: 0

      How many people would run the spider from an important webserver?

    9. Re:Firewalls? by apsyrtes · · Score: 1

      how does "Knowingly?" enter into it?

      You know... we have a lot of sensitive stuff on our company intranet. And there are *way* more staff than our network/computer systems admins can ever expect to handle.

      And some of them read slashdot.

      (of course, I mean the *users* not the *admins*) 8(

      I can't wait to see my salary floating around on some Looksmart results page.

    10. Re:Firewalls? by GigsVT · · Score: 1

      Knowingly as opposed to spyware that tries to trick you into installing something that spies on you.

      This thing's stated purpose is spying on what pages you go to.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    11. Re:Firewalls? by sakshale · · Score: 1
      They're open to suggestions, so maybe you could suggest a list of blacklisted IP's
      One restriction - no private IP numbers - URL listings for a host at 10.100.200.1 would not be very useful.
      --
      For every problem there is a solution that is simple, obvious and wrong.
  13. Google Toolbar by petree · · Score: 5, Interesting

    Couldn't google do this anyways with the google toolbar? Cause with the advanced features version it tracks every page you visit. If they offered some incentive to install the toolbar, google could just beat them at this game. I actually use the google toolbar already by choice (it makes my web searching more productive) every day. All they have to do is get lots of people using it, and wouldn't that work just as well or better?

    1. Re:Google Toolbar by Anonymous Coward · · Score: 1, Interesting

      Google Toolbar does have a distributed computing option now (you have to turn it on). I think they're using it for SETI or folding or one of those worthwhile causes. I always assumed the incentive to use the toolbar was the functionality it provides.

    2. Re:Google Toolbar by Kelerain · · Score: 5, Interesting

      This tracking is actually how a lot of important information leaks out. Security through obscurity has always been a poor man's system, and this busts it wide open. I won't post them here, but there are several interesting searches you can do that give personal results for things that REALLY have NO place on a publicly accessible page. On a more positive note, google already uses distributed computing through their googlebar http://toolbar.google.com/dc/offerdc.html However, they donate the cycles to various worthy causes like folding at home (currently their only beneficiary), but it is conceivable that if they came up with some secure and useful search-related thing to do with the cycles, they could put it to use almost instantaneously. I think that there aren't significant benefits (plenty of discussion elsewhere here) for them to want to use it, however.

    3. Re:Google Toolbar by Phroggy · · Score: 1

      If they offered some incentive to install the toolbar, google could just beat them at this game.

      Does being a kick-ass tool (for those unfortunate enough to be using Internet Explorer) count as incentive?

      --
      $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
      $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    4. Re:Google Toolbar by Gryftir · · Score: 1

      Grub appears to have more cross-browser and cross-platform support (Google Toolbar only runs on Internet Explorer 5 for now). Grub runs on Linux and Windows, and since it isn't a browser plugin, it doesn't require you to have a certain browser.

      --
      http://www.santacruzbynight.com/index.shtml Santa Cruz By Night Vampire Larp
    5. Re:Google Toolbar by Anonymous Coward · · Score: 0

      c'mon! give us the "interesting searches" you're talking about! they deserve it!

    6. Re:Google Toolbar by James_Duncan8181 · · Score: 1

      Please, please, please detail. You can't just dangle something so interesting...

      --
      "To any truly impartial person, it would be obvious that I am right."
    7. Re:Google Toolbar by Kelerain · · Score: 1

      Please, please, please detail. You can't just dangle something so interesting...
      Well, mainly because I couldn't think of a good example at the time. I actually saw these before on a slashdot thread which I am unable to locate. But try searching for things like "pub pwl" or "directory of" passwd: things that are obviously insecure. While no one knows about them, they are sometimes left open. And the user with his google toolbar will sometimes go there. And then it gets on google. OOPS. There are some better searches out there. Get creative. The point is, if you know what you are doing, google is great for finding insecure systems and private information.

  14. They could always crawl twice by Anonymous Coward · · Score: 0

    Assuming they had enough people, they could always crawl twice to see if the submitted stuff matches.
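
    A minimal sketch of that cross-check, assuming each client reports a digest of the page it fetched and the server only trusts a digest that at least two clients agree on. The function names and the two-client threshold are illustrative assumptions, not anything Grub documents.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def page_digest(body: bytes) -> str:
    """Digest a fetched page so reports from different clients can be compared."""
    return hashlib.md5(body).hexdigest()


def agreed_digest(reports: List[str], min_agree: int = 2) -> Optional[str]:
    """Return the most common digest if at least min_agree clients sent it."""
    if not reports:
        return None
    digest, count = Counter(reports).most_common(1)[0]
    return digest if count >= min_agree else None
```

    The cost is at least double the crawl traffic per URL, which is why this only works "assuming they had enough people".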

  15. Re:Business Plan? - Google by Anonymous Coward · · Score: 0

    Well, Google's been targeting straight-up, no-frills search for a while now, and manages to sell this very successfully to its advertisers.

    Of course, once a context-specific search engine wins the majority share of its targeted market (as Google has done for the entire general market), then it can branch out and offer enhanced "pay" services, or usage statistics.

    Some markets are more cash-strong than others, for example the construction industry, the entertainment industry, or banking. The dot-com boom saw the failure of many efforts to bring internet-and-technology to the construction industry. In the banking industry many such efforts succeeded. In the entertainment industry, well, you tell me?

  16. Hardly distributed crawling by Herbst · · Score: 2, Interesting

    ...rather a crawl with a distributed component.

    They use the screensaver grub clients to check if a web page has been modified since the last time it was crawled (by the centralized crawl done by Looksmart). They probably use some smart MD5 checksum of the pages and send that with the urls to be crawled to the clients. If the checksum of what the grub client crawled doesn't match then the centralized crawl is instructed to re-fetch that url.

    They go this route because the If-Modified-Since HTTP request header is not honored by many webservers (and even if it is, you can't really trust it). This is especially true for dynamically generated web pages. I.e., if If-Modified-Since worked reliably, it would be a simple operation to check whether a previously crawled page has changed. Since that's not the case, they are outsourcing the expensive refetching of whole pages.

    It will be interesting to see how this pans out. I think they could run into trouble with ISPs if this really takes off (because bandwidth consumption per user would increase and make flatrate deals less profitable for some ISPs).
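
    For reference, the conditional request discussed above looks roughly like this in Python's standard library. This is only a sketch of the HTTP mechanism; the helper names are made up, and no network call is made here.

```python
from email.utils import formatdate
from urllib.request import Request


def build_conditional_request(url: str, last_crawl_epoch: float) -> Request:
    """Build a GET that asks the server to answer 304 if nothing changed."""
    request = Request(url)
    # HTTP dates are always expressed in GMT.
    request.add_header(
        "If-Modified-Since", formatdate(last_crawl_epoch, usegmt=True)
    )
    return request


def needs_refetch(status_code: int) -> bool:
    """304 Not Modified means the previously crawled copy is still current."""
    return status_code != 304
```

    Note that urllib.request.urlopen raises HTTPError for a 304 reply, so a caller would read the status from the exception's code attribute. A server that ignores the header just answers 200 with the full page, which is the failure mode described above.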

    1. Re:Hardly distributed crawling by myov · · Score: 2, Insightful

      Not the greatest way of doing this. On one of the sites I maintain, the date shows up at the top of the page. The other content changes very infrequently in most cases (a few pages hit a news&events database but that's about it). But the new date would be enough to change the checksum (unless they're allowing for it somehow)

      Grub hits us quite often. I've seen the same URL hit multiple times in one day by different hosts. It's ignoring the "revisit-after" meta tag (7 days), but then, so are most of the other search engines. While I haven't banned it, I am watching the amount of bandwidth it uses.

      --
      I use Macs to up my productivity, so up yours Microsoft!
    2. Re:Hardly distributed crawling by Herbst · · Score: 1
      Not the greatest way of doing this. On one of the sites I maintain, the date shows up at the top of the page. The other content changes very infrequently in most cases (a few pages hit a news&events database but that's about it). But the new date would be enough to change the checksum (unless they're allowing for it somehow)

      That's why I mentioned "smart" MD5 Checksums. You'd only checksum certain parts of a page. E.g., detecting everything that looks like a date and make sure that that's not part of the smart checksum. As long as the checksum parser on the grub client and the one at Looksmart are identical, that should work pretty well.
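
      A rough sketch of what such a "smart" checksum could look like: strip anything that looks like a date, normalize whitespace, then hash. The date patterns below are assumptions picked for illustration; whatever Grub and Looksmart actually do is not documented here.

```python
import hashlib
import re

# Assumed date shapes: 2003-04-18, 18/04/2003, and "April 18, 2003".
DATE_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
        r"\s+\d{1,2},\s+\d{4}\b"
    ),
]


def smart_checksum(page_text: str) -> str:
    """MD5 of the page with date-like substrings removed and whitespace collapsed."""
    for pattern in DATE_PATTERNS:
        page_text = pattern.sub("", page_text)
    normalized = " ".join(page_text.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

      Two copies of a page that differ only in the displayed date then hash to the same value, as long as both sides run exactly the same normalization.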

  17. From crawling to leeching by Anonymous Coward · · Score: 0

    If my search engine client ever became ubiquitous enough, I wonder how good a search index you could build, not by actively crawling, but passively harvesting all the efforts of your huge collective of clients. Sounds way too scary to want on my machine.

  18. The Distributed Search Engine by deadfishhotmail.com · · Score: 2, Interesting

    It's kind of funny and a bit ironic that search engines are generally used to search information from a central repository and Grub uses a distributed network to index pages. It's almost like having a distributed google cache (that's updated more frequently). Perhaps a better idea would be to invent a crawling daemon that runs on each server with a standard protocol that reports to a central server the relevance of search terms (hey it's DNS for search terms!!) - too bad it would be heavily abused (mostly by Buy Now, Free Money and Pron avenues I suppose).

    Ok now tell me that it's already been done, 'cause I'm pretty sure it has (and probably by Microsoft for ad money).

    Well it's an idea that might be more efficient and updatable than Grub anyway.

    --


    Who is this "Poster" guy and why does he own all of my comments?!?
  19. Google's technology is superior... by eidechse · · Score: 4, Funny

    ...those pigeons can't be beat.

    1. Re:Google's technology is superior... by Dannon · · Score: 1

      Indeed. In every contest between pigeons and grubs to date, the pigeons have clearly had the upper beak.

      --
      Good judgment comes from experience.
      Experience comes from bad judgment.
    2. Re:Google's technology is superior... by trats · · Score: 0

      I'm wondering if your statement has been affected by peer pressure or the media. Has anyone else noticed that Google's results have been declining over the past few months/years, or is it just me?

      Three years ago when I first discovered Google, it actually had an amazing ability to turn up the page that you were looking for as the first result. These days, I'm lucky if I find what I'm looking for at all.

      Has the web grown that much that the effectiveness of PageRank has decreased noticeably? Or it could be just me. :-/

    3. Re:Google's technology is superior... by eidechse · · Score: 1

      You should probably take a look at that backstory link above.

    4. Re:Google's technology is superior... by Boss,+Pointy+Haired · · Score: 1

      Or does anybody else not find this pigeon rank thing that funny?

      I think it's pretty lame myself.

      But whenever someone mentions or links to pigeon rank around here it gets +4/5 funny every time.

  20. My Take on Grub by Anonymous Coward · · Score: 2, Informative

    Looksmart is only using Grub to save on their bandwidth. Essentially Grub just compresses web pages before sending them to Looksmart's indexer thus reducing the bandwidth they have to pay for by a factor of 5 or so. The same thing could be accomplished through a proxy which compresses web pages. Eventually, once the HTTP mime standard for requesting compressed web pages is better supported by web servers, Grub will not be necessary.
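
    To get a feel for the kind of saving being claimed, here is a small sketch that gzips a page and reports the ratio. The sample markup is synthetic and very repetitive, so the number it prints says nothing about real pages or about the "factor of 5" figure above. In HTTP itself, compressed transfer is negotiated with the Accept-Encoding request header and the Content-Encoding response header.

```python
import gzip


def compression_ratio(page: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(page)
    return len(page) / len(compressed)


# Synthetic, highly repetitive markup; real pages will compress differently.
sample = b"<tr><td class='row'>item</td><td>value</td></tr>\n" * 200
ratio = compression_ratio(sample)
print(f"{len(sample)} bytes, compression ratio {ratio:.1f}x")
```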

    1. Re:My Take on Grub by Anonymous Coward · · Score: 0

      actually it could save bandwidth if you think about it. if everyone crawled their own content with their own internal LAN, then it wouldn't use any bandwidth except for when things changed.

      i think they are on to something.

  21. Has anyone consulted the master of bigger engines? by Anonymous Coward · · Score: 0, Funny

    I think Tim the tool-man Taylor is the man for this job. Nobody overbuilds engines better than this lovable Tim Allen character.

    More POWER! ugh ooough ooough!

  22. What about the RIAA? by One+Louder · · Score: 3, Insightful
    So...let's say my instance of Grub crawls over a repository of .mp3s and supplies that information to the combined index.

    What's the difference between my machine indexing them and the university students recently being hauled into court for indexing open shares? Why would I not be held liable for contributory copyright infringement?

    No thanks.

    1. Re:What about the RIAA? by Anonymous Coward · · Score: 1, Insightful

      Because this would call into question the future of all search engines, and you'd see the big players like Google, Yahoo, Overture, etc. head into court with their own high-priced lawyers. You think the RIAA wants a fight it doesn't think it can win?

    2. Re:What about the RIAA? by SmartGamer · · Score: 2, Interesting

      Here's the catch: it's going for scare tactics.

      The Church of Scientology has already threatened Google and gotten results moved; I can, in all honesty, see the RIAA going for it.

      It would be an earthshattering case, but here's the thing: the RIAA stands a disturbingly good chance of winning.

      I hope, I pray they don't win if they try it. And try they most certainly will, because they think they can get money out of the lawsuit and they want money. That's very likely a major motive.

      Oh, and to mods-for-a-day: mod the parent of this post up. It's thoroughly underrated at zero.

      --
      Warning: Poster of this comment is a nerd. Just like everybody else here.
    3. Re:What about the RIAA? by SmartGamer · · Score: 1

      Difference: You can show that you don't have direct control over it, and it is likely that they'd go for Grub instead of the users. ...other than that, not much. Note that I think the RIAA is full of excrement on their recent case as well.

      --
      Warning: Poster of this comment is a nerd. Just like everybody else here.
    4. Re:What about the RIAA? by knowledgepeacewi · · Score: 1

      the RIAA stands a disturbingly good chance of winning.
      Even if the server containing the MP3s is in a country that doesn't recognize copyrights?

      I would think displaying a link to copyrighted material would fall under free speech as long as you don't supply the material itself. But IANAL and the RIAA has a lot of money to blow.

    5. Re:What about the RIAA? by Anonymous Coward · · Score: 0

      A friend of mine had his MP3 collection accessible from his internal webserver. Which one day became his external webserver as well.

      All of a sudden he was getting dozens of hits per hour for /archive/music/[...].mp3.

      So he redirected anything under /archive to a page that simply said "Site removed due to Google indexing. 128kbit connection flooded. Click here to say you're sorry", and updated a
      counter on his front page of all those who had apologised. (A reasonable number of people did).

      Then, he went one better, and made anything under /archive redirect to the game at druglords.com, one of those "trick people into clicking" games. Which told the user they'd just been sold drugs and updated his score. Needless to say, he was soon in the top 5.

      So, having your MP3 collection indexed by Google isn't an altogether bad thing. :)

    6. Re:What about the RIAA? by Saeger · · Score: 1
      So your friend was dumb enough to not use robots.txt and to leave insecure Directory Indexes enabled, but smart(ass) enough to redirect his newfound visitors to funny pages? cute.

      --
      Power to the Peaceful
    7. Re:What about the RIAA? by Maxamoto · · Score: 0

      No worries, mate. in less than 2 years the RIAA will be gone, and so will most of the entertainment industry (hollywood, anyway). Artists will have to go back to working for themselves to make money, and the music will, of course, be free.

      --
      "Your CPU came with a keyboard? What kind of ghetto deal is that?" -McSuede
  23. They realize they aren't the REAL GRUB by anagama · · Score: 5, Informative

    From the readme in the linux version - no idea what the other readmes might say. However, it appears that they are sensitive to the fact that bootloader grub pre-existed their program. They are requesting catchy names. Here is an excerpt:

    Notice
    ======
    The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

    --
    What changed under Obama? Nothing Good
    1. Re:They realize they aren't the REAL GRUB by Anonymous Coward · · Score: 0

      In stark contrast to the Mozilla Firebird readme:

      The main executable has been renamed to "Firebird" even though an open source database was already named Firebird. They were out first, so we decided to throw our weight around since we are bigger and badder. If you have a catchy suggestion for a new name, please let them know - we ain't changin shit. Punk ass little database twerps.

      :-D

    2. Re:They realize they aren't the REAL GRUB by RighteousFunby · · Score: 1

      I have some ideas for a new name... parasite leech bloodsucker bigbrother or, even better windowsxp

    3. Re:They realize they aren't the REAL GRUB by Anonymous Coward · · Score: 0
      And Grub is such an appealing name, what a pity that it is already taken.


      Maggot would make a good second choice though.

    4. Re:They realize they aren't the REAL GRUB by JWSmythe · · Score: 1

      I dare say Pontiac had the name first. The 1967 Pontiac Firebird was the first.. :) I'm a big Firebird fan. I've had many F-Bodies from the 1975 Camaro LT-1 to the 2000 Firebird TransAm WS/6.

      Honestly, it's going to be hard to come up with any name that someone, in some way, thinks they already have claims to..

      But, to keep this completely on topic, it seems the grubclient has problems.. It works fine on a Slackware 8.1 workstation, but bombs out with a segfault after a few minutes on a Slackware 8.0 machine..

      Too bad for them. The Slack 8.0 machine is on a 1Gb/s connection. The Slack 8.1 machine is on a suck-ass Charter Cablemodem..

      I got Charter Communications's junkmail in today for bribes on upgrading my bandwidth. For only an extra $80/mo they'll increase my upload to 128k (from 24k), and my download to 512k (from like 128k).. This is a *FAR* cry from what all the cablemodem providers were claiming when they started. if I remember right, they were advertising 3Mb down, 1Mb up... Now I may as well be on a dialup if I'm uploading.

      Cablemodem providers suck ass.. I'm contemplating getting my own T1 loop to my office. :)

      --
      Serious? Seriousness is well above my pay grade.
    5. Re:They realize they aren't the REAL GRUB by Saeger · · Score: 2, Interesting
      Oh please! There's 6+ billion people on the planet now, and not enough unique namespace for everyone or every business to have that one 'cool' short name, so why don't they do what we humans have done? GET A LAST NAME.

      Grub The SearchEngine
      Grub The Bootloader
      FireBird von Browser
      FireBird von Database
      Gentoo el Distro
      Gentoo el FileManager
      Apple Computer
      Apple Records

      I'm serious. Nobody should feel entitled to an exclusive piece of namespace just because they think they had it first or are bigger & badder and more deserving than some newbie treading on their turf. (trademark `this!')

      --
      Power to the Peaceful
    6. Re:They realize they aren't the REAL GRUB by Redglare · · Score: 1

      suggested names: *chump *prey *wget *lookSmartSucker *thornleysFolly

  24. Google crawls a lot, actually by bigberk · · Score: 1

    It seems that google is actually crawling my site a lot more than grub is. Over the past 6 days:

    $ grep -c Googlebot access_log
    827
    $ grep -c grub-client access_log
    153

    1. Re:Google crawls a lot, actually by oaf357 · · Score: 1

      That's not a very good representation. Google has been going through its deep crawl the past 6 days.

    2. Re:Google crawls a lot, actually by bigberk · · Score: 1

      Google has been going through its deep crawl the past 6 days.

      Oh, ok... the numbers I was seeing did seem weird :)

  25. A better use for my screensaver time by Call+Me+Black+Cloud · · Score: 5, Insightful

    I prefer grid.org to grub.org. There the cycles are going to cancer or smallpox research. Currently over 2 million machines are participating.

    Altruism has its place, but since I'm more likely to die of cancer than of not having the complete www indexed I think I'll be selfish and work towards a cure for something that may affect me.

    1. Re:A better use for my screensaver time by BigZaphod · · Score: 1

      Anything like this for MacOS X? I checked the system requirements on grid.org and it seems to be windows only.

    2. Re:A better use for my screensaver time by pointwood · · Score: 1

      I would suggest Distributed Folding instead. At least they got good clients and clients for more than just Windows ;)

  26. curious. by toothfish · · Score: 2

    i wonder if google has already seen this coming (i've seen that grub fellow in my logs a number of times and sort of wondered about it), and is going to use their own distributed search engine once they get the bugs hammered out...

  27. Oh, just great. by TrebleJunkie · · Score: 1

    *Another* bunch of spiders chewing up my bandwidth, ignoring my robots.txt files, and bringing my server(s) to their knees.

    Joy of freaking joys.

    --

    Ed R.Zahurak

    You know, oblivion keeps looking better every day.

    1. Re:Oh, just great. by iggymanz · · Score: 1

      I've got hits from grub from 57 different addresses in the last month. So there's certainly no coordination among the clients. It's a WASTE of web server bandwidth. I also don't appreciate bots that claim it will come back to the robots.txt file later after crawling through denied pages and wasting even more bandwidth.
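
      A count like "57 different addresses" can be pulled from an access log with a few lines of Python. This sketch assumes an Apache-style log where the first whitespace-separated field is the client address, and that the crawler's user agent contains the substring "grub-client"; both are assumptions about the log, not facts about Grub.

```python
from typing import Iterable, Set


def distinct_clients(log_lines: Iterable[str], agent_substring: str) -> Set[str]:
    """Collect the client addresses whose log line mentions agent_substring.

    Assumes the first whitespace-separated field is the client address,
    as in Apache's common and combined log formats.
    """
    clients = set()
    for line in log_lines:
        if agent_substring in line:
            fields = line.split()
            if fields:
                clients.add(fields[0])
    return clients
```

      Passing an open log file and "grub-client" to distinct_clients and taking len() of the result gives the number of distinct hosts that crawled the site.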

  28. Indexor or Search Engine? by digitect · · Score: 4, Interesting

    I expected some way to search... this looks more like a project to index the web rather than make the results available for public use via web interface. Did it strike anyone else odd that there was no web form on the home page with which to search?!

    It seems like a good concept, but the availability of the information collected needs to be accessible without installing the client. I'm not game to install distributed computing apps without some freely available benefit. The "for the good of the world" motivation went out the window for me about a day after my first Seti At Home experience. (But now BitTorrent, there was appreciable benefit. I had RedHat 9 isos within 8 hours of their initial release!)

    --
    There is no need to use a SlashDot sig for SEO...
    1. Re:Indexor or Search Engine? by Anonymous Coward · · Score: 0

      i don't think they have the index running yet. doesn't looksmart own wisenut though? using wisenut's technology, deployed on a grid, you could do some serious web page crunching.

    2. Re:Indexor or Search Engine? by Anonymous Coward · · Score: 0

      it looks like they are feeding the wisenut search engine right now. they are saying in the forums that they will do other searchish things later on when they have more clients running.

      pretty cool

    3. Re:Indexor or Search Engine? by LetterJ · · Score: 1

      Is nobody looking at anything other than the linked page? There's a "Tools" page that has not only a link to a search box that uses the results, but to their XML API for working with the engine.

  29. Re:search.msn.com is the future by shibbydude · · Score: 5, Interesting
    In particular, the company has its own team of editors that monitors the most popular searches being performed and then hand-picks sites that are believed to be the most relevant.

    You have to be kidding or working for Microsoft, or both! Have you ever searched for Linux on MSN? Try it - here.

    Notice the third result? "Learn about the Microsoft alternatives and how to move to them from open source products." I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

    --
    We're only gonna die from our own arrogance, that's why we might as well take our time...
  30. Small Thing by Qacker · · Score: 1
    Hmmm what is my login again?...

    Set Up Your Account Please register for your Grub account. We will NOT release your personal information to anyone, and your email address will not be displayed on the site. Your email address will be your Grub login.

    * Email:

    * Username:

    * New Password:

    --
    Learn lisp today!
  31. blah by jafac · · Score: 1

    just another extension of the 1998 zeitgeist;
    It's all about eyeballs.

    baloney.

    Show me the profits.

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  32. You can run both by friedegg · · Score: 3, Informative

    Grub isn't a heavy CPU user. Right now, on my Athlon (~2400+), it's using between 0-2% of the CPU at any given time. Grub is mainly interested in your excess bandwidth.

    --
    Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
    1. Re:You can run both by rabidcow · · Score: 5, Funny

      Grub is mainly interested in your excess bandwidth.

      Unfortunately, so is my ISP. In fact, they've already sold it to other customers.

    2. Re:You can run both by smagruder · · Score: 1

      Grub is mainly interested in your excess bandwidth.

      And, I would suppose, the excess bandwidth of many web hosting packages. I do not want Grub hitting my hosted sites from all these disparate IP's just to build a new search engine we don't need. To prevent a possible DOS due to running out of purchased bandwidth, I'm going to have to write site code that denies site access to the Grub clients. I can make do with the fact that my sites already have decent listings on Google and dmoz.

      --
      Steve Magruder, Metro Foodist
    3. Re:You can run both by Anonymous Coward · · Score: 0

      It's really not that complicated. Just set up a robots.txt to block it, or if you're really paranoid, use mod_rewrite to block the grub user agent string. Or, if you still want the benefit of a listing, but with control, set up Grub, "own" your own sites (so you do exclusive crawling) and crawl your site when you feel like it.
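
      As an illustration of the robots.txt route, here is a sketch that feeds a two-line robots.txt to Python's standard parser and asks what a well-behaved crawler would conclude. The user-agent token "grub-client" is an assumption; check the crawler's own documentation for the name it actually matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "grub-client" is assumed to be the token
# the crawler identifies itself with.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask Python's standard parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

      This only shows what a compliant crawler should do with the file. It does nothing against a client that never reads robots.txt, which is where the user-agent block comes in.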

    4. Re:You can run both by smagruder · · Score: 1

      It's not complicated to alter a small bit of my common PHP code that blocks out particular user agents. Besides, it's being reported that Grub doesn't necessarily adhere to robots.txt instructions.

      --
      Steve Magruder, Metro Foodist
  33. Search engine software and lack of A . I . by zymano · · Score: 0, Troll
    Have these search companies ever thought of actually hiring people to sort some of the garbage in their databases?

    of course not.

    These guys buy a bunch of servers and let some DUMB designed software try to find what you're looking for. It's really stupid shit if you ask me.

    What's really needed is some sort of A.I. Either real people actually giving you advice on finding something, or programs that can think like humans and know what humans want.

    Also needed is a way to categorize FREE CONTENT separately from the commercial websites. The internet, which was invented for government information sharing, has turned into a FILTHY, CRASS, Commercial overloaded sack of SHIT.

    Anyone that can implement these features, I tip my hat to you. Google, yahoo, lycos all suck in my opinion. These companies should hire some of those soon to be out of work telemarketers, but they won't, because they think their software is so special when it's actually ridiculously unprofessional, shoddy and cheap.

    If anyone wants evidence then try and find content in these search engines about starting an internet business. I get SPAM site after spam. Nothing legitimate. It doesn't help that these overly hyped so-called search engines take under the table cash from businesses for placement on their searches.

  34. Re:Search engine software and lack of A . I . by Anonymous Coward · · Score: 1, Informative

    Google is very responsive to spam reports. Rather than simply remove spam sites as they find them, they prefer to "teach" their software what's bad from example. This can take a bit of extra time, but it seems worth it to me. Google even has a link on their search results for feedback if you're unhappy. Try reporting bad searches some time.

  35. Re:Search engine software and lack of A . I . by adamruck · · Score: 1
    --
    Selling software wont make you money, selling a service will.
  36. Mad Penguin logo by Anonymous Coward · · Score: 0

    this is totally off topic... but has anyone seen MadPenguin.org's logo? I about fell out of my chair when I saw it. Seems they have been endorsed by Muhammed Saeed al-Sahaf LOL.

    Thought I would share :)

    nox

  37. Phew... by WetCat · · Score: 1

    An enormous amount of spiders that are hunting for an enormous amount of web flies - pages...

  38. actually.. by SystematicPsycho · · Score: 1

    they're going to sneak in file sharing support with a kazaa plugin.

    --
    Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
  39. Looksmart by Ark42 · · Score: 3, Interesting

    Isn't Looksmart/Sprinks a big pay-per-listing deal? The looksmart logo in the upper right corner was enough to make me just close that page right away without any second thought.

  40. From dictionary.com by Anonymous Coward · · Score: 0

    anyways
    adv. Nonstandard

    In any case.

  41. Re:Search engine software and lack of A . I . by zymano · · Score: 3, Insightful
    I didn't know that.

    But it still kind of irks me that people think that a computerized 'dumb' search result could compete with a human rating system that filters spam, porn, and other garbage results. Google should hire some REAL PEOPLE that can do some sort of categorized intelligent directory so we can have QUALITY at the beginning of a search result. Some sort of HUMAN RATING system is needed to sort. The software is not up to par.

  42. Lame.. by Anonymous Coward · · Score: 1, Insightful

    Grub has had problems forever. I remember when they first announced it. It sounded cool, so I went to check it out. Turns out the actual crawling was done by.. wait for it.. wget. How lame is a web crawler that uses wget?

    Then people started to realize that grub didn't have a good set of AI back at the mothership--lots of pages got crawled way too often, grub didn't obey robots.txt, etc. Many webmasters just started banning grub altogether.

    Now we find out that LookSmart has bought grub and its three developers. LookSmart is the company that stabbed its customers in the back by starting to charge for every click from its directory instead of a one-time fee for inclusion.

    These two groups deserve each other. Grub was supported by the community, but now that they've sold out to commercial interests, who wants to give up their bandwidth for free to LookSmart? The grub code was GPL--I wonder if grub will start to change the license to make the code closed source..

  43. Re:search.msn.com is the future by velkro · · Score: 2, Funny

    Not to mention:

    Results 1-15 of about 609 containing "linux"

    I seem to remember there being more than 609 websites with Linux information on them...

  44. Re:search.msn.com is the future by inertia187 · · Score: 1

    So, pray tell, where does that result belong? I agree, it shouldn't be number three, but where then? It's nowhere to be found in the first ten pages of Google. Am I to assume Google does not weight search results? No, just look at the Search King case. I don't think we can really rely on any search engine with an agenda, but we have no other choice.

    --
    A programmer is a machine for converting coffee into code.
  45. Not only that... by Anonymous Coward · · Score: 0

    Those pigeons can eat the grubs, solving two problems at once.

    Just watch out for the part about killing two birds with one stone.

  46. Flood Control by SmartGamer · · Score: 2, Interesting

    According to the Grub FAQ, it respects robots.txt although not the META tags. Although it takes a week or two for it to listen to the robots.txt, it does eventually...
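
    For anyone wondering what "respecting robots.txt" amounts to in practice, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the "grub-client" agent token are assumptions for illustration; this is not Grub's actual code, and it only covers robots.txt, not META tags.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

    A crawler that only refreshes its copy of robots.txt every week or two would keep applying stale rules in the meantime, which would match the delay described above.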

    The sheer volume of this project concerns me, however. The very fact that it got Slashdotted may cause it to be a bit heavier than expected!

    It sounds like a good use of spare bandwidth, but if it's going to wind up a superscanner, it's going to send a hell of a lot of requests.

    I tried it and deleted it as quickly: it's not very good at being a bottom feeder, it redlined my system resources immediately and slowed everything down. Duration between installation and uninstallation: twenty-nine seconds.

    --
    Warning: Poster of this comment is a nerd. Just like everybody else here.
  47. Web searching will only get harder... by Sancho · · Score: 2, Insightful

    ...as the web gets larger and more cluttered.

    I've already discovered this with comic books turned into movies. Finding synopses of the comic book X-Men is nigh impossible. Finding synopses of the movies is much, much easier. Damn near every site online about X-Men, Spiderman, The Hulk, Batman, etc. deals with the movies, and sifting through the cruft is not easy. And that's just comic books. Other topics can be just as hard to find, and this doesn't even touch upon fake search results that only turn up porn or worse, a blank page (happens frequently).

    Searching for MORE stuff isn't going to help. Searching better is the key. Google goes a long way towards this, but even it has the same problems of finding too much crud.

    1. Re:Web searching will only get harder... by mattwolfewvu · · Score: 1

      Yes mods, this is offtopic, I'm just kindly replying to the parent post. This (www.marveldirectory.com) is a site I found about a month ago. Nothing too in-depth but fun to poke around in.

      --
      "I think that when you become a Republican, you don't get to score any more." -- Butt-head
    2. Re:Web searching will only get harder... by wheany · · Score: 1

      I found out the same thing when I wanted to know what Bullseye (from Daredevil) looked like in the comics.

    3. Re:Web searching will only get harder... by PhxBlue · · Score: 1

      Actually, Google goes further than you think. You just have to know how to search.

      --
      !#@%*)anks for hanging up the phone, dear.
    4. Re:Web searching will only get harder... by ktorn · · Score: 1
      I totally agree with you. We already have speed and quantity (i.e. google); what we need now is quality.

      Some have a point in saying google provides much more than the simple search, but it still falls short of what I think could be done. Not that they (google) don't know how to do it, but they'd rather keep it fast, and you won't get fast AND quality at the same time.

      So perhaps what we need is something to complement google. A slow, heavy_meta_data search engine that you can use to make complex queries.

      For example, I want to get all the pages that contain the term "Eclipse" as a link text AND within a (list) element, with at least 5 'incoming' links from distinct servers. And I should be able to provide it with a list of 'related' URLs (i.e. sun.java.com, developer.com) to push up the related context.

      A further step still, would be to tick a 'use synonyms' box, and the search engine would automatically search all the combinations of synonyms of each keyword. This is why I said, you can't have it fast.

      I seriously thought about distributed indexing back in 1999, and I'm glad I never implemented it. Some of the comments relating to Grub are very good (i.e. prone to be polluted by tweaked clients). I'm now working on something related though. A framework for subject-specific web directories (like, mini-yahoos that anyone can produce), in open-source Java. When I get it working it'll appear at jsite.org. These mini-directories would then share an API that could be combined into a single front-end (that's where megamap comes in). Still not pollution free, but the indexing clients are now a very select few.

  48. Altruistic? by sulli · · Score: 5, Funny
    That's the dumbest thing I've heard in ages. Why should I help out a for-profit company for free?

    (Oh, I can't remember. Have I MetaModerated Recently?)

    --

    sulli
    RTFJ.
    1. Re:Altruistic? by eversunsoft · · Score: 4, Insightful
      Well, because web searching, to this day, has been a free service. Supposing that the index is built as the result of donated searches, it would be ethically in very bad taste to act against this trend.

      Of course, I am the first one to question this trend. Has anyone else considered the possibility that one day we'll wake up, and notice that google is charging for access to its basic searching services?

      I for one, would probably pay. I have become so dependent on it. What price? That's a good question...

    2. Re:Altruistic? by Anonymous Coward · · Score: 0

      Before you start pretending that you're so fucking clever, let me ask you: Did you ever contribute an entry to Gracenote (ne CDDB)?

    3. Re:Altruistic? by johnburton · · Score: 1

      Well why not? Is it better that your resources sit there idle helping nobody at all to do anything?

      --
      Sig is taking a break!
    4. Re:Altruistic? by R0 · · Score: 5, Funny

      Notice
      ======
      The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.


      I nominate "parasite".

    5. Re:Altruistic? by exhilaration · · Score: 1
      I nominate Phoenix!

    6. Re:Altruistic? by MikeDX · · Score: 1

      PIKACHU I choose YOU!

    7. Re:Altruistic? by stesch · · Score: 1

      Firebird seems to be a cool name.

    8. Re:Altruistic? by dirvish · · Score: 1

      Slashdot is a for profit company and you just helped them out by providing free (quality?) content.

  49. And in related news . . . by ubernostrum · · Score: 1, Redundant
    The architects of the GRand Unified Bootloader posted to the mozillazine forums today, flaming the choice of the name "grub" for this new system and calling for spamming of all grub-related discussion boards in retaliation.

    Or not. What a difference maturity makes.

  50. How about picking the types of content to crawl? by joejoejoejoe · · Score: 1

    I saw another poster say you can stop the GRUB client from crawling porn, but what if you could pick the types of content you wanted to crawl for?

    Let's say for example I use search engines but find them lacking or would like better results for the types of content I SEARCH FOR???

    So one solution would just be to pick the types of content manually, or select keywords, etc, manually....
    Another option might be to sniff my use of Google.com or Altavista.com (is that still up? ;) and then help the Engines refine the content in its indexes according to what I ACTUALLY SEARCHED FOR???

    Since there is not any monetary incentive to run the client, and you won't find any Aliens (but maybe some freaks ;), give the user (client) the ability to improve results for things that matter to them....

    --
    Silly Rabbit: tricks are for kids.
  51. Mod parent up! Funny! by Anonymous Coward · · Score: 0

    Haha

  52. Good Idea, Bad Implementation by oaf357 · · Score: 3, Insightful

    Yea. If you help Grub, Grub gives your web site a preferential listing. Building the biggest search engine, sure. Building good search results, not so sure.

    1. Re:Good Idea, Bad Implementation by Anonymous Coward · · Score: 2, Insightful

      It doesn't give you a preference in listings, simply a preference in crawling. You offer some work to guarantee your site has fresh indexing. It's not much different than the search engines that sell frequent crawling for extra. A fresh non-relevant listing won't help you much more than an older listing.

  53. Alternate idea by gmuslera · · Score: 1
    Why not a proxy with a component that is a node of a distributed search engine?

    Something like, e.g., the squid cache that is also some kind of client of that kind of network would be more useful, at least for common users (the ones that don't yet have a proxy cache will gain a lot in internet navigation, and it will not use extra bandwidth, just what they already downloaded). For the "search" engine it will give another approach to ranked results, giving more weight to the sites that are more accessed, not just the ones that are more linked.

    It could have problems, of course. Sites that are not visited much will not be indexed easily, making them even more difficult to find, but maybe this can be compensated for with an optional crawler.

  54. What _is_ a good project? by bcrowell · · Score: 3, Interesting
    I have a FreeBSD server that wastes the vast majority of its CPU cycles (and most of its bandwidth, too). So what is a good distributed computing project to donate those cycles to? I'd like to find something that
    1. makes me feel warm and fuzzy about my altruism
    2. can run in the background on a Unix box
    3. is open-source (so I don't have to run someone's closed-source app on my box and trust their security through obscurity)
    Well, #1 rules out Grub, #2 rules out Folding@Home, and #3 rules out both SETI@Home and Folding@Home.

    So what worthy causes are out there?

    1. Re:What _is_ a good project? by valkraider · · Score: 1
    2. Re:What _is_ a good project? by metlin · · Score: 2, Interesting


      How about helping with some cool math prime search?

      ars Team Prime Rib - cool prime searching stuff.

      A mix of misc science stuff.

      dc projects - some Opensource, some not.

      And all projects at distributed.net come with source too.

    3. Re:What _is_ a good project? by Anonymous Coward · · Score: 0

      fuck #3, run the cancer one, up #1 no end.

    4. Re:What _is_ a good project? by Anonymous Coward · · Score: 0

      S@H is an open source project

      http://setiathome.berkeley.edu/setifuture.html#boinc

    5. Re:What _is_ a good project? by denny_d · · Score: 1

      I've been asking the same thing lately... the 'cancer' project, last I checked, didn't have a linux client... maybe it's time to come up with a distributed app that answers the question, "Why are the rich getting richer, the poorer getting poorer, and why do so few seem to care?"
      Don't mind my bleeding heart.

    6. Re:What _is_ a good project? by Anonymous Coward · · Score: 0

      just run a warez server biotch

    7. Re:What _is_ a good project? by smagruder · · Score: 1

      I'm doing SETI@home anyway. SETI is a trusted provider, and I'm not letting my strong devotion to OS get the best of me. SETI has made crystal clear their rationale behind closing their source, and I accept it.

      --
      Steve Magruder, Metro Foodist
    8. Re:What _is_ a good project? by shfted! · · Score: 1

      Read Rich Dad, Poor Dad for the answer to your question. Highly recommended.

      --
      He who laughs last is stuck in a time dilation bubble.
  55. DDoS by karlm · · Score: 3, Interesting
    So the idea is to DDoS the entire web? :-)

    If this thing gets too popular without proper throttling, they could cause real havoc.

    --
    Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.
  56. Legalities? by cheshiremackat · · Score: 4, Interesting

    Alright, I have 3 major problems with this...

    1) How different is this than the Princeton kiddies system? I don't know about you, but I don't want a 95 billion dollar bill arriving in the mail...

    2) What if your local (cache?) contains a few links to kiddie porn? Not your fault, right? Software does its own thing, you cannot control it, BUT what will the FBI think? The FBI, Scotland Yard, and RCMP are currently heavily investigating kiddie porn cases (good work IMHO), but what if you're the unlucky sap who gets stuck with a few sketchy URLs? Or worse yet, what if this GRUB keeps a cache of the website like Google does? Then what?

    3) What about material that is legal locally, but illegal somewhere else... e.g. Nazi stuff in Germany, Falun Gong in China, etc... The last thing I want is to be refused a travel visa cuz my PC has an illegal cache...

    Good idea in principle, but with sketchy content on the web, I don't think I will be the one keeping track of it all. If there is a way to filter out the questionable stuff then maybe, but since the purpose is to be as inclusive as possible, it seems incompatible.

    _CMK

    --
    Bad spellers of the world untie!
    1. Re:Legalities? by Anonymous Coward · · Score: 1, Informative

      A. I don't believe it caches anything except crc's for the url's. It downloads it, calculates the CRC, sees if it's updated, and it's gone. And, B. It doesn't download images or other media files, so no kiddie porn, unless it's text.
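
      A rough sketch of that "download, CRC, compare, discard" cycle, assuming a CRC-32 checksum and a plain dict standing in for whatever the central database really stores (both are my assumptions, not anything taken from the Grub source):

```python
import zlib


def page_crc(body: bytes) -> int:
    """CRC-32 of the downloaded page body, as an unsigned 32-bit value."""
    return zlib.crc32(body) & 0xFFFFFFFF


def has_changed(url: str, body: bytes, known_crcs: dict) -> bool:
    """Compare the new CRC with the stored one, then keep only the CRC.

    `known_crcs` is a hypothetical url -> crc mapping. The page body itself
    is never stored; a URL seen for the first time counts as changed.
    """
    new_crc = page_crc(body)
    changed = known_crcs.get(url) != new_crc
    known_crcs[url] = new_crc
    return changed
```

      Only the URL and a 32-bit number survive the crawl in this scheme, though the list of URLs itself still has to exist somewhere.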

    2. Re:Legalities? by cheshiremackat · · Score: 1

      text is still illegal...

      And I don't want to point to any copyrighted material... DMCA!

      --
      Bad spellers of the world untie!
    3. Re:Legalities? by SmartGamer · · Score: 2, Interesting

      It does, however, download a buffer of URLS to scan. If your buffer was less than clean when your computer gets searched, oops, you're in trouble...

      Not to mention the fact that it still goes and hits all those sites, and with the government trying to smash that little thing we call "privacy," anything questionable will likely go on your permanent record- the one that doesn't exist, but they somehow have anyway.

      --
      Warning: Poster of this comment is a nerd. Just like everybody else here.
    4. Re:Legalities? by amoe · · Score: 2, Interesting
      text is still illegal...

      Text child pornography is illegal? How does that work? I thought the rationale for video child porn being illegal was that an illegal act had been committed in its creation - how do they justify making something illegal that is purely the product of an author's imagination?

      Disclaimer: I have never read a child porn story, but I have seen them around the seedier places on the net.

      --
      You look beautiful! Incidentally, my favourite artist is Picasso.
    5. Re:Legalities? by Anonymous Coward · · Score: 0

      If it were in an actual buffer for the program, you'd simply point out what the program is, how it works, and your lack of control over what it does. You call in grub/Looksmart's people, they agree, and it gets tossed out.

    6. Re:Legalities? by gozar · · Score: 1

      At least in Ohio you can be jailed for text child porn.

      --
      What, me worry?
    7. Re:Legalities? by cheshiremackat · · Score: 1

      Yeah... after your name is published in the New York Times as possessing child porn... remember the court of public opinion is a very scary place...

      Richard Jewell anyone?

      _CMK

      --
      Bad spellers of the world untie!
    8. Re:Legalities? by turkeyphant · · Score: 1

      Disclaimer: I have never read a child porn story, but I have seen them around the seedier places on the net.

      You realise this only affirms your guilt, right?

      Too bad you don't live in Ohio...

  57. Re:Search engine software and lack of A . I . by Anonymous Coward · · Score: 0

    Hey! Have you heard of Yahoo?

  58. Re:search.msn.com is the future by Anonymous Coward · · Score: 0

    Yeah, but all the others actually run linux and can't stay up long enough to get indexed.

  59. Re:Search engine software and lack of A . I . by Anonymous Coward · · Score: 0

    A) Google does have a human-created directory (might be the same as DMOZ)

    B) I imagine that they manually have given pages on Yahoo and other web directories high weights.

  60. Unlimited Use? Try Wishful Thinking. by NeoMoose · · Score: 3, Insightful

    You can always use the Google API for more than 2,000 searches per day if you pay licensing fees for it. That's just Google ensuring that it can remain a viable company. Little text-box advertisements just don't cut it in this day and age where blatant pop-ups and colorful banner ads don't even have much turn-around. That's not the point though.

    The point is that I wouldn't look anytime soon for LookSmart to allow unlimited usage of this API. It's too large of a project for them to just let people use it. It's simple economics. They may not be investing the computing resources into this project's web spidering software, but it's still using TONS of resources to keep this data catalogued and readily accessible.

  61. Re:Search engine software and lack of A . I . by Anonymous Coward · · Score: 0

    Or DMOZ (which Google actually does use.)

  62. The open faucet, not the blown dam by SmartGamer · · Score: 2, Informative

    A DDoS is only effective because it's a whole bunch of messages all at once to one target- in the 100,000,000 range for a full-scale attack, to always cover all the positions.

    The database of "check-me"s is randomized rather evenly. Even if this takes off, I don't see how it could really do serious damage to any but the truly dinky servers: the hits will not come in all at once and flood the whole connection. While it very well could end up a constant stream, it's unlikely to be the massive stream that makes a DDoS.

    It does have the potential to slow servers across the world, but that's okay- it will slow home users' connections across the world by using 1/4 of them, too, so nobody will actually notice.

    --
    Warning: Poster of this comment is a nerd. Just like everybody else here.
    1. Re:The open faucet, not the blown dam by smagruder · · Score: 1

      A DDoS is only effective because it's a whole bunch of messages all at once to one target...

      Well, no. Many hosted web sites have bandwidth limits entailed in the packages. If Grub makes the bandwidth limits tip over, then that's an effective DOS.

      --
      Steve Magruder, Metro Foodist
  63. the backstory by eidechse · · Score: 1
  64. Re:search.msn.com is the future by lamber45 · · Score: 2, Interesting
    I followed one of these links and looked at the MSDN article. It's full of generalizations taken from 20-year-old UNIX textbooks, although Linux and X windows are mentioned here and there. Apparently recent versions of some level of Windows have an "Interix" subsystem. I've used Cygwin32 on Win95, WinME, Win2k and WinNT, and Borland C++, and Visual C++ .NET, but I don't think I've ever used the Microsoft native POSIX layer. The article gives a lot of questions that should be asked before starting a migration like this. One possible reason to migrate is to decrease the Total Cost of Ownership; another is to increase hardware options and move away from proprietary systems!

    Another quote I like is, "Windows operating systems do not provide X Windows. For X Windows connectivity, developers need a third-party X Windows server.". Of course Microsoft would never be anticompetitive by competing with third-party suppliers of implementations of an open standard, right?

  65. Re:search.msn.com is the future by Anonymous Coward · · Score: 2, Insightful

    It's not as bad as you make it out to be. They do point out (in fine print) that it is a "featured" site. They list the "featured" sites first, then the sponsored links, and then general web hits. And they mark each category. I guess that the only difference between featured and sponsored is in the price. All this was far from obvious to me when I saw the results at first (being used to Google), but I imagine that if you used them on a daily basis you would quickly become used to skipping down to the real results.

  66. Ah, just what we need by Moonwick · · Score: 1

    Another damn web spider adding to the collective noise of the internet.

    Why don't these people try to work out some way of sharing information so I don't have to have my webserver poked at by every person and their brother's search engine?

    --
    Only on slashdot can a posting be rated "Score -1, Insightful".
  67. Read the fine print by anon*127.0.0.1 · · Score: 2, Insightful

    It's a "featured site". Meaning it's a site from Microsoft, a Microsoft partner, or someone who paid some money to Microsoft for the privilege.

    Nothing that other search sites don't do. They just mark their paid adverts a little more obviously.

    --
    I am NOT a man!
    I am a free number!
  68. robots.txt by Anonymous Coward · · Score: 0

    My web site has gotten a few hits from the grub bots, none of which were for robots.txt.

    Hello grub, welcome to my BANNED BOT LIST.

  69. What about the source? by PhrostyMcByte · · Score: 1

    Okay, i found the source at sourceforge CVS. unfortunately, all the files checked in are >4 months old. If this is under the GPL, where the hell is the source for the binaries they are putting out?

  70. Re:search.msn.com is the future by resin8 · · Score: 1

    Results 801 - 878 of about 58,500,000
    In order to show you the most relevant results, we have omitted some entries very similar to the 878 already displayed.

    609 pages with Linux info isn't so bad, when you consider Google only shows 878 "relevant pages". Not one link to MSN in those 878 pages.
    Anyone care to look through the 58,499,222 omitted entries?

  71. Re:Unlimited Use? Try Wishful Thinking. by dmoynihan · · Score: 1
    Little text-box advertisements just don't cut it in this day and age where blatant pop-ups and colorful banner ads don't even have much turn-around.

    This I dispute sir. Targeted keywords on google, where my clickthrough ratio has averaged 1.3-1.5%, are a goldmine for my site and money very well-spent (averaging $500 a month on those ads, paying .05 in 97% of all cases.)

    I've been a google advertiser since Feb. 02, consider their program extremely lucrative, and I guess they like me 'cause I got a picture frame from them last Christmas. It was a Coach picture frame....

  72. Re:Unlimited Use? Try Wishful Thinking. by NeoMoose · · Score: 1

    I'm not disputing whether or not the advertising is effective in fulfilling its purpose of promoting the advertiser's site. I am simply stating that Google would not be a very viable company if they relied on advertising alone to make their money.

    I won't argue with you on how much Google makes off the ads, as I am willing to bet that about 80% or more of their funds comes from advertising; however, advertising has always proven an ineffective means of remaining viable. You simply have to have other sources of income.

  73. Whatever by Maxamoto · · Score: 0

    What a lame piece of shit code... Didn't work at all on any of the 5 machines I tried it on... Tried logging in to the forum, but unfortunately that was broken too. Basically, that's OSS in a nutshell.

    --
    "Your CPU came with a keyboard? What kind of ghetto deal is that?" -McSuede
    1. Re:Whatever by wheany · · Score: 1

      And I was ready to try the client, but it wouldn't accept my email-address, because it has a "+" sign in it, and I couldn't find a contact address where I could have reported the problem.

  74. The Web as a Catalog by X-wes · · Score: 1
    GNU Grub bootloader, who's executable is named "grub".

    Im sure you'r apostrophe's and ",quotes", have good grammars

  75. I don't think breadth is what is needed by Anonymous Coward · · Score: 0

    We need a higher signal to noise ratio. I don't think crawling MORE will really help that. Even google has been fooled and sites can quickly master the rankings with little effort.

    I'm not sure of a good way to improve the signal to noise ratio, but this certainly doesn't seem like a solution. I also question the ethics of releasing such software that, if it contains security holes, is a potential launch platform for debilitating internet attacks.

  76. Re:search.msn.com is the future by The+Cydonian · · Score: 1
    I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

    Yes, Google's algo only asked Microsoft to go to hell, of course, taking it down after the story was reported far and wide.

  77. The approach is inherently flawed by oren · · Score: 3, Interesting

    It is too easy to send corrupted information into the database. They have *no choice* but to trust the clients. Sure they could run spot checks on the results, but they would be very partial and it would be easy enough to fake responses for those as well.

    So the more popular it gets, the more incentive people will have to promote their sites by feeding it fake index information. If this magically got to be very popular, within weeks search results would become meaningless and it would drop back into obscurity. The more likely result would be that it will never become popular in the first place.

    Besides, who wants to donate his CPU and bandwidth resources for a commercial company, anyway?
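
    For what it's worth, the usual partial defence in distributed computing is redundancy rather than spot checks: hand the same URL to several independent clients and accept a result only when a majority agree. A minimal sketch, where the client ids, the per-URL report list and the threshold of three are all assumptions for illustration and say nothing about how Grub works:

```python
from collections import Counter


def accept_result(reports, min_clients=3):
    """Return the agreed checksum for one URL, or None if there is no majority.

    `reports` is a list of (client_id, crc) pairs. Each client gets one vote
    (its latest report), so a single client cannot stuff the ballot.
    """
    votes = {}
    for client_id, crc in reports:
        votes[client_id] = crc
    if len(votes) < min_clients:
        return None
    crc, count = Counter(votes.values()).most_common(1)[0]
    # Require a strict majority of distinct clients.
    return crc if count * 2 > len(votes) else None
```

    This only raises the cost of cheating. Someone running many clients under different identities could still outvote the honest ones, so the basic trust problem remains.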

    1. Re:The approach is inherently flawed by UnknownQ · · Score: 1
      It is too easy to send corrupted information into the database. They have *no choice* but to trust the clients.

      Not really, if they follow the typical distributed computing model they give you a chunk of the web, and the chances that out of the whole web they give you part you are interested in tweaking is very low. The only reason to mess with results is out of pure malice.

      Also it would be pretty easy to add a "report this link" URL for cases where cnn.com shows up as nothing but links to Joe Blow's web site. With any luck they aren't doing searches on a link based algorithm anyway.

      --
      Wherever you go, there you are!
  78. The internet has become, by nycheetah · · Score: 1

    The internet has become an ever-growing tree of knowledge that will some day lead to something even bigger.

  79. Old & Rusty by sICE · · Score: 1

    nothing about grub here, but personally i really like this web site that has a few search engines on it: http://freddo.netfirms.com/. It also refers to Fravia's new website and his invaluable forum.

    A good reference about search engines is also Search Engine Watch

    have fun...

  80. Just terrific. A massively powerful DDOS tool. by NerveGas · · Score: 1


    Normally, most search engines' spidering methods are designed to be pretty nice to servers - such as only requesting pages once every 30 seconds or so.

    However, I've seen times when the methods of some of the search engine spiders were foiled by such simple things as having a large number of virtual hosts on a machine. Combine that with a number of front-end machines all connected to the same database server, and things can get really nasty.

    In one particularly bad incident, several fairly big-name search engines were spidering us simultaneously, and only hitting each domain name relatively infrequently. However, with 500+ on several front-end servers, and several search engines, we were getting something like 50-100 requests per *second* from the search engines. When those hits were to pages generated from the database, our servers kept up, but performance was definitely degraded.

    So, where am I going? I see the potential for small bugs, weak algorithms, idiotic end-users, or even malicious end-users causing the same sort of havoc. Even if it weren't meant as an actual DDOS, it could certainly end up that way. And it would be much, much harder to prevent than merely blocking (or rate-limiting) requests from one company's spiders.
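
    To make the virtual-host problem concrete: a spider that spaces out requests per domain name still hammers a box that serves hundreds of names. A minimal sketch of throttling per server IP address instead, assuming a single central scheduler that already knows each URL's resolved address (the class name and the 30-second default are mine, purely illustrative):

```python
class PerServerThrottle:
    """Space out requests per server IP address, not per domain name."""

    def __init__(self, min_delay=30.0):
        self.min_delay = min_delay
        self.next_allowed = {}  # ip -> earliest time the next request may start

    def wait_time(self, ip, now):
        """Return how many seconds to wait before hitting `ip`.

        Calling this reserves the slot, so back-to-back callers for the same
        address are queued one `min_delay` apart.
        """
        start = max(now, self.next_allowed.get(ip, now))
        self.next_allowed[ip] = start + self.min_delay
        return start - now
```

    With thousands of independent clients there is no single place to keep that table unless the coordinator hands out work with the delays already baked in, and nothing here helps against a tweaked or malicious client.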

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
  81. They have cracked it by fireman+sam · · Score: 2, Funny

    1. Design a search engine
    2. Let everyone else fill it
    3. Profit

    The second step is finally found!!! YAY

    --
    it is only after a long journey that you know the strength of the horse.
  82. Grub does NOT look for robots.txt by MythosTraecer · · Score: 1

    I'm sure grub will indeed build a larger database than most other search engines, since grub (or grub-client, or whatever it's calling itself) has never, not even once bothered to look at a robots.txt file on any web site I've ever administered. This is what webmasters call a misbehaved robot, and it is not something to be looked at with respect.

    --

    --Mythos
    1. Re:Grub does NOT look for robots.txt by Anonymous Coward · · Score: 3, Informative

      Here it is on mine requesting it:

      64.241.242.18 - - [18/Mar/2003:17:25:30 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
      64.241.242.18 - - [19/Mar/2003:19:41:05 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
      64.241.243.81 - - [30/Mar/2003:22:10:41 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
      64.241.243.81 - - [01/Apr/2003:23:11:21 -0700] "GET /robots.txt HTTP/1.1" 200 223 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"

      Notice those are LookSmart-owned IPs and not just normal user crawlers. They seem to centrally crawl for robots.txt. They do know, however, that they need to crawl for robots.txt more often.
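
      If you want to check your own logs the same way, here is a small sketch that pulls grub-client requests for robots.txt out of Apache combined-format lines like the ones above. The regex and function name are my own; adjust them if your log format differs.

```python
import re

# Apache "combined" log format, as in the lines quoted above.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def grub_robots_hits(lines):
    """Return (host, time, status) for each grub-client robots.txt request."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and "grub-client" in m["agent"] and m["path"].endswith("/robots.txt"):
            hits.append((m["host"], m["time"], int(m["status"])))
    return hits
```

      Feed it the lines of your access log; an empty result over a period where grub-client still fetched pages would back up the original complaint.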

    2. Re:Grub does NOT look for robots.txt by Kentrosaurus · · Score: 1

      I haven't seen them hit my robots.txt in the last logrotate term, but it's been on the disallow / for at least a month and I'm still flooded by their mindless drones.

  83. Re:Yeah what's the corporation bashing? by knowledgepeacewi · · Score: 0, Offtopic

    Many people around here work for the Government.

    As to what's wrong with Corporations and Big Business:
    in one word: Enron

    In more words than that:
    http://www.corpwatch.org/

  84. Linkloader by Anonymous Coward · · Score: 0

    www.linkloader.com

  85. CPU cycles are NOT wasted or "available" by pe1chl · · Score: 2, Insightful

    The common point made by these "distributed" software authors is that there are "wasted" CPU cycles in your computer that you could donate to a project for free.
    However, that is not true at all! CPU cycles are not wasted. When the CPU has nothing to do, it sleeps. At least in a modern operating system (i.e. about everything after Windows 95).

    By "donating your wasted CPU cycles" you will actually increase the power consumption of your computer. This will be very noticeable in a laptop, but when you watch the CPU temperature in your home system you will also see a noticeable increase in temperature between an idle system and a system running a computationally intensive background task.

    Probably the effect will be worse for things like keysearches, prime number searches, SETI etc than for this GRUB bot, because that probably also spends time waiting for the network (and thus returns the CPU to idle).

    So before you "donate your wasted CPU cycles", please realize that this will actually cost you money.
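
    To put a rough number on that, here is a back-of-the-envelope sketch. Every figure in it (idle and load wattage, hours, electricity price) is a made-up placeholder, not a measurement; plug in readings from your own machine and your own tariff.

```python
def extra_cost(idle_watts, load_watts, hours, price_per_kwh):
    """Cost of the extra energy drawn when running loaded instead of idle."""
    extra_kwh = (load_watts - idle_watts) * hours / 1000.0
    return extra_kwh * price_per_kwh


# Hypothetical inputs: 60 W idle, 100 W under load, running all year,
# at 0.10 currency units per kWh.
yearly = extra_cost(idle_watts=60, load_watts=100, hours=24 * 365, price_per_kwh=0.10)
print(f"extra cost per year: {yearly:.2f}")
```

    A client that mostly waits on the network would sit closer to the idle figure, so the difference between the two wattages is the number that matters.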

    1. Re:CPU cycles are NOT wasted or "available" by The_Big_Red_Dog · · Score: 1

      But it isn't so much CPU cycles with Grub as much as it is bandwidth. Many users don't understand that many of them share bandwidth and don't really have extra to spare.

  86. My first Grub hit coming over to my site by presroi · · Score: 1
    $IP - - [05/Apr/2002:12:27:55 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 404 218 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
    So, this was last year.... Is this a dupe?
    1. Re:My first Grub hit coming over to my site by caluml · · Score: 1
      So, this was last year.

      Warning, your system clock is 1 year out of date.
      [root@presroi.de root]# ntpdate ntp.demon.co.uk
      20 Apr 11:39:09 ntpdate[23473]: adjust time server 158.152.1.76 offset 1284989826352.108067 sec
      Thankyou ;)

    2. Re:My first Grub hit coming over to my site by presroi · · Score: 1
      My system clock is *not* 1 year out of date.

      This is a grep over my mylogfile.txt for 'grub.org' from Feb 2002 to Feb 2003.

      natlb4.webmailer.de - - [05/Apr/2002:12:27:55 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 404 218 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      192.67.198.230 - - [14/Apr/2002:21:05:36 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb4.webmailer.de - - [19/Apr/2002:06:49:57 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb7.webmailer.de - - [23/Apr/2002:02:15:47 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      192.67.198.227 - - [27/Apr/2002:04:46:29 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      192.67.198.228 - - [02/May/2002:15:19:01 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb8.webmailer.de - - [03/May/2002:21:16:14 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb4.webmailer.de - - [13/May/2002:16:31:32 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb5.webmailer.de - - [22/May/2002:09:57:57 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb7.webmailer.de - - [30/May/2002:05:48:19 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb4.webmailer.de - - [17/Jun/2002:21:09:39 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb4.webmailer.de - - [01/Jul/2002:19:19:34 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb3.webmailer.de - - [12/Jul/2002:01:07:25 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      192.67.198.231 - - [28/Jul/2002:15:33:37 +0200] "GET /methoden/hanf/robots.txt HTTP/1.1" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb4.webmailer.de - - [29/Jul/2002:15:19:33 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      192.67.198.227 - - [14/Aug/2002:18:22:12 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb7.webmailer.de - - [31/Aug/2002:01:26:59 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb8.webmailer.de - - [14/Sep/2002:07:46:27 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
      natlb7.webmailer.de - - [29/Sep/2002:01:39:25 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with h

    3. Re:My first Grub hit coming over to my site by caluml · · Score: 1

      Lol, relax guy, I was making a joke :)

  87. Re:How about picking the types of content to crawl by wheany · · Score: 1

    I like the idea. But it should benefit the community as well, so it should crawl something like 80% community-assigned pages and 20% "my" pages. That would still benefit the user much more than he deserves.

  88. Sig...Tony Blair by knowledgepeacewi · · Score: 1

    Holy fuck Tony Blair, what the HELL are you doing?
    Ensuring that American Dollars and Popular Opinion flow toward Britain. Not to mention military toys and training for British troops.

    Brilliant of him to pick the winning side. Now he can reap the rewards for his people.

    1. Re:Sig...Tony Blair by Anonymous Coward · · Score: 0
      (Offtopic so AC)

      And this is worth the costs exactly how?

      Remember, Tony Blair's actions are:

      • Encouraging the destruction of the United Nations, a body that has been fairly effective at keeping the peace for the last 50 years or so
      • Creating huge divisions in the EU which threaten to create natural enemies amongst Britain's most important trading partners (clue: US is 3000 miles away from Britain, France is 20 minutes on the Chunnel) and threatening the UK's position as a powerful force in an EU which inevitably will grow stronger and more influential with time.
      • Causing the deaths of thousands of innocents (obvious, but needs to be mentioned)
      • Encouraging countries to deal with disputes militarily when clear, non-violent, solutions exist
      • Causing mayhem in the Middle East, that'll long term result in decades of extreme terrorism directed at the UK and US. 9/11 was because of a few US troops in Saudi Arabia for fuck's sake, what the hell is having an entire country invaded and occupied going to do?
      • Siding with an administration in America that clearly opposes everything the UK stands for, and which has no problems ignoring UK concerns even at this time when there ought to be some gratitude being expressed from that government
      Siding with the winning side isn't always a good thing. Britain could have sided with Germany in 1939, or with Russia in 1945. Thank God we didn't. And while I'd be hesitant to put the US government in the same category domestically as those two dictatorships, right now their foreign policies seem to be indistinguishable from these warmongers. That's frightening, and we'll be paying the price - on both sides of the Atlantic - for many years to come.
    2. Re:Sig...Tony Blair by Anonymous Coward · · Score: 0

      (Offtopic so AC)

      And this is worth the costs exactly how?
      Because, after the people of America choose a democrat for president in retaliation for these avoidable political blunder(s), the world will breathe a sigh of relief, blame the "Bush Regime" and go back to admiring America for all of its cool innovations, movies, music, and technology (Many of which came out of our military research).

      Then the Democrat will again pay off another Republican Deficit and get along with the world, whose people, as a factor of human nature in good times, have poor memories.

      The UN has done very little to keep the peace these long years. Peace has always been a factor of what's in the interest of the aristocracy of the two superpowers, USA and USSR. Without their acceptance of UN policies, the UN wouldn't have any power. So, by not bowing to the will of the only remaining superpower, the UN shot itself in the foot and lost any authority it had gained over the world by failing to support an inevitable action by this horrible president. France demonstrated that, like so often in the last 70 years, it was only good for these thing(s): mouthing off against the person carrying the big stick, choosing the losing side, and being ineffective politically. Did I mention destroying world unity?

      The "United States of Europe" or "EU" will never have much power over the individual countries that compose her. The huge gap in languages, and the power reserved by the member countries, will cause rifts that will make the EU (otherwise a noble effort) as powerless as the UN. Without a more centralized government, and people saying, "I belong to the EU first, Spain second," the EU will be as powerless as the U.S.A. was when it was a confederation under the Articles of Confederation. Until the EU becomes a federation, you need not care about what it has to say.

      Causing the deaths of thousands of innocents.
      While I didn't approve of the war, and I don't approve of the deals going on behind closed doors in Washington over either Afghanistan or Iraq, the people of Iraq, when President Hussein was overthrown (for certain this time), did dance in the streets of Baghdad, overjoyed. They felt the pain and suffering of losing family and loved ones for 30 years, and they were still happy.

      Encouraging Superpowers to deal with disputes militarily when the UN had dropped the ball in 15 years of impotent inaction. (I'm of the opinion that Iraq was contained and it only became an issue when Bush made it an issue, but the UN did drop the ball and Iraq wasn't a good place.)

      The middle east has been in turmoil for 2000 years. Crusades, Jihads, "Holy Land". There is nothing, short of nuking Jerusalem, that will solve these battles, and the Bush Regime has shown its lack of Historical Understanding by trying to get involved and "once and for all" provide stability and order in a region of religious fanaticism.

      Siding with an American administration...
      Neither of us have the whole picture. Chances are very good that PM Blair (who has not been removed by his people in a vote of No Confidence) has made a number of deals with Bush that will benefit his people tremendously over the long haul. Unlike my current president, Prime Minister Blair seems to have the good will of his people at heart and I'd gladly see him running my country.

  89. Re:Unlimited Use? Try Wishful Thinking. by dmoynihan · · Score: 1

    You're certainly right that every business should have other sources of income (I do worry about my own site's single source). But I think google's raking it in on the click-through ads.

    Typically, where I advertise, there are eight or nine other people trying for the same keyword. I've got the green-shifted look despite paying the minimum because I'm allowed to include "free" in my description, but there's usually five people above me, meaning they're paying at least six cents; often as much as .40 cents per click, on keywords that generate around 500,000 impressions a month.

    That number really starts to add up when you think of all the web businesses, and all the keywords, and all the searches, and all the clicks, but I guess we won't have a better idea until google files with the SEC prior to their IPO...

    One thought, however, is the way google text ads are now showing at places like Metafilter or a number of the PDA news sites. Google's out to score more impressions any way they can... must be worth something to them.

  90. Re:search.msn.com is the future by pafrusurewa · · Score: 2, Funny

    The Austrian version of MSN is even better. If you search for Linux, the first two results are WinXP ads on the Microsoft site. And, while you're at it, try searching for google or yahoo. This will produce a popup saying "Why look for a search engine when you've already found one?".

  91. Distributed Crawlers by Anonymous Coward · · Score: 0
    What a great idea Grub seems until you peel back a layer or two. Remarkable promise? You decide.

    As it is not possible to really track who is an authentic Grub, the agent is subject to abuse. If I were a spammer (I am not) and wanted to do email address gathering, I would use the agent name for any spam crawlers I would run... I do not think I am giving away any dark secrets here.

    Since the makers of Grub claim it is a robots.txt-compliant spider, and since I have seen the Grub agent not always follow the behavior a robots.txt-compliant spider should follow, I can only conclude that my speculation above is already happening.

    Imagine your horror when you think your site is the most well indexed on the planet, because it is being crawled at DoS frequencies, but in fact it is really being crawled by an army of spambots getting in as something you might actually want to have crawl your site. As you watch those logs, consider that those Grub accesses may be the spambot engine from hell and not the real Grub rummaging through your site.

    So, even if you were of a mind to discriminate between the real thing and the pretenders, how do you do it? Search engines from fixed IP blocks are easy enough to authenticate and really allow a webmaster control over who indexes what. I cannot think of a way to do that with a distributed spider application without having a way to communicate with the "mother ship". If I have to make the investment in time to communicate with the "mother ship" each time a Grub client shows up, I might just as well crawl my own sites and disallow external Grub clients anyway.
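
    The fixed-IP-block check described above can be sketched in a few lines with Python's standard `ipaddress` module. The network `64.241.242.0/23` is a hypothetical block chosen only because it happens to cover the two addresses quoted earlier in this discussion; it is not a published LookSmart range, and `from_trusted_block` is an illustrative name.

```python
import ipaddress

# Hypothetical allow-list of crawler networks (an assumption, not a
# published range for any real search engine).
TRUSTED_NETWORKS = [ipaddress.ip_network("64.241.242.0/23")]


def from_trusted_block(remote_addr, networks=TRUSTED_NETWORKS):
    """Return True if the client address falls inside a trusted network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in networks)


for ip in ("64.241.242.18", "64.241.243.81", "192.67.198.230"):
    print(ip, from_trusted_block(ip))
```

    This works for a centrally run crawler. For a distributed client running on arbitrary home connections there is no such list to check against, which is the point being made above.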

    Elitist dirt bag? Me? Not really. Blocking and managing search engines is a fair approach, because if you let every budding search engine / indexing tool on the planet have at your site, you really might be facing a de facto DoS just from this kind of activity alone.

    Well behaved search engines always are allowed and welcome to crawl our sites. Ill behaved engines are met with a different attitude and fate.

    "We reserve the right to serve or refuse service to anyone for any reason"

  92. Re:Unlimited Use? Try Wishful Thinking. by NoOneInParticular · · Score: 1
    Where do I pay these license fees? The only thing I can find is this.

    In any case, a collaborative search engine API using distributed computing might still be a nice thing for not-for-profit purposes. One of the applications I wanted to use this API for was a plagiarism search for teachers to quickly scan student papers to see if they were simply pulled off the net. This was bombed by the 1000 query limit of Google's API, as to do the search properly would require a few tens of queries for each paper. If you have to check tens of these papers the limit can be reached fairly soon.
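
    The arithmetic behind that limit, as a tiny sketch. The 1000-query cap is the figure quoted above; 30 queries per paper is only a stand-in for "a few tens", and the function name is made up.

```python
def papers_per_day(daily_query_limit, queries_per_paper):
    """How many whole papers fit inside one day's query budget."""
    return daily_query_limit // queries_per_paper


# 1000 is the API limit mentioned above; 30 is an assumed per-paper cost.
print(papers_per_day(1000, 30))
```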

    For this purpose speed wouldn't be so much of an issue, so maybe a distributed cataloguing (sp) and search system might be something interesting?

  93. Not so much worse than Kazaa by jmping · · Score: 1

    When you download Kazaa, you authorize the corporation to utilize any unused processor or disk space -- this doesn't seem that much more dangerous than all those Kazaa users out there. As a non-Kazaa subscriber, I think I will also skip on grub -- I paid for my computing space and power thank you, and I don't plan on just giving it away to all of these corporations looking to further themselves.

    --
    **When craziness is bliss, 'tis folly to be sane**
  94. But you get to hide your surfing habits by Wee · · Score: 1
    Grub gives you something else: they hide your surfing habits.

    The only way I'd run grub is on a low-bid DMZ host (like that old P133 I have laying around), with the adult content searching filters disabled. Then I'd let it do whatever it wanted to do as long as it wanted to do it and I'd forget about it. Who cares about the search results? Just use Google like before. They aren't going to make a good search engine anyway.

    But if I ever got a subpoena which included information about my web browsing and online history, I could tell the judge that I couldn't honestly say if that particular bit of outbound traffic was me or that grub thing doing its searching. So as long as I was running it, I'd be free to look at "subversive" literature, pr0n, Arab websites, the Cato Institute's homepage, whatever I wanted. If I got on a list and they tried to PATRIOT ACT me, I'd use grub as my get out of (Ashcroft's mystery) jail free card. Hell, I'd throw grub and freenet on the same box and cover every base.

    That's if I was paranoid. And wanted to surf Arab web sites or pr0n. Which I'm not. And I don't. :-)

    -B

    --

    Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

  95. Nice company name by Anonymous Coward · · Score: 0

    "Looksmart"?? Is this supposed to be a clever company name?

    I keep thinking "Look smart as opposed to what? Being smart?" or "Look smart even if you're not??"

    With a name like that, these clowns don't even LOOK smart!

  96. some mails i had with kord... by tarzeau · · Score: 0

    make up your own mind: www.linuks.mine.nu/people/kord/

    --
    Windoze not found: (C)heer, (P)arty or (D)ance
  97. Distributed Crawling From Browsers by txtger · · Score: 2, Interesting

    It would be interesting to just see a database that is connected to browsers, so that whenever I looked at a page, the page data would be processed and sent to whatever search engine. Then, those sites that are updated frequently and get a lot of traffic would be more easily searched.

    Just a thought.

    1. Re:Distributed Crawling From Browsers by denny_d · · Score: 1

      Sounds like a fine RFE for Mozilla. They'd be the ones to do it right without planting some nasty stuff inside. I think I'll go do that now...

  98. Some hacking required... by SharpFang · · Score: 1

    Ok, so I'll just hack it a bit, and all my websites will FINALLY make it to #1 in search engines on ANY keyword! Doh, I need to subscribe to a few click-to-pay banner sites...

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  99. Re:Search engine software and lack of A . I . by jafiwam · · Score: 1

    What? Are you part of the "Yahoo Publicity Spread FUD department strike team" or something?

    Here's a hint for ya;

    1) Go to Google

    2) Click on the Fourth Link from the left in the bar. (The green one that says "Directory" on it.)

    3) Enjoy!

    Or, if you are particularly patient, just visit http://www.dmoz.org/ directly.

    Built by humans, edited by humans, unpaid volunteers that know something and care about the directories they edit. You too can even volunteer to help!

    Yahoo sucks PRECISELY BECAUSE they tried to pay people to get sites in their directory, found out they could not keep up, and then started making site owners pay to get in. Obviously, GRUB won't do what you want either, but what you are complaining about lacking already exists.

  100. read the grub forums by denny_d · · Score: 1

    The idea is cool and I imagine it won't be long before an org without (unverified) links to M$ does the same thing. There are at least a couple of people on the grub forum who are figuring out some of the shadier sides of this code: potential spyware? security hole? And the licensing is vague (no links).
    Note the tone of their pitch as well: you are participating in a competitive group effort akin to Seti@home and Distributed.net? I don't think so... caveat emptor.

  101. Diminishing marginal returns by squashed · · Score: 1

    Updating a search engine of general web material is an important objective, but there are diminishing marginal returns to immediacy. Google News is an example of a subset of web material -- news sites -- for which immediacy is a more important goal. It's no surprise that Google offers a very fast refresh there. A distributed system that would do that for the entire net is interesting, but not necessarily worthwhile.

  102. searching for porn by maluke · · Score: 1

    i don't know about you.
    i don't search for porn, it looks more like porn searches for me.

  103. So THAT'S What It Is... by suwain_2 · · Score: 1

    I've been noticing some hits on my website mentioning something called "grub," but never knew what it was.

    For the webmasters out there, this is what the UserAgent string shows up as on my site:

    Mozilla/4.0 (compatible; grub-client-1.2.1; Crawl your own stuff with http://grub.org)

    (There are variations on the grub-client-1.2.1 version number, so if you for some reason decide to search, you may want to do grub-client-*.)
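
    If you would rather match it in a script than with a shell wildcard, here is a minimal sketch using a regular expression. The pattern assumes the version is always dot-separated digits, which holds for the strings seen in this discussion (0.3.0, 1.07, 1.2.1) but is not guaranteed; `grub_version` is an illustrative name.

```python
import re

# Matches "grub-client-" followed by a dotted numeric version.
GRUB_AGENT = re.compile(r"grub-client-(\d+(?:\.\d+)*)")


def grub_version(user_agent):
    """Return the grub-client version found in a User-Agent string, or None."""
    m = GRUB_AGENT.search(user_agent)
    return m.group(1) if m else None


ua = "Mozilla/4.0 (compatible; grub-client-1.2.1; Crawl your own stuff with http://grub.org)"
print(grub_version(ua))
```

    Keep in mind that a User-Agent string is whatever the client chooses to send, so a match only tells you what the visitor claims to be.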

    --
    ________________________________________________
    suwain_2 :: quality slashdot p
    1. Re:So THAT'S What It Is... by Anonymous Coward · · Score: 1, Insightful

      Was it really so hard to go to the url in the user agent to see what it was?

  104. Help Microsoft beat Google? by Anonymous Coward · · Score: 0



    From the CNET article (linked from the grub website)

    "LookSmart, which licenses editorial and commercial directory listings to Microsoft's MSN and other Web sites, paid $1.3 million in cash and stock for Grub, according to a recent filing with the Securities and Exchange Commission. LookSmart said it is testing the Grub system and plans to unveil the distributed computing project in early April."

    Does this mean whoever runs the client will be helping Microsoft build a "good" search engine? It appears to me that will be the case. Also, "the client is open source"? Oh great - so we can do all the labor and look at the source of it, but the server, which the corporations (Looksmart? M$?) own and run, will not be open source.

    Isn't this a dirty trick on the open source community?

    Has anyone been able to get/look at the source code of the client yet?

    There is no mention on their home page (www.grub.org) of who these guys are and what their intention is. Just a .ORG name for giving the public a good feel?

    [Also there is a thread on their forum on the client trying to act as a server - the thread is inconclusive on whether any spyware is included.]

  105. See by Anonymous Coward · · Score: 0

    How often the dupes are getting! Pretty soon it will be once a week..

  106. Features... by Hudjakov · · Score: 0

    Is it slashdottable?

  107. Gurb does not follow robots.txt correctly! by sharph · · Score: 1

    Why is it looking for robots.txt in a subdirectory?

    1. Re:Gurb does not follow robots.txt correctly! by presroi · · Score: 1

      /methoden/hanf equals www.hanfbroschuere.de

      the posted IP is not the IP of the grub client but of the somewhat strange ISP tool :)

  108. Re:search.msn.com is the future by muzthe42nd · · Score: 1

    i couldn't believe that, so i tried. While my german isn't that great, i worked it out. that is hilarious

    --
    Pfft - Sorry, what?
  109. And where does government get their money? by Anonymous Coward · · Score: 0

    From taxes. Taxes on businesses, and taxes on people who work for businesses. No taxes means no money to pay these government whiners.

    Do people actually think that there's some magical money tree, and government just gets it from there? If one dislikes business so much, BAN IT! Make it illegal. We'll go to 100% socialism. Let's see how THAT works out.

  110. hair is raising on the back of my neck by malia8888 · · Score: 2, Interesting

    Uh huh, Grub is going to "run in the background" ?
    No thanks!!. It just doesn't feel right. It is sort of like lending a firearm to an untrustworthy neighbor. What is in it for the lender other than potential problems?

    Spyware "runs in the background" and slows down people's machines. What really happens to one's machine performance with Grub? And, more importantly, where is my check?

    --
    Harpo Tunnel Syndrome--my wrist feels funny.
  111. Re:Matt Well's take by Anonymous Coward · · Score: 0

    Your take sounds exactly like what Matt Wells, the programmer of the Gigablast search engine said on his rants and raves page.

    "Rants & Raves
    by Matt Wells

    My Take on Looksmart's Grub
    Apr 19, 2003

    There's been some press about Grub, a program from Looksmart which you install on your machine to help Looksmart spider the web. Looksmart is only using Grub to save on their bandwidth. Essentially Grub just compresses web pages before sending them to Looksmart's indexer thus reducing the bandwidth they have to pay for by a factor of 5 or so. The same thing could be accomplished through a proxy which compresses web pages. Eventually, once the HTTP mime standard for requesting compressed web pages is better supported by web servers, Grub will not be necessary." ....

    [Your supposed take:] "Looksmart is only using Grub to save on their bandwidth. Essentially Grub just compresses web pages before sending them to Looksmart's indexer thus reducing the bandwidth they have to pay for by a factor of 5 or so. The same thing could be accomplished through a proxy which compresses web pages. Eventually, once the HTTP mime standard for requesting compressed web pages is better supported by web servers, Grub will not be necessary."

    You have been caught plagiarizing. Dork, you hardly changed a word, nice copy and paste job.
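
    For anyone curious about the compression claim in the quoted text, here is a small sketch that measures a compression ratio with Python's standard `zlib` module. The sample page is synthetic and highly repetitive, so it compresses far better than a typical real page would; the "factor of 5 or so" is the quoted author's figure and is not something this snippet verifies.

```python
import zlib


def compression_ratio(page_bytes, level=6):
    """Return original size divided by zlib-compressed size."""
    compressed = zlib.compress(page_bytes, level)
    return len(page_bytes) / len(compressed)


# Synthetic, repetitive HTML used only to exercise the function.
sample_page = (
    "<html><body>" + "<p>Crawl your own stuff</p>" * 200 + "</body></html>"
).encode("utf-8")

print(f"ratio: {compression_ratio(sample_page):.1f}")
```

    Running it over a directory of saved pages from a real site would give a more honest idea of how much bandwidth compression alone saves.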

  112. Re:Unlimited Use? Try Wishful Thinking. by NeoMoose · · Score: 1
    1. You can develop any application you want, but you must abide by the Google Web APIs terms of service. One condition is you cannot create a commercial service using Google Web APIs without first obtaining written consent from Google. Another is that you can only create one account for your personal use.
    Do what it says - obtain written permission. That written permission will be in the form of a commercial contract/license.
  113. Cool, but I won't get my hopes up by whereiswaldo · · Score: 1


    If this becomes popular, legal issues will crop up and it will be shut down and banned through mega-corporations' legal clout. I hope not, but I wouldn't be surprised at all. Today's net kinda sucks.

  114. It's an Important Step by serutan · · Score: 1

    So Grub is commercial. Big deal. Any large-scale project like this furthers our knowledge of distributed computing and helps pave the way to other things, like on-demand mirroring of popular content.

  115. Re:Yeah what's the corporation bashing? by hesiod · · Score: 1

    > As to whats wrong with Corporations and Big Business: in one word: Enron

    Wow, try using a little more thought next time. There is nothing wrong with business. What was wrong with Enron was the people running it (or, not running it). There are plenty of big businesses out there that are not corrupted like Enron. Stop beating your chest with stupid remarks that don't hold up.

  116. Re:Unlimited Use? Try Wishful Thinking. by Anonymous Coward · · Score: 0

    Google has other income sources. Take a look at their Google devices: a 1U unit costs $28,000! It's really just a Dual PIV with 2 GB RAM running a modified Linux version and their software. There is also a limit on the number of docs you can index.
    For $1,000 you can have a kick ass open source search engine on the same hardware that you can actually customize and disk space is pretty much your only restriction. See ht://dig project @ http://htdig.org