Building a Bigger Search Engine

Will Grub take off or be smashed? by Blaine Hilton · 2003-04-19 14:17 · Score: 4, Insightful

I started to use grub, but then questions started cropping up. First we are using this to further a commercial organization. This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine. There is not even any potential reward such as with distributed.net.

Also the grub engine crawls everything, including adult content and other questionable content. They have a setting to turn it off, but it does not block it. With the current questioning of international law relating to accessing illegal websites this could have major consequences for the average user.

So for the time being I have stopped using the grub client until some serious questions are answered. It's an interesting concept and if it was being used in more of an academic setting it could be interesting. However I believe that search engines like Google are doing pretty good themselves.

Go calculate something

Re:Will Grub take off or be smashed? by dubiousmike · 2003-04-19 14:20 · Score: 1

one thing might be that a site doesn't have to wait 6 weeks to get listed...

is that good or bad?
Re:Will Grub take off or be smashed? by Threni · 2003-04-19 15:03 · Score: 1, Insightful

"Also the grub engine crawls everything, including adult content and other questionable content."

Adult content isn't questionable. You either look at it, or you don't. Don't tell me that stuff about children being harmed by looking at photographs of the naked body has got to you?

Also, the legal problems exist mainly in your head. No user will be prosecuted for supplying a URL of a website to a third party who then makes it available to people using their search engine, as it simply isn't illegal.

Unlike SETI, this thing isn't a complete and utter waste of time, although I agree with you about the folding thing.

"So for the time being I have stopped using the grub client until some serious questions are answered."

No serious questions have been posed at this time.
Re:Will Grub take off or be smashed? by bcrowell · 2003-04-19 15:59 · Score: 3, Insightful

This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine.
Actually, if I had a gun to my head, I'd choose to run Grub, because the client is open-source. I used to run SETI@home, but then the news came out that they'd been sitting on a potential root vulnerability for a long time. That really brought home to me the risks of running someone else's closed-source app on my box.

--
Find free books.
Re:Will Grub take off or be smashed? by kaden · 2003-04-19 16:00 · Score: 5, Insightful

Um, I think you're missing the point. This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server. I would quite possibly be arrested and charged, and while I wouldn't be convicted, it's quite an ordeal, and there's an ugly social stigma to even being charged with Kiddy Porn or conspiring with a terrorist. So that's a serious question that's posed by running Grub.
Re:Will Grub take off or be smashed? by Feztaa · 2003-04-19 16:41 · Score: 1

the news came out that they'd been sitting on a potential root vulnerability for a long time

Do you have any references? Please back up your claims.

I like the anecdote, "Gee, this closed source thing turned out to be a huge risk! I'll stay open source, thanks.", but I'd like some proof :)
Re:Will Grub take off or be smashed? by bcrowell · 2003-04-19 16:48 · Score: 4, Informative

Do you have any references? Please back up your claims.
here, and here
Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.

--
Find free books.
Re:Will Grub take off or be smashed? by dtfinch · 2003-04-19 16:52 · Score: 5, Interesting

There are many ways to look at this. The idea is to install the client, set Opera to use the same useragent string, visit some of those sites, then blame it on Grub if the FBI comes busting through your door.

If you're a criminal, installing the Grub client might be a great idea.
Re:Will Grub take off or be smashed? by Moonwick · 2003-04-19 16:55 · Score: 2, Insightful

Yeah, god forbid you help a commercial organization, especially when the results could stand to benefit you.

God knows that Google, by virtue of being a commercial entity, has absolutely nothing to offer you.

Anti-capitalist fucktard.

--
Only on slashdot can a posting be rated "Score -1, Insightful".
Re:Will Grub take off or be smashed? by Logopop · 2003-04-19 20:58 · Score: 1

Some good, valid concerns there. My first concern was of a more practical nature - will the servers take the load when there is a Slashdot jump in the number of clients? My newly downloaded client is already spending a lot of time trying to deliver results.
I like the concept nevertheless. My perspective on things has become quite 'googlified' lately, I must admit. So I will be using the web-based search client for an alternative view on my searches. However, I am still unsure how much I will be using the client. There's nothing wrong in contributing to a commercial venture, as long as I am (in this case) allowed to use the service for free. But, as already mentioned, there may be legal questions that need addressing.
Re:Will Grub take off or be smashed? by Jugalator · 2003-04-19 21:19 · Score: 2, Interesting

There is not even any potential reward such as with distributed.net.

How about improving existing search engines with more accurate databases? Commercial organizations like Google might be involved and that's another matter. There might still be a reward to the public.

--
Beware: In C++, your friends can see your privates!
Re:Will Grub take off or be smashed? by wirde · 2003-04-19 22:31 · Score: 1

Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.
Technically you are right. But:
1. On many Windows installations, it's more or less equivalent.
2. Under *nix, running arbitrary code as a user is a good first step to escalating to root.

--
in GNUin GNUin GNUin GNUin GNUin GNUin GNUin GNUSegmentation fault
Re:Will Grub take off or be smashed? by Negatyfus · 2003-04-20 01:39 · Score: 1

On the other hand, if you're innocent and Grub accessed some of that illegal content, try to convince the jury that you didn't abuse Grub to cover up some of your illegal activities like this or that terrorist turned out to have done.
Re:Will Grub take off or be smashed? by stinky wizzleteats · 2003-04-20 02:44 · Score: 1

If you're a criminal, installing the Grub client might be a great idea.

This is exactly the kind of "barrel full of wine, spoonful of sewage" argument that is going to get the Internet itself banned before too long.

With things like Freenet running around, and now this (what will happen if these guys get together), the argument will be "Information terrorists have made it impossible to control the Internet. It must, for the sake of the children, therefore be banned."

Tinfoil hat karma whoring? You be the judge. I do a lot of expert witness testimony and general defense consultation on criminal cases involving information technology. I have sat across the table from types who would make Agent Smith look like Barney Fife. I promise you, when this stuff gets on their radar, it will be in the next Patriot Act.
Re:Will Grub take off or be smashed? by joshdaymont · 2003-04-20 03:28 · Score: 1

You raise some good questions, but there are even more. What about the security concerns? Commercial firms are famous for writing bad code. Also, there are clear privacy dangers here. I for one would never run this on my desktop.

Josh Daymont
MobileSecure, Inc.
http://www.mobile-secure.com/
Re:Will Grub take off or be smashed? by Beliskner · 2003-04-20 03:56 · Score: 1

This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server.
What the fuck happened to the First Amendment? Rights that you aren't willing to die for will disappear. If you expect the Feds to censor the Internet and track your URLs then they will.
It's ironic that we attack Al-Qaeda's tactics when our constitution itself demands that we be willing to die for our Rights under the Constitution, unless the Constitution disappeared overnight and I missed a memo.

--
A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
Re:Will Grub take off or be smashed? by smagruder · 2003-04-20 04:02 · Score: 1

Just download version 3.08 to fix it.

--
Steve Magruder, Metro Foodist
Re:Will Grub take off or be smashed? by Kevin Stevens · 2003-04-20 10:36 · Score: 1

I hear this argument a lot, but it makes me think: do you really plan on analyzing all the code on your computer that is open source? Do you even rely on the fact that someone else will? If someone opened the source to Windows tomorrow, could you really count on people to scrutinize all 10 million lines of code? Even if someone does, you have to just rely on their expertise in finding any potential bugs in it. This auditing process is to me not a whole lot better or more thorough than what goes on inside MS's offices, or SETI's labs.
Re:Will Grub take off or be smashed? by iamhassi · 2003-04-21 08:15 · Score: 1

"...then blame it on Grub if the FBI comes busting through your door."
Great idea! So after you're arrested for kiddy porn and your picture is on the front page of the local paper and the local nightly news, your friends and family disown you, and after a year in jail you finally go to trial to be found not guilty because your $10,000 lawyer argues it was really Grub and the FBI *finally* releases you.
Swell plan you have there.

--
my karma will be here long after I'm gone
Re:Will Grub take off or be smashed? by PhilHibbs · 2003-04-24 19:51 · Score: 1

Yes, because 3.08 fixed the last bug.
Re:Will Grub take off or be smashed? by hermes4293 · 2003-04-30 08:12 · Score: 1

there are illegal websites?
tell me one!

Great idea, but will it pan out? by dtolton · 2003-04-19 14:17 · Score: 5, Insightful

LookSmart hopes to tap the altruistic nature of many Internet users.

That unfortunately seems like a naively optimistic hope. While the
vast majority of people may be altruistic, it only takes a few
unscrupulous individuals to completely undermine a fair result.

It's interesting that this idea is an extension to Google's model in
many ways. Essentially Google is able to index so much of the
Internet by having 50,000+ servers. I don't think that's what makes
Google such a useful search tool, rather I think it's its accuracy and
relevancy. If my search results started getting polluted with bogus
hits, I would stop using it almost immediately.

Unfortunately, by letting people run the client on their machine and
having it send the results back to the server, I think spoofed
results are inevitable. I don't think it will be possible to
safeguard the results either, it will be interesting to see how well
this project survives *when* people start spoofing results. It's
been a problem for SETI@home, and it's something that undermined some
people's faith in the project as a whole. If the spoofed results are
more widespread and have a larger impact as they would in a system
like this, it may ultimately prove fatal to the project.

One factor that has been absolutely critical to Google's success has
been their ability to remain resistant to spoofing attempts. It's
still a question mark how well grub will perform in that context.

--

Doug Tolton

"The destruction of a value which is, will not bring value to that which isn't." -John Galt

Re:Great idea, but will it pan out? by Nickilo · 2003-04-19 18:42 · Score: 5, Interesting

"The General's Dilemma" would solve this problem. The story goes something like this: The general needs to get urgent information to one of his officers, however, he suspects saboteurs are present among his messengers. In order to insure the information gets through accurately, he sends the same message with several men. The officer on the other end collects all the messages and goes with the majority. (And, presumably, kills the others.)
Re:Great idea, but will it pan out? by npongratz · 2003-04-20 18:36 · Score: 1

Possibly not. The officer would probably have trouble unless the messengers come to him with a verifiably accurate timestamp of the message they're delivering (ie, the Grub server instructs n clients to fetch a page at the exact same time and return the results with the timestamp).

Why? Well, given the dynamic nature of the Internet, pages change through the course of time (the General updates his messages often). So even a difference of one second can change the results of the fetching of a given page. Thus, we get the illusion of saboteurs in our camp (along with the nasty requisite beheadings) even though the messengers probably are legitimate (ie, no conveniently "touched up" results are returned, yet the returned pages have changed from one unit of time to the next).

Of course, adding a timestamp alone wouldn't solve the problem, either. There'd be issues with time synchronization due to network latency, timestamp spoofing, etc. I would guess a well-thought-out public key infrastructure would have to be implemented (for secure retransmission of the timestamp), which opens another can of worms.
Re:Great idea, but will it pan out? by aminorex · 2003-04-21 02:54 · Score: 1

I agree that it's not an unsolvable problem, however,
it's a bit more complex than you paint it: Dynamic
content can provide different results on every access.
What does the officer do if every messenger gives
a different result?
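
For reference, a minimal sketch of the majority vote being discussed, assuming each client reports back a checksum of the page it fetched. The function name and the sample data are made up for illustration; nothing here is taken from the actual Grub server. It returns None when no single version wins a strict majority, which is the dynamic-content case:

```python
from collections import Counter

def majority_result(reports):
    """Return the value reported by a strict majority of clients, or None."""
    if not reports:
        return None
    # most_common(1) gives the single most frequent report and its count.
    value, count = Counter(reports).most_common(1)[0]
    # Require more than half of all reports to agree.
    return value if count > len(reports) / 2 else None

# Three honest messengers and one saboteur: the majority wins.
print(majority_result(["abc123", "abc123", "ffff00", "abc123"]))
# Every messenger disagrees (e.g. dynamic content): no verdict.
print(majority_result(["aaa", "bbb", "ccc"]))
```

When the vote comes back empty the server would still have to fall back on something else, such as fetching the page itself.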

--
-I like my women like I like my tea: green-

Biiig questions to answer by andy@petdance.com · 2003-04-19 14:20 · Score: 5, Interesting

So Grub goes out, uses bandwidth, and then returns some results to the home base. It's really distributed bandwidth more than distributed computation.

I bet one of the big successes in Folding and distributed.net is that many people run the clients on work boxes, knowing that there's little actual overhead incurred to their work. How different that is for a URL sucker.

I wonder what broadband ISPs think of Grub.

Re:Biiig questions to answer by fatalist23 · 2003-04-19 14:46 · Score: 1

Well, as a college student on a line with a bandwidth quota (per week capped, not too bad) I can say that I'm not too enthusiastic about donating my bandwidth. The application itself probably wouldn't be too traffic intensive, but given my bandwidth usage habits, I know I run quite close to the caps (which could cause me to get kicked off the network) quite often. Just my .02
Re:Biiig questions to answer by friedegg · 2003-04-19 14:49 · Score: 4, Interesting

I wonder what broadband ISPs think of Grub.

If it becomes a problem, I imagine ISPs will declare it a commercial bandwidth usage, and order users to stop or move to a business class plan for more money.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
Re:Biiig questions to answer by Zork the Almighty · 2003-04-19 17:44 · Score: 1

As a college student myself, I'm more concerned about redirecting bandwidth AWAY from destroying the RIAA.

--

In Soviet America the banks rob you!
Re:Biiig questions to answer by einer · 2003-04-20 12:38 · Score: 1

Which in my mind is just another reason that someone should take this idea, and implement an open source version.

How hard could it be? :)

Haiku :-) by Ignorant Aardvark · 2003-04-19 14:20 · Score: 4, Funny

Grub searches the web
Sniffing out all the good porn
Not just bootloader

I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called "E-Coli" yet? No? I can just imagine my mom ...

"Agh! You have E-Coli on your computer!"

--
Cyde Weys Musings - Scrutinizing the inscrutable

Re:Haiku :-) by Anonymous Coward · 2003-04-19 14:48 · Score: 3, Funny

How about 'SARS'? Four letters, indicates something that spreads quickly...
Re:Haiku :-) by Anonymous Coward · 2003-04-19 15:01 · Score: 4, Funny

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
I'm wondering if the Grub bootloader developers will throw a tantrum and flood the Grub crawler developers' e-mail addresses, claiming that this will confuse people and harm the bootloader project.

Hee hee.
Re:Haiku :-) by Unoriginal Nick · 2003-04-19 15:09 · Score: 5, Funny

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
How about Firebird? I'm sure that won't cause any problems :-)
Re:Haiku :-) by Anonymous Coward · 2003-04-19 15:24 · Score: 1, Funny

Hmmm yes. maybe they should have used A SEARCH ENGINE before deciding on Grub. Currently the GNU GRUB is the first result on google.
Re:Haiku :-) by Chester K · 2003-04-19 16:19 · Score: 4, Funny

As time approaches infinity, the number of software projects named Firebird also approaches infinity.

It's ok though because they'll all still be different projects, so nobody will get confused.

--

NO CARRIER
Re:Haiku :-) by certron · 2003-04-19 19:03 · Score: 1

"Seriously though, shouldn't they change the name? GRUB is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called E-Coli yet?"

I think, if anything, they should call it Grubi or Grubbi. On one hand, it could be cute, and could probably have a good mascot and backronym for it, and on the other, it indexes anything it can get its grubby little hands/ fingers/ tentacles/ protuberances on. Sounds like a good name to me. :-)

I'm sure some bio person will tell you all about e-coli and how usually it isn't harmful. Or something. I'll let them tell about it, even if it is unrelated to a name.

--

fair.org counterpunch.com truthout.com indymedia.org salon.com
eff.org guerrilla.net debian.org gentoo.org
Re:Haiku :-) by Anonymous Coward · 2003-04-19 23:34 · Score: 1, Insightful

I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!

OK. 15 minutes are up, and we are STILL waiting for your "Good" joke.
Re:Haiku :-) by iomud · 2003-04-20 00:43 · Score: 1

Now that was funny.
Re:Haiku :-) by rowanxmas · 2003-04-20 12:39 · Score: 1

I am going to stand by my choice of LILO for the new name of this software.

Business Plan? by Anonymous Coward · 2003-04-19 14:22 · Score: 2, Insightful

What are sensible business plans for this type of endeavour?

Should we expect to see many commercial efforts focussed on providing similar "crawl" or "index" capabilities, but each honed to a specific niche market? A scientific crawler? A retail links database?

One could argue that similar efforts targeting music resources have resorted to less automated techniques, i.e. human-driven sharing.

Thoughts?

Re:Business Plan? by ddimas · 2003-04-19 23:43 · Score: 1

First explain to me why I should donate my resources to your profit?

I think that they're just trying to avoid paying for hardware. No thanks, they can make money without my stuff.

Hrmm, I wonder how long... by bergeron76 · 2003-04-19 14:22 · Score: 3, Insightful

until someone figures out a way to compromise their local client's results and "escalate" their fave URLs.

It still sounds like a really cool idea though.

--
Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.

Re:Hrmm, I wonder how long... by CaptainMunchies · 2003-04-19 14:38 · Score: 3, Insightful

Grub's clients don't come up with a ranking for each website they crawl; rather, they check to see if this website has changed since the last time it was crawled. For any website that has changed, the client notifies the server. The search engine asks the server which sites in its index need to be updated, and the server gleefully replies.

Clients artificially increasing their ranking isn't an issue, since the client has nothing to do with a site's ranking.

--
Spam removed for the Internet's pleasure ...

grub is already taken by stock · 2003-04-19 14:23 · Score: 2, Insightful

Grub is the GRand Unified Bootloader, a GNU project, so the name is already taken.

Hmm searchengine eh? Why don't you call it grab ?

Robert

Re:grub is already taken by Concerned Onlooker · 2003-04-19 16:23 · Score: 1

So is Grab. It's a screen capture app that comes with OS X. Maybe their lawyers wouldn't mind sharing....

--
http://www.rootstrikers.org/
Re:grub is already taken by mackstann · 2003-04-19 18:23 · Score: 1

What's the deal with names lately? Who cares!
I don't see Phoenix being used for BIOS and a browser as a problem, I don't see Firebird being used for a database and a browser as a problem, and I don't see grub the bootloader and grub the web spider conflicting. They're entirely different products, and there are only so many words out there. Here is one of a million examples of a name that is taken by tons of different companies.
Re:grub is already taken by knowledgepeacewi · 2003-04-19 20:59 · Score: 1

I don't see Phoenix being used for BIOS and a browser as a problem
Yeah, but the legal system might. No one is as anal as a lawyer is about words and wording. And since Judges are lawyers...
Re:grub is already taken by stesch · 2003-04-20 07:24 · Score: 1

Grub is the GRand Unified Bootloader, a GNU project, so the name is already taken.
Does anybody see the humor in this? They haven't used a search engine to check the name ...

If previous results are any guide by carl67lp · 2003-04-19 14:23 · Score: 5, Funny

1. Tech-savvy people will install this.
2. Tech-savvy people tend to be loners.
3. Loners most often search for porn.

C1. Tech-savvy people search for porn.

4. Items searched for most often reach the top of the list.
5. Porn is searched for often by tech-savvy people.

C2. Porn will be easier to find with this new search engine.

Count me in!

Re:If previous results are any guide by KoolDude · 2003-04-19 16:11 · Score: 1

1. Tech-savvy people will install this.
2. Tech-savvy people tend to be loners.
3. Loners most often search for porn.

C1. Tech-savvy people search for porn.

4. Items searched for most often reach the top of the list.
5. Porn is searched for often by tech-savvy people.

C2. Porn will be easier to find with this new search engine.

6. pr0nit !?!

--
getSexySig(); /* returns sexy signature */
Re:If previous results are any guide by anon*127.0.0.1 · 2003-04-19 16:43 · Score: 4, Funny

You're having trouble finding porn now?

--
I am NOT a man!
I am a free number!
Re:If previous results are any guide by Saeger · 2003-04-19 21:55 · Score: 1

People still search for porn on the IntarWeb instead of p2p? Amazing.
--
Power to the Peaceful

great news! API? by The-Perl-CD-Bookshel · 2003-04-19 14:24 · Score: 2, Interesting

This is going to challenge Google's search, which will entice them to cut loose some of those really cool google labs concepts. Froogle, Google News, and all of the other cool things that they are working on are great services and are going to be the focus of innovation over at Google.

Also, LookSmart needs to develop and release an API for this system. You can only use the Google API for 2,000 searches per day. If they allowed unlimited usage, it would get a lot of developer backing.

--
I don't keep a lid on my coffee so when I walk around I look busy -me

Not news for us webmasters by Gothmolly · 2003-04-19 14:27 · Score: 1, Insightful

grub has been crawling my site for weeks if not months now. How is this news? Because someone at Wired wrote about it? Geesh.

--
I want to delete my account but Slashdot doesn't allow it.

Re:Not news for us webmasters by Redwing · 2003-04-19 14:52 · Score: 5, Interesting

Here is what slashdotters were saying about grub almost 2 years ago.

--
Raisinettes are my raison d'etre
Re:Not news for us webmasters by commodoresloat · 2003-04-19 16:36 · Score: 1

How is this news? Because someone at Wired wrote about it?
No; because someone at Wired News wrote about it.
Re:Not news for us webmasters by hswerdfe · 2003-04-19 19:11 · Score: 2, Insightful

dude, get over yourself....

I never heard tell of Grub.org before.

I found it interesting....

not every link on slashdot is going to directly relate to you....

--
--meh--

Grub by squiggleslash · 2003-04-19 14:27 · Score: 3, Funny

Ok, so how are they going to store this giant search engine in the boot sector of an ordinary hard drive?

Oh wait, you mean it's not related to GRUB, the Linux/etc boot loader. *slaps forehead* But I guess this solves everything - we can call Phoenix "Grub" too, and just treat it as the generic name to call everything we're having problems thinking up a name for...

--
You are not alone. This is not normal. None of this is normal.

Firewalls? by adam_megacz · 2003-04-19 14:28 · Score: 5, Insightful

So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?

Re:Firewalls? by friedegg · 2003-04-19 14:40 · Score: 3, Informative

You can always put an entry in your robots.txt to block it.

Actually, the robots.txt issue is one they're still working on. Right now it doesn't check the file very often, which upsets some webmasters.

They're open to suggestions, so maybe you could suggest a list of blacklisted IP's/hostnames. I suggested they look into supporting gzip compressed web pages, and they said they'd look into it.
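
A rough sketch of what the robots.txt block looks like from the crawler's side, using Python's standard urllib.robotparser. The user-agent string "grub-client" is a placeholder I made up; check your server logs for the string the real client sends:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out the (hypothetical) "grub-client" agent
# entirely, and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow: /private/
"""

def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(can_crawl(ROBOTS_TXT, "grub-client", "http://example.com/index.html"))
print(can_crawl(ROBOTS_TXT, "otherbot", "http://example.com/index.html"))
```

This only helps if the crawler actually re-reads the file, which is the sticking point mentioned above.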

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
Re:Firewalls? by GigsVT · 2003-04-19 14:42 · Score: 2, Interesting

If you knowingly run a program that openly spies on every page you go to, you get what you deserve.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:Firewalls? by adam_megacz · 2003-04-19 14:57 · Score: 1

I don't run the webserver in question.

Also, what if the inept secretary down the hall (who has no idea what robots.txt is) decides to run this thing?
Re:Firewalls? by friedegg · 2003-04-19 15:04 · Score: 2, Informative

Well, if you're getting into "What if"'s, she could also email someone outside the company anything from inside the firewall. Or set up a file sharing client like Kazaa and share things on local and network drives.

If you wanted to forbid the client from working, network admins could block port 3136 (I think it is), which would prohibit communication with the central server.

My understanding is that grub does not just crawl away randomly, rather it's given a list of things to crawl by the central server. So, assuming it hasn't crawled your intranet before, and you don't give it a local site to crawl, it shouldn't normally find them. But, like I said, they're open to suggestions, so if you have some, offer them.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
Re:Firewalls? by CableModemSniper · 2003-04-19 15:13 · Score: 1

well since you don't know that robots.txt is on the webserver anyway, I'm sure it won't be a problem that the secretary doesn't know this ;)

--
Why not fork?
Re:Firewalls? by YoungHack · 2003-04-19 16:19 · Score: 1

So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?
You don't, and the spider regularly indexes things on 127.0.0.1. There are an awful lot of domains out there that resolve to that. That's why I don't run the spider.
Re:Firewalls? by apsyrtes · 2003-04-19 16:47 · Score: 1

how does "Knowingly?" enter into it?

You know... we have a lot of sensitive stuff on our company intranet. And there are *way* more staff than our network/computer systems admins can ever expect to handle.

And some of them read slashdot.

(of course, I mean the *users* not the *admins*) 8(

I can't wait to see my salary floating around on some Looksmart results page.
Re:Firewalls? by GigsVT · 2003-04-20 03:13 · Score: 1

Knowingly as opposed to spyware that tries to trick you into installing something that spies on you.

This thing's stated purpose is spying on what pages you go to.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:Firewalls? by sakshale · 2003-04-20 09:32 · Score: 1

They're open to suggestions, so maybe you could suggest a list of blacklisted IP's
One restriction - no private IP numbers - URL listings for a host at 10.100.200.1 would not be very useful.
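
A minimal sketch of that restriction using Python's standard ipaddress module. It assumes the hostname has already been resolved to an IP address, and the function name is mine, not something from the Grub client:

```python
import ipaddress

def is_publicly_crawlable(host_ip):
    """Return False for private, loopback, and link-local addresses."""
    addr = ipaddress.ip_address(host_ip)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

# 10.100.200.1 is RFC 1918 private space; 127.0.0.1 is loopback.
for ip in ["10.100.200.1", "127.0.0.1", "192.168.1.5", "8.8.8.8"]:
    print(ip, is_publicly_crawlable(ip))
```

Checking the resolved address rather than the hostname also covers public domain names that happen to resolve to 127.0.0.1.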

--
For every problem there is a solution that is simple, obvious and wrong.

Google Toolbar by petree · 2003-04-19 14:30 · Score: 5, Interesting

Couldn't Google do this anyways with the Google Toolbar? 'Cause with the advanced features version it tracks every page you visit. If they offered some incentive to install the toolbar, Google could just beat them at this game. I actually use the Google Toolbar already by choice (it makes my web searching more productive) every day. All they have to do is get lots of people using it, and wouldn't that work just as well or better?

Re:Google Toolbar by Anonymous Coward · 2003-04-19 14:43 · Score: 1, Interesting

Google Toolbar does have a distributed computing option now (you have to turn it on). I think they're using it for SETI or folding or one of those worthwhile causes. I always assumed the incentive to use the toolbar was the functionality it provides.
Re:Google Toolbar by Kelerain · 2003-04-19 14:54 · Score: 5, Interesting

This tracking is actually how a lot of important information leaks out. Security through obscurity has always been a poor man's system, and this busts it wide open. I won't post them here, but there are several interesting searches you can do that give personal results for things that REALLY have NO place on a publicly accessible page. On a more positive note, Google already uses distributed computing through their googlebar (http://toolbar.google.com/dc/offerdc.html). However, they donate the cycles to various worthy causes like Folding at Home (currently their only beneficiary), but it is conceivable that if they came up with some secure and useful search-related thing to do with the cycles they could put it to use almost instantaneously. I think that there aren't significant benefits (plenty of discussion elsewhere here) for them to want to use it, however.
Re:Google Toolbar by Phroggy · 2003-04-19 15:11 · Score: 1

If they offered some incentive to install the toolbar, google could just beat them at this game.

Does being a kick-ass tool (for those unfortunate enough to be using Internet Explorer) count as incentive?

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Google Toolbar by Gryftir · 2003-04-19 15:46 · Score: 1

Grub appears to be more cross-browser and cross-platform (Google Toolbar only runs on Internet Explorer 5 for now). Grub runs on Linux and Windows, and since it isn't a browser plugin, it doesn't require you to have a certain browser.

--
http://www.santacruzbynight.com/index.shtml Santa Cruz By Night Vampire Larp
Re:Google Toolbar by James_Duncan8181 · 2003-04-19 21:21 · Score: 1

Please, please, please detail. You can't just dangle something so interesting...

--
"To any truly impartial person, it would be obvious that I am right."
Re:Google Toolbar by Kelerain · 2003-04-20 17:11 · Score: 1

Please, please, please detail. You can't just dangle something so interesting...
Well, mainly because I couldn't think of a good example at the time. I actually saw these before on a Slashdot thread which I am unable to locate. But try searching for things like "pub pwl" or "directory of" passwd. Things that are obviously insecure. While no one knows about them they are sometimes left open. And the user with his Google Toolbar will sometimes go there. And then it gets on Google. OOPS. There are some better searches out there. Get creative. The point is, if you know what you are doing, Google is great for finding insecure systems and private information.

Hardly distributed crawling by Herbst · 2003-04-19 14:41 · Score: 2, Interesting

...rather a crawl with a distributed component.

They use the screensaver grub clients to check if a web page has been modified since the last time it was crawled (by the centralized crawl done by Looksmart). They probably use some smart MD5 checksum of the pages and send that with the urls to be crawled to the clients. If the checksum of what the grub client crawled doesn't match then the centralized crawl is instructed to re-fetch that url.

They go this route because the If-Modified-Since HTTP 1.1 request header is not supported by many webservers (and even if it is, you can't really trust it). This is especially true for dynamically generated web pages. I.e., if If-Modified-Since worked reliably, it would be a simple operation to check if a previously crawled page has changed. Since that's not the case, they are outsourcing the expensive refetching of whole pages.
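For reference, a conditional request of the kind described above can be sketched in Python. This is only an illustration: the function names are made up for this example, nothing is actually sent over the network, and it says nothing about how Grub or Looksmart really do it.

```python
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_crawl_epoch):
    """Build (but do not send) a GET that asks for the page only if it changed.

    last_crawl_epoch is the time of the previous crawl in seconds since
    the Unix epoch; both names are hypothetical.
    """
    req = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    req.add_header("If-Modified-Since", formatdate(last_crawl_epoch, usegmt=True))
    return req


def server_says_changed(status_code):
    """304 Not Modified means the server claims the page is unchanged."""
    return status_code != 304
```

A server that ignores the header simply answers 200 with the full page every time, which is the failure mode the comment complains about.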

It will be interesting to see how this pans out. I think they could run into trouble with ISPs if this really takes off (because bandwidth consumption per user would increase and make flatrate deals less profitable for some ISPs).

Re:Hardly distributed crawling by myov · 2003-04-19 15:19 · Score: 2, Insightful

Not the greatest way of doing this. On one of the sites I maintain, the date shows up at the top of the page. The other content changes very infrequently in most cases (a few pages hit a news&events database but that's about it). But the new date would be enough to change the checksum (unless they're allowing for it somehow)

Grub hits us quite often. I've seen the same URL hit multiple times in one day by different hosts. It's ignoring the "revisit-after" meta tag (7 days), but then, so are most of the other search engines. While I haven't banned it, I am watching the amount of bandwidth it uses.

--
I use Macs to up my productivity, so up yours Microsoft!
Re:Hardly distributed crawling by Herbst · 2003-04-19 16:46 · Score: 1

Not the greatest way of doing this. On one of the sites I maintain, the date shows up at the top of the page. The other content changes very infrequently in most cases (a few pages hit a news&events database but that's about it). But the new date would be enough to change the checksum (unless they're allowing for it somehow)
That's why I mentioned "smart" MD5 checksums. You'd only checksum certain parts of a page, e.g., detect everything that looks like a date and make sure that it's not part of the smart checksum. As long as the checksum parser on the grub client and the one at Looksmart are identical, that should work pretty well.
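A rough sketch of what such a "smart" checksum might look like, assuming the only volatile content is dates in a few common formats. The patterns and function names are invented for illustration; the real Grub client may do something entirely different.

```python
import hashlib
import re

# Hypothetical patterns for "things that look like a date".
DATE_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                      # 2003-04-19
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),                # 4/19/2003
    re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
        r"[a-z]*\.?\s+\d{1,2},?\s+\d{4}\b"                     # April 19, 2003
    ),
]


def smart_checksum(html):
    """MD5 of the page with date-like text and extra whitespace removed."""
    text = html
    for pattern in DATE_PATTERNS:
        text = pattern.sub("", text)
    text = " ".join(text.split())  # collapse whitespace left behind
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def needs_refetch(html, known_checksum):
    """True if the page differs from the copy the central index has."""
    return smart_checksum(html) != known_checksum
```

Both sides have to run exactly the same normalization, otherwise every page looks changed.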

The Distributed Search Engine by deadfishhotmail.com · 2003-04-19 14:42 · Score: 2, Interesting

It's kind of funny and a bit ironic that search engines are generally used to search information from a central repository and Grub uses a distributed network to index pages. It's almost like having a distributed Google cache (that's updated more frequently). Perhaps a better idea would be to invent a crawling daemon that runs on each server with a standard protocol that reports to a central server the relevance of search terms (hey, it's DNS for search terms!!). Too bad it would be heavily abused (mostly by Buy Now, Free Money and Pron avenues I suppose).

Ok now tell me that it's already been done, 'cause I'm pretty sure it has (and probably by Microsoft for ad money).

Well it's an idea that might be more efficient and updatable than Grub anyway.

--

Who is this "Poster" guy and why does he own all of my comments?!?

Google's technology is superior... by eidechse · 2003-04-19 14:42 · Score: 4, Funny

...those pigeons can't be beat.

Re:Google's technology is superior... by Dannon · 2003-04-19 15:35 · Score: 1

Indeed. In every contest between pigeons and grubs to date, the pigeons have clearly had the upper beak.

--
Good judgment comes from experience.
Experience comes from bad judgment.
Re:Google's technology is superior... by eidechse · 2003-04-19 18:25 · Score: 1

You should probably take a look at that backstory link above.
Re:Google's technology is superior... by Boss,+Pointy+Haired · 2003-04-19 20:20 · Score: 1

Or does anybody else not find this pigeon rank thing that funny?

I think it's pretty lame myself.

But whenever someone mentions or links to pigeon rank around here it gets +4/5 funny every time.

My Take on Grub by Anonymous Coward · 2003-04-19 14:44 · Score: 2, Informative

Looksmart is only using Grub to save on their bandwidth. Essentially Grub just compresses web pages before sending them to Looksmart's indexer thus reducing the bandwidth they have to pay for by a factor of 5 or so. The same thing could be accomplished through a proxy which compresses web pages. Eventually, once the HTTP mime standard for requesting compressed web pages is better supported by web servers, Grub will not be necessary.
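To get a feel for the kind of saving being claimed, here is a small Python sketch that gzips a page and reports the ratio. It is only an illustration with made-up function names; the actual ratio depends entirely on the page, and nothing here confirms the "factor of 5" figure.

```python
import gzip


def compress_page(html):
    """Return the gzip-compressed bytes of a page."""
    return gzip.compress(html.encode("utf-8"))


def compression_ratio(html):
    """Uncompressed size divided by compressed size (bigger is better)."""
    raw = html.encode("utf-8")
    return len(raw) / len(compress_page(html))
```

Repetitive markup such as long tables compresses very well; pages that are mostly already-compressed data would not.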

What about the RIAA? by One+Louder · 2003-04-19 14:51 · Score: 3, Insightful

So...let's say my instance of Grub crawls over a repository of .mp3s and supplies that information to the combined index.

What's the difference between my machine indexing them and the university students recently being hauled into court for indexing open shares? Why would I not be held liable for contributory copyright infringement?

No thanks.

Re:What about the RIAA? by Anonymous Coward · 2003-04-19 14:54 · Score: 1, Insightful

Because this would call into question the future of all search engines, and you'd see the big players like Google, Yahoo, Overture, etc. head into court with their own high-priced lawyers. You think the RIAA wants a fight it doesn't think it can win?
Re:What about the RIAA? by SmartGamer · 2003-04-19 15:46 · Score: 2, Interesting

Here's the catch: it's going for scare tactics.

The Church of Scientology has already threatened Google and gotten results moved; I can, in all honesty, see the RIAA going for it.

It would be an earthshattering case, but here's the thing: the RIAA stands a disturbingly good chance of winning.

I hope, I pray they don't win were they to try it, and try they most certainly will, because they think they can get money out of the lawsuit and they want money. That's very likely a major motive.

Oh, and to mods-for-a-day: mod the parent of this post up. It's thoroughly underrated at zero.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.
Re:What about the RIAA? by SmartGamer · 2003-04-19 15:49 · Score: 1

Difference: You can show that you don't have direct control over it, and it is likely that they'd go for Grub instead of the users. ...other than that, not much. Note that I think the RIAA is full of excrement on their recent case as well.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.
Re:What about the RIAA? by knowledgepeacewi · 2003-04-19 21:12 · Score: 1

the RIAA stands a disturbingly good chance of winning.
Even if the server containing the MP3s is in a country that doesn't recognize copyrights?

I would think displaying a link to copyrighted material would fall under free speech as long as you don't supply the material itself. But IANAL and the RIAA has a lot of money to blow.
Re:What about the RIAA? by Saeger · 2003-04-19 22:12 · Score: 1

So your friend was dumb enough to not use robots.txt and to leave insecure Directory Indexes enabled, but smart(ass) enough to redirect his newfound visitors to funny pages? cute.
--
Power to the Peaceful

They realize they aren't the REAL GRUB by anagama · 2003-04-19 14:55 · Score: 5, Informative

From the readme in the Linux version; no idea what the other readmes might say. However, it appears that they are sensitive to the fact that the GRUB bootloader pre-existed their program. They are requesting catchy names. Here is an excerpt:

Notice
======
The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

--
What changed under Obama? Nothing Good

Re:They realize they aren't the REAL GRUB by RighteousFunby · 2003-04-19 16:54 · Score: 1

I have some ideas for a new name... parasite leech bloodsucker bigbrother or, even better windowsxp

--
If you're happy and you know it read my blog
Re:They realize they aren't the REAL GRUB by JWSmythe · 2003-04-19 19:47 · Score: 1

I dare say Pontiac had the name first. The 1967 Pontiac Firebird was the first.. :) I'm a big Firebird fan. I've had many F-Bodies from the 1975 Camaro LT-1 to the 2000 Firebird TransAm WS/6.

Honestly, it's going to be hard to come up with any name that someone, in some way, thinks they already have claims to..

But, to keep this completely on topic, it seems the grubclient has problems.. It works fine on a Slackware 8.1 workstation, but bombs out with a segfault after a few minutes on a Slackware 8.0 machine..

Too bad for them. The Slack 8.0 machine is on a 1Gb/s connection. The Slack 8.1 machine is on a suck-ass Charter Cablemodem..

I got Charter Communications' junk mail today with bribes for upgrading my bandwidth. For only an extra $80/mo they'll increase my upload to 128k (from 24k), and my download to 512k (from like 128k).. This is a *FAR* cry from what all the cablemodem providers were claiming when they started. If I remember right, they were advertising 3Mb down, 1Mb up... Now I may as well be on a dialup if I'm uploading.

Cablemodem providers suck ass.. I'm contemplating getting my own T1 loop to my office. :)

--
Serious? Seriousness is well above my pay grade.
Re:They realize they aren't the REAL GRUB by Saeger · 2003-04-19 22:37 · Score: 2, Interesting

Oh please! There are 6+ billion people on the planet now, and not enough unique namespace for everyone or every business to have that one 'cool' short name, so why don't they do what we humans have done? GET A LAST NAME.
Grub The SearchEngine
Grub The Bootloader
FireBird von Browser
FireBird von Database
Gentoo el Distro
Gentoo el FileManager
Apple Computer
Apple Records

I'm serious. Nobody should feel entitled to an exclusive piece of namespace just because they think they had it first or are bigger & badder and more deserving than some newbie treading on their turf. (trademark `this!')
--
Power to the Peaceful
Re:They realize they aren't the REAL GRUB by Redglare · 2003-04-20 06:27 · Score: 1

suggested names: *chump *prey *wget *lookSmartSucker *thornleysFolly

Google crawls a lot, actually by bigberk · 2003-04-19 14:57 · Score: 1

It seems that google is actually crawling my site a lot more than grub is. Over the past 6 days:

$ grep -c Googlebot access_log
827
$ grep -c grub-client access_log
153
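The same counts, plus how many distinct client addresses each crawler came from, could be pulled out with a short Python script. This is a sketch that assumes a common/combined log format where the client address is the first field; the function name is made up.

```python
def crawler_stats(log_lines, agents=("Googlebot", "grub-client")):
    """Return {agent: (hits, distinct_hosts)} for lines mentioning each agent."""
    hits = {agent: 0 for agent in agents}
    hosts = {agent: set() for agent in agents}
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        client = fields[0]  # assumed: first field is the client address
        for agent in agents:
            if agent in line:  # same substring match as `grep -c`
                hits[agent] += 1
                hosts[agent].add(client)
    return {agent: (hits[agent], len(hosts[agent])) for agent in agents}
```

Usage would be something like `crawler_stats(open("access_log"))`, with the file name taken from the grep commands above.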

Re:Google crawls a lot, actually by oaf357 · 2003-04-19 16:29 · Score: 1

That's not a very good representation. Google has been going through its deep crawl the past 6 days.
Re:Google crawls a lot, actually by bigberk · 2003-04-19 17:04 · Score: 1

Google has been going through its deep crawl the past 6 days.

Oh, ok... the numbers I was seeing did seem weird :)

A better use for my screensaver time by Call+Me+Black+Cloud · 2003-04-19 14:57 · Score: 5, Insightful

I prefer grid.org to grub.org. There the cycles are going to cancer or smallpox research. Currently over 2 million machines are participating.

Altruism has its place, but since I'm more likely to die of cancer than of not having the complete www indexed I think I'll be selfish and work towards a cure for something that may affect me.

Re:A better use for my screensaver time by BigZaphod · 2003-04-19 15:20 · Score: 1

Anything like this for MacOS X? I checked the system requirements on grid.org and it seems to be windows only.

--
Hexy - a strategy game for iPhone/iPod Touch
Re:A better use for my screensaver time by pointwood · 2003-04-20 06:41 · Score: 1

I would suggest Distributed Folding instead. At least they got good clients and clients for more than just Windows ;)

curious. by toothfish · 2003-04-19 14:58 · Score: 2

i wonder if google has already seen this coming (i've seen that grub fellow in my logs a number of times and sort of wondered about it), and is going to use their own distributed search engine once they get the bugs hammered out...

Oh, just great. by TrebleJunkie · 2003-04-19 15:01 · Score: 1

*Another* bunch of spiders chewing up my bandwidth, ignoring my robots.txt files, and bringing my server(s) to their knees.

Joy of freaking joys.

--

Ed R.Zahurak

You know, oblivion keeps looking better every day.

Re:Oh, just great. by iggymanz · 2003-04-19 15:51 · Score: 1

I've got hits from grub from 57 different addresses in the last month. So there's certainly no coordination among the clients. It's a WASTE of web server bandwidth. I also don't appreciate bots that claim they will come back to the robots.txt file later, after crawling through denied pages and wasting even more bandwidth.

Indexor or Search Engine? by digitect · 2003-04-19 15:02 · Score: 4, Interesting

I expected some way to search... this looks more like a project to index the web rather than make the results available for public use via web interface. Did it strike anyone else odd that there was no web form on the home page with which to search?!

It seems like a good concept, but the availability of the information collected needs to be accessible without installing the client. I'm not game to install distributed computing apps without some freely available benefit. The "for the good of the world" motivation went out the window for me about a day after my first Seti At Home experience. (But now BitTorrent, there was appreciable benefit. I had RedHat 9 isos within 8 hours of their initial release!)

--
There is no need to use a SlashDot sig for SEO...

Re:Indexor or Search Engine? by LetterJ · 2003-04-20 03:03 · Score: 1

Is nobody looking at anything other than the linked page? There's a "Tools" page that has not only a link to a search box that uses the results, but to their XML API for working with the engine.

--

The Glass is Too Big: My Take on Things

Re:search.msn.com is the future by shibbydude · 2003-04-19 15:04 · Score: 5, Interesting

In particular, the company has its own team of editors that monitors the most popular searches being performed and then hand-picks sites that are believed to be the most relevant.

You have to be kidding or working for Microsoft, or both! Have you ever searched for Linux on MSN? Try it - here.

Notice the third result? "Learn about the Microsoft alternatives and how to move to them from open source products." I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

--
We're only gonna die from our own arrogance, that's why we might as well take our time...

Small Thing by Qacker · 2003-04-19 15:06 · Score: 1

Hmmm what is my login again?...

Set Up Your Account Please register for your Grub account. We will NOT release your personal information to anyone, and your email address will not be displayed on the site. Your email address will be your Grub login.

* Email:

* Username:

* New Password:

--
Learn lisp today!

blah by jafac · 2003-04-19 15:06 · Score: 1

just another extension of the 1998 zeitgeist;
It's all about eyeballs.

baloney.

Show me the profits.

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

You can run both by friedegg · 2003-04-19 15:08 · Score: 3, Informative

Grub isn't a heavy CPU user. Right now, on my Athlon (~2400+), it's using between 0-2% of the CPU at any given time. Grub is mainly interested in your excess bandwidth.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.

Re:You can run both by rabidcow · 2003-04-19 15:45 · Score: 5, Funny

Grub is mainly interested in your excess bandwidth.

Unfortunately, so is my ISP. In fact, they've already sold it to other customers.
Re:You can run both by smagruder · 2003-04-20 04:11 · Score: 1

Grub is mainly interested in your excess bandwidth.

And, I would suppose, the excess bandwidth of many web hosting packages. I do not want Grub hitting my hosted sites from all these disparate IPs just to build a new search engine we don't need. To prevent a possible DoS due to running out of purchased bandwidth, I'm going to have to write site code that denies site access to the Grub clients. I can make do with the fact that my sites already have decent listings on Google and dmoz.

--
Steve Magruder, Metro Foodist
Re:You can run both by smagruder · 2003-04-20 04:42 · Score: 1

It's not complicated to alter a small bit of my common PHP code that blocks out particular user agents. Besides, it's being reported that Grub doesn't necessarily adhere to robots.txt instructions.
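The poster's code is PHP and isn't shown, but the idea of refusing requests by user agent can be sketched in a few lines of Python as WSGI middleware. Everything here is hypothetical: the names, the 403 response, and the assumption that the client identifies itself with a string containing "grub-client".

```python
# Substrings of User-Agent headers to refuse (assumed, not an official list).
BLOCKED_AGENT_SUBSTRINGS = ("grub-client",)


def is_blocked(user_agent):
    """True if the User-Agent header matches a blocked crawler."""
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS)


def block_agents(app):
    """Wrap a WSGI app so blocked crawlers get a 403 instead of content."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawler not permitted.\n"]
        return app(environ, start_response)
    return middleware
```

This only works against clients that send an honest User-Agent header, which a tweaked client would not have to do.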

--
Steve Magruder, Metro Foodist

Re:Search engine software and lack of A . I . by Anonymous Coward · 2003-04-19 15:16 · Score: 1, Informative

Google is very responsive to spam reports. Rather than simply remove spam sites as they find them, they prefer to "teach" their software what's bad from example. This can take a bit of extra time, but it seems worth it to me. Google even has a link on their search results for feedback if you're unhappy. Try reporting bad searches some time.

Re:Search engine software and lack of A . I . by adamruck · 2003-04-19 15:20 · Score: 1

cough

--
Selling software wont make you money, selling a service will.

Phew... by WetCat · 2003-04-19 15:23 · Score: 1

An enormous amount of spiders that are hunting for an enormous amount of web flies - pages...

actually.. by SystematicPsycho · 2003-04-19 15:25 · Score: 1

they're going to sneak in file sharing support with a kazaa plugin.

--
Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold

Looksmart by Ark42 · 2003-04-19 15:27 · Score: 3, Interesting

Isn't Looksmart/Sprinks a big pay-per-listing deal? The looksmart logo in the upper right corner was enough to make me just close that page right away without any second thought.

--
Morphing Software

Re:Search engine software and lack of A . I . by zymano · 2003-04-19 15:31 · Score: 3, Insightful

I didn't know that.

But it still kind of irks me that people think that a computerized 'dumb' search result could compete with a human rating system that filters spam, porn, and other garbage results. Google should hire some REAL PEOPLE that can do some sort of categorized, intelligent directory so we can have QUALITY at the beginning of a search result. Some sort of HUMAN RATING system is needed to sort. The software is not up to par.

Lame.. by Anonymous Coward · 2003-04-19 15:33 · Score: 1, Insightful

Grub has had problems forever. I remember when they first announced it. It sounded cool, so I went to check it out. Turns out the actual crawling was done by.. wait for it.. wget. How lame is a web crawler that uses wget?

Then people started to realize that grub didn't have a good set of AI back at the mothership--lots of pages got crawled way too often, grub didn't obey robots.txt, etc. Many webmasters just started banning grub altogether.

Now we find out that LookSmart has bought grub and its three developers. LookSmart is the company that stabbed its customers in the back by starting to charge for every click from its directory instead of a one-time fee for inclusion.

These two groups deserve each other. Grub was supported by the community, but now that they've sold out to commercial interests, who wants to give up their bandwidth for free to LookSmart? The grub code was GPL--I wonder if grub will start to change the license to make the code closed source..

Re:search.msn.com is the future by velkro · 2003-04-19 15:35 · Score: 2, Funny

Not to mention:

Results 1-15 of about 609 containing "linux"

I seem to remember there being more than 609 websites with Linux information on them...

Re:search.msn.com is the future by inertia187 · 2003-04-19 15:36 · Score: 1

So, pray tell, where does that result belong? I agree, it shouldn't be number three, but where then? It's nowhere to be found in the first ten pages of Google. Am I to assume Google does not weight search results? No, just look at the Search King case. I don't think we can really rely on any search engine with an agenda, but we have no other choice.

--
A programmer is a machine for converting coffee into code.

Flood Control by SmartGamer · 2003-04-19 15:43 · Score: 2, Interesting

According to the Grub FAQ, it respects robots.txt although not the META tags. Although it takes a week or two for it to listen to the robots.txt, it does eventually...
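For anyone curious what "respecting robots.txt" means in practice, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content and the "grub-client" agent name are assumptions for the example, and the check runs on a string rather than fetching anything.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example policy: shut out one crawler entirely, keep others out of /private/.
EXAMPLE_ROBOTS = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow: /private/
"""
```

A well-behaved crawler would run a check like this before every fetch, using a reasonably fresh copy of the file, rather than a week or two later.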

The sheer volume of this project concerns me, however. The very fact that it got Slashdotted may cause it to be a bit heavier than expected!

It sounds like a good use of spare bandwidth, but if it's going to wind up a superscanner, it's going to send a hell of a lot of requests.

I tried it and deleted it as quickly: it's not very good at being a bottom feeder, it redlined my system resources immediately and slowed everything down. Duration between installation and uninstallation: twenty-nine seconds.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.

Web searching will only get harder... by Sancho · 2003-04-19 15:44 · Score: 2, Insightful

...as the web gets larger and more cluttered.

I've already discovered this with comic books turned into movies. Finding synopses of the comic book X-Men is nigh impossible. Finding synopses of the movies is much, much easier. Damn near every site online about X-Men, Spiderman, The Hulk, Batman, etc. deals with the movies, and sifting through the cruft is not easy. And that's just comic books. Other topics can be just as hard to find, and this doesn't even touch upon fake search results that only turn up porn or worse, a blank page (happens frequently).

Searching for MORE stuff isn't going to help. Searching better is the key. Google goes a long way towards this, but even it has the same problems of finding too much crud.

Re:Web searching will only get harder... by mattwolfewvu · 2003-04-19 16:31 · Score: 1

Yes mods, this is offtopic, I'm just kindly replying to the parent post. This (www.marveldirectory.com) is a site I found about a month ago. Nothing too in-depth but fun to poke around in.

--
"I think that when you become a Republican, you don't get to score any more." -- Butt-head
Re:Web searching will only get harder... by wheany · 2003-04-19 20:41 · Score: 1

I found out the same thing when I wanted to know what Bullseye (from Daredevil) looked like in the comics.
Re:Web searching will only get harder... by PhxBlue · 2003-04-20 03:41 · Score: 1

Actually, Google goes further than you think. You just have to know how to search.

--
!#@%*)anks for hanging up the phone, dear.
Re:Web searching will only get harder... by ktorn · 2003-04-22 04:51 · Score: 1

I totally agree with you. We already have speed and quantity (i.e. Google); what we need now is quality.

Some have a point in saying Google provides much more than the simple search, but it still falls short of what I think could be done. Not that they (Google) don't know how to do it, but they'd rather keep it fast, and you won't get fast AND quality at the same time.

So perhaps what we need is something to complement google. A slow, heavy_meta_data search engine that you can use to make complex queries.

For example, I want to get all the pages that contain the term "Eclipse" as a link text AND within a (list) element, with at least 5 'incoming' links from distinct servers. And I should be able to provide it with a list of 'related' URLs (i.e. sun.java.com, developer.com) to push up the related context.

A further step still, would be to tick a 'use synonyms' box, and the search engine would automatically search all the combinations of synonyms of each keyword. This is why I said, you can't have it fast.
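To see why the synonym option would be slow, here is a tiny Python sketch of the expansion. The synonym table is a made-up stand-in for whatever thesaurus a real engine would use.

```python
from itertools import product


def expand_query(keywords, synonyms):
    """Return every query formed by swapping keywords for their synonyms.

    synonyms maps a keyword to a list of alternatives (hypothetical data).
    """
    # Each keyword can stay as-is or be replaced by any of its synonyms.
    options = [[word] + list(synonyms.get(word, [])) for word in keywords]
    return [" ".join(combo) for combo in product(*options)]
```

With two keywords that have one and two synonyms respectively, that is already 2 x 3 = 6 queries, and the count multiplies with every extra keyword.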

I seriously thought about distributed indexing back in 1999, and I'm glad I never implemented it. Some of the comments relating to Grub are very good (i.e. prone to be polluted by tweaked clients). I'm now working on something related though: a framework for subject-specific web directories (like mini-Yahoos that anyone can produce), in open-source Java. When I get it working it'll appear at jsite.org. These mini-directories would then share an API that could be combined into a single front-end (that's where megamap comes in). Still not pollution free, but the indexing clients are now a very select few.

Altruistic? by sulli · 2003-04-19 15:44 · Score: 5, Funny

That's the dumbest thing I've heard in ages. Why should I help out a for-profit company for free?

(Oh, I can't remember. Have I MetaModerated Recently?)

--

sulli
RTFJ.

Re:Altruistic? by eversunsoft · 2003-04-19 18:36 · Score: 4, Insightful

Well, because web searching, to this day, has been a free service. Supposing that the index is built as the result of donated searches, it would be ethically in very bad taste to act against this trend.
Of course, I am the first one to question this trend. Has anyone else considered the possibility that one day we'll wake up and notice that Google is charging for access to its basic searching services?
I for one, would probably pay. I have become so dependent on it. What price? That's a good question...
Re:Altruistic? by johnburton · 2003-04-19 20:24 · Score: 1

Well why not? Is it better that your resources sit there idle helping nobody at all to do anything?

--
Sig is taking a break!
Re:Altruistic? by R0 · 2003-04-19 22:19 · Score: 5, Funny

Notice
======
The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

I nominate "parasite".
Re:Altruistic? by exhilaration · 2003-04-20 04:09 · Score: 1

I nominate Phoenix!
Re:Altruistic? by MikeDX · 2003-04-20 04:50 · Score: 1

PIKACHU I choose YOU!
Re:Altruistic? by stesch · 2003-04-20 07:19 · Score: 1

Firebird seems to be a cool name.
Re:Altruistic? by dirvish · 2003-04-21 19:19 · Score: 1

Slashdot is a for profit company and you just helped them out by providing free (quality?) content.

--
FoundNews.com - get paid to blog.,

And in related news . . . by ubernostrum · 2003-04-19 15:44 · Score: 1, Redundant

The architects of the GRand Unified Bootloader posted to the mozillazine forums today, flaming the choice of the name "grub" for this new system and calling for spamming of all grub-related discussion boards in retaliation.

Or not. What a difference maturity makes.

How about picking the types of content to crawl? by joejoejoejoe · 2003-04-19 15:48 · Score: 1

I saw another poster say you can stop the GRUB client from crawling porn, but what if you could pick the types of content you wanted to crawl for?

Let's say for example I use search engines but find them lacking or would like better results for the types of content I SEARCH FOR???

So one solution would just be to pick the types of content manually, or select keywords, etc, manually....
Another option might be to sniff my use of Google.com or Altavista.com (is that still up? ;) and then help the Engines refine the content in its indexes according to what I ACTUALLY SEARCHED FOR???

Since there is not any monetary incentive to run the client, and you won't find any Aliens (but maybe some freaks ;), give the user (client) the ability to improve results for things that matter to them....

--
Silly Rabbit: tricks are for kids.

Good Idea, Bad Implementation by oaf357 · 2003-04-19 15:52 · Score: 3, Insightful

Yeah. If you help Grub, Grub gives your web site a preferential listing. Building the biggest search engine, sure. Building good search results, not so sure.

Re:Good Idea, Bad Implementation by Anonymous Coward · 2003-04-19 15:55 · Score: 2, Insightful

It doesn't give you a preference in listings, simply a preference in crawling. You offer some work to guarantee your site has fresh indexing. It's not much different than the search engines that sell frequent crawling for extra. A fresh non-relevant listing won't help you much more than an older listing.

Alternate idea by gmuslera · 2003-04-19 15:53 · Score: 1

Why not a proxy with a component that is a node of a distributed search engine?

Something like, e.g., the Squid cache that is also some kind of client of that kind of network would be more useful, at least for common users (the ones that don't yet have a proxy cache will gain a lot in internet navigation, and it will not use extra bandwidth, just what they already downloaded). For the "search" engine it would give another approach to ranked results, giving more weight to the sites that are more accessed, not just the ones that are more linked.

It could have problems, of course. Sites that are rarely visited would rank poorly, making them even more difficult to find, but maybe this can be compensated for with an optional crawler.

What _is_ a good project? by bcrowell · 2003-04-19 16:11 · Score: 3, Interesting

I have a FreeBSD server that wastes the vast majority of its CPU cycles (and most of its bandwidth, too). So what is a good distributed computing project to donate those cycles to? I'd like to find something that

1) makes me feel warm and fuzzy about my altruism
2) can run in the background on a Unix box
3) is open-source (so I don't have to run someone's closed-source app on my box and trust their security through obscurity)

Well, #1 rules out Grub, #2 rules out Folding@Home, and #3 rules out both SETI@Home and Folding@Home.

So what worthy causes are out there?

--
Find free books.

Re:What _is_ a good project? by valkraider · 2003-04-19 17:13 · Score: 1

Distributed.net
Re:What _is_ a good project? by metlin · 2003-04-19 19:43 · Score: 2, Interesting

How about helping with some cool math prime search?

ars Team Prime Rib - cool prime searching stuff.

A mix of misc science stuff.

dc projects - some Opensource, some not.

And all projects at distributed.net come with source too.
Re:What _is_ a good project? by denny_d · 2003-04-20 02:15 · Score: 1

I've been asking the same thing lately... the 'cancer' project, last I checked, didn't have a Linux client... maybe it's time to come up with a distributed app that answers the question, "Why are the rich getting richer, the poor getting poorer, and why do so few seem to care?"
Don't mind my bleeding heart.
Re:What _is_ a good project? by smagruder · 2003-04-20 04:25 · Score: 1

I'm doing SETI@home anyway. SETI is a trusted provider, and I'm not letting my strong devotion to OS get the best of me. SETI has made crystal clear their rationale behind closing their source, and I accept it.

--
Steve Magruder, Metro Foodist
Re:What _is_ a good project? by shfted! · 2003-04-20 10:39 · Score: 1

Read Rich Dad, Poor Dad for the answer to your question. Highly recommended.

--
He who laughs last is stuck in a time dilation bubble.

DDoS by karlm · 2003-04-19 16:14 · Score: 3, Interesting

So the idea is to DDoS the entire web? :-)

If this thing gets too popular without proper throttling, they could cause real havoc.
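The kind of throttling meant here is usually a per-host politeness delay. A minimal Python sketch follows, with invented names and an arbitrary default interval; note that in a distributed crawler the limit would have to be enforced centrally, since each client only knows about its own requests.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Tracks how long to wait before hitting the same host again."""

    def __init__(self, min_interval=10.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between hits per host (assumed value)
        self.clock = clock                # injectable so it can be tested without sleeping
        self._next_allowed = {}           # host -> earliest time of next request

    def wait_time(self, url):
        """Reserve a slot for url and return the seconds to sleep before fetching."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        # The next request to this host must come min_interval after this one.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

A crawler loop would call `time.sleep(throttle.wait_time(url))` before each fetch.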

--
Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.

Legalities? by cheshiremackat · 2003-04-19 16:17 · Score: 4, Interesting

Alright, I have 3 major problems with this...

1) How different is this from the Princeton kiddies' system? I don't know about you, but I don't want a 95 billion dollar bill arriving in the mail...

2) What if your local (cache?) contains a few links to kiddie porn? Not your fault, right? The software does its own thing and you cannot control it, BUT what will the FBI think? The FBI, Scotland Yard, and the RCMP are currently heavily investigating kiddie porn cases (good work IMHO), but what if you're the unlucky sap who gets stuck with a few sketchy URLs? Or worse yet, what if Grub keeps a cache of the website like Google does? Then what?

3) What about material that is legal locally, but illegal somewhere else... e.g. Nazi stuff in Germany, Falun Gong in China, etc... The last thing I want is to be refused a travel visa cuz my PC has an illegal cache...

Good idea in principle, but with sketchy content on the web, I don't think I will be the one keeping track of it all. If there is a way to filter out the questionable stuff then maybe, but since the purpose is to be as inclusive as possible, it seems incompatible.

_CMK

--
Bad spellers of the world untie!

Re:Legalities? by Anonymous Coward · 2003-04-19 16:36 · Score: 1, Informative

A. I don't believe it caches anything except CRCs for the URLs. It downloads the page, calculates the CRC, checks whether it has been updated, and then the page is gone. And B. it doesn't download images or other media files, so no kiddie porn, unless it's text.
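
The download, checksum, compare cycle described above can be sketched in a few lines. This is only an illustration under assumptions: CRC-32 via Python's `zlib.crc32`, the `seen` dictionary, and the `has_changed` helper are all hypothetical and are not taken from the Grub client.

```python
import zlib

# Hypothetical store mapping URL -> last CRC-32 seen.
# How the real client stores its checksums is not known here.
seen = {}

def has_changed(url, body):
    """Return True if the CRC of body differs from the one recorded for url."""
    crc = zlib.crc32(body)
    changed = seen.get(url) != crc
    seen[url] = crc  # only the checksum is kept, never the page content
    return changed
```

A first visit always counts as changed, since there is nothing to compare against. If the client really works this way, only URLs and checksums would stay on disk.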
Re:Legalities? by cheshiremackat · 2003-04-19 17:50 · Score: 1

text is still illegal...

And I don't want to point to any copyrighted material... DMCA!

--
Bad spellers of the world untie!
Re:Legalities? by SmartGamer · 2003-04-19 17:51 · Score: 2, Interesting

It does, however, download a buffer of URLS to scan. If your buffer was less than clean when your computer gets searched, oops, you're in trouble...

Not to mention the fact that it still goes and hits all those sites, and with the government trying to smash that little thing we call "privacy," anything questionable will likely go on your permanent record- the one that doesn't exist, but they somehow have anyway.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.
Re:Legalities? by amoe · 2003-04-19 22:58 · Score: 2, Interesting

text is still illegal...

Text child pornography is illegal? How does that work? I thought the rationale for video child porn being illegal was that an illegal act had been committed in its creation - how do they justify making something illegal that is purely the product of an author's imagination?

Disclaimer: I have never read a child porn story, but I have seen them around the seedier places on the net.

--
You look beautiful! Incidentally, my favourite artist is Picasso.
Re:Legalities? by gozar · 2003-04-20 08:22 · Score: 1

At least in Ohio you can be jailed for text child porn.

--
What, me worry?
Re:Legalities? by cheshiremackat · 2003-04-20 15:49 · Score: 1

Yeah... after your name is published in the New York Times as possessing child porn... remember the court of public opinion is a very scary place...

Richard Jewell anyone?

_CMK

--
Bad spellers of the world untie!
Re:Legalities? by turkeyphant · 2003-05-03 02:22 · Score: 1

Disclaimer: I have never read a child porn story, but I have seen them around the seedier places on the net.

You realise this only affirms your guilt, right?

Too bad you don't live in Ohio...

--
Turkeyphant

Unlimited Use? Try Wishful Thinking. by NeoMoose · 2003-04-19 16:37 · Score: 3, Insightful

You can always use the Google API for more than 2,000 searches per day if you pay licensing fees for it. That's just Google ensuring that it can remain a viable company. Little text-box advertisements just don't cut it in this day and age where blatant pop-ups and colorful banner ads don't even have much turn-around. That's not the point though.

The point is that I wouldn't expect LookSmart to allow unlimited usage of this API anytime soon. It's too large a project for them to just let people use it. It's simple economics. They may not be investing the computing resources into this project's web spidering software, but it's still using TONS of resources to keep this data catalogued and readily accessible.

The open faucet, not the blown dam by SmartGamer · 2003-04-19 16:47 · Score: 2, Informative

A DDoS is only effective because it's a whole bunch of messages all at once to one target- in the 100,000,000 range for a full-scale attack, to always cover all the positions.

The database of "check-me"s is randomized rather evenly. Even if this takes off, I don't see how it could really do serious damage to any but the truly dinky servers: the hits will not come in all at once and flood the whole connection. While it very well could end up a constant stream, it's unlikely to be the massive stream that makes a DDoS.

It does have the potential to slow servers across the world, but that's okay- it will slow home users' connections across the world by using 1/4 of them, too, so nobody will actually notice.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.

Re:The open faucet, not the blown dam by smagruder · 2003-04-20 04:28 · Score: 1

A DDoS is only effective because it's a whole bunch of messages all at once to one target...

Well, no. Many hosted web sites have bandwidth limits entailed in the packages. If Grub makes the bandwidth limits tip over, then that's an effective DOS.

--
Steve Magruder, Metro Foodist

the backstory by eidechse · 2003-04-19 16:48 · Score: 1

google's pigeons

Re:search.msn.com is the future by lamber45 · 2003-04-19 16:50 · Score: 2, Interesting

I followed one of these links and looked at the MSDN article. It's full of generalizations taken from 20-year-old UNIX textbooks, although Linux and X windows are mentioned here and there. Apparently recent versions of some level of Windows have an "Interix" subsystem. I've used Cygwin32 on Win95, WinME, Win2k and WinNT, and Borland C++, and Visual C++ .NET, but I don't think I've ever used the Microsoft native POSIX layer. The article gives a lot of questions that should be asked before starting a migration like this. One possible reason to migrate is to decrease the Total Cost of Ownership; another is to increase hardware options and move away from proprietary systems!

Another quote I like is, "Windows operating systems do not provide X Windows. For X Windows connectivity, developers need a third-party X Windows server.". Of course Microsoft would never be anticompetitive by competing with third-party suppliers of implementations of an open standard, right?

Re:search.msn.com is the future by Anonymous Coward · 2003-04-19 16:50 · Score: 2, Insightful

It's not as bad as you make it out to be. They do point out (in fine print) that it is a "featured" site. They list the "featured" sites first, then the sponsored links, and then general web hits. And they mark each category. I guess that the only difference between featured and sponsored is in the price. All this was far from obvious to me when I saw the results at first (being used to Google), but I imagine that if you used them on a daily basis you would quickly become used to skipping down to the real results.

Ah, just what we need by Moonwick · 2003-04-19 16:51 · Score: 1

Another damn web spider adding to the collective noise of the internet.

Why don't these people try to work out some way of sharing information so I don't have to have my webserver poked at by every person and their brother's search engine?

--
Only on slashdot can a posting be rated "Score -1, Insightful".

Read the fine print by anon*127.0.0.1 · 2003-04-19 16:52 · Score: 2, Insightful

It's a "featured site". Meaning it's a site from Microsoft, a Microsoft partner, or someone who paid some money to Microsoft for the privilege.

Nothing that other search sites don't do. They just mark their paid adverts a little more obviously.

--
I am NOT a man!
I am a free number!

What about the source? by PhrostyMcByte · 2003-04-19 16:55 · Score: 1

Okay, i found the source at sourceforge CVS. unfortunately, all the files checked in are >4 months old. If this is under the GPL, where the hell is the source for the binaries they are putting out?

Re:search.msn.com is the future by resin8 · 2003-04-19 17:16 · Score: 1

Results 801 - 878 of about 58,500,000
In order to show you the most relevant results, we have omitted some entries very similar to the 878 already displayed.

609 pages with Linux info isn't so bad, when you consider Google only shows 878 "relevant pages". Not one link to MSN in those 878 pages.
Anyone care to look through the 58,499,222 omitted entries?

Re:Unlimited Use? Try Wishful Thinking. by dmoynihan · 2003-04-19 18:19 · Score: 1

Little text-box advertisements just don't cut it in this day and age where blatant pop-ups and colorful banner ads don't even have much turn-around.

This I dispute sir. Targeted keywords on google, where my clickthrough ratio has averaged 1.3-1.5%, are a goldmine for my site and money very well-spent (averaging $500 a month on those ads, paying .05 in 97% of all cases.)

I've been a google advertiser since Feb. 02, consider their program extremely lucrative, and I guess they like me 'cause I got a picture frame from them last Christmas. It was a Coach picture frame....

Re:Unlimited Use? Try Wishful Thinking. by NeoMoose · 2003-04-19 18:27 · Score: 1

I'm not disputing whether or not the advertising is effective in fulfilling its purpose of promoting the advertiser's site. I am simply stating that Google would not be a very viable company if they relied on advertising alone to make their money.

I won't argue with you on how much Google makes off the ads, as I am willing to bet that about 80% or more of their funds come from advertising. However, advertising alone has always proven to be an ineffective means of remaining viable. You simply have to have other sources of income.

The Web as a Catalog by X-wes · 2003-04-19 18:38 · Score: 1

GNU Grub bootloader, who's executable is named "grub".

Im sure you'r apostrophe's and ",quotes", have good grammars

Re:search.msn.com is the future by The+Cydonian · 2003-04-19 19:00 · Score: 1

I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

Yes, Google's algo only asked Microsoft to go to hell, of course, taking it down after the story was reported far and wide.

--
More than mere navel gazing.

The approach is inherently flawed by oren · 2003-04-19 19:22 · Score: 3, Interesting

It is too easy to send corrupted information into the database. They have *no choice* but to trust the clients. Sure, they could run spot checks on the results, but those would cover only a small part of the data, and it would be easy enough to fake responses for them as well.

So the more popular it gets, the more incentive people will have to promote their sites by feeding it fake index information. If this magically got to be very popular, within weeks search results would become meaningless and it would drop back into obscurity. The more likely result would be that it will never become popular in the first place.

Besides, who wants to donate his CPU and bandwidth resources for a commercial company, anyway?

Re:The approach is inherently flawed by UnknownQ · 2003-04-20 00:40 · Score: 1

It is too easy to send corrupted information into the database. They have *no choice* but to trust the clients.

Not really, if they follow the typical distributed computing model they give you a chunk of the web, and the chances that out of the whole web they give you part you are interested in tweaking is very low. The only reason to mess with results is out of pure malice.

Also it would be pretty easy to put a report link url if cnn.com is only links to joe blow's web site. With any luck they aren't doing searches on a link based algorithm anyway.

--
Wherever you go, there you are!

The internet has become, by nycheetah · 2003-04-19 19:44 · Score: 1

The internet has become an ever-growing tree of knowledge that will someday lead to something even bigger.

Old & Rusty by sICE · 2003-04-19 19:47 · Score: 1

nothing about grub here, but personally i really like this web site that has a few search engines on it: http://freddo.netfirms.com/. It also refers to Fravia's new website and his invaluable forum.

A good reference about search engines is also Search Engine Watch

have fun...

--
-- search the web

Just terrific. A massively powerful DDOS tool. by NerveGas · 2003-04-19 19:55 · Score: 1

Normally, most search engine's spidering methods are designed to be pretty nice to servers - such as only requesting pages once every 30 seconds or so.

However, I've seen times when the methods of some of the search engine spiders were foiled by such simple things as having a large number of virtual hosts on a machine. Combine that with a number of front-end machines all connected to the same database server, and things can get really nasty.

In one particularly bad incident, several fairly big-name search engines were spidering us simultaneously, and only hitting each domain name relatively infrequently. However, with 500+ on several front-end servers, and several search engines, we were getting something like 50-100 requests per *second* from the search engines. When those hits were to pages generated from the database, our servers kept up, but performance was definitely degraded.

So, where am I going? I see the potential for small bugs, weak algorithms, idiotic end-users, or even malicious end-users causing the same sort of havoc. Even if it weren't meant as an actual DDOS, it could certainly end up that way. And it would be much, much harder to prevent than merely blocking (or rate-limiting) requests from one company's spiders.
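
The "once every 30 seconds or so" politeness rule, and the way virtual hosts defeat it, can be sketched as a throttle keyed on whatever the crawler treats as one server. This is a minimal sketch under assumptions: the `HostThrottle` class and its `wait_time` method are hypothetical names, and no real spider is claimed to work this way.

```python
import time

class HostThrottle:
    """Allow at most one request per `delay` seconds to each key."""

    def __init__(self, delay=30.0, clock=time.monotonic):
        self.delay = delay
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.next_ok = {}       # key -> earliest time the next request may start

    def wait_time(self, key):
        """Return how many seconds to wait before hitting `key`, and reserve that slot."""
        now = self.clock()
        start = max(now, self.next_ok.get(key, now))
        self.next_ok[key] = start + self.delay
        return start - now
```

Keyed on the domain name, 500 virtual hosts on one machine get 500 separate budgets, which is the failure described above. Keyed on the resolved IP address (for example from `socket.gethostbyname`), they share one. A throttle like this only limits a single crawler process; many independent clients would still need some central coordination.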

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

The have cracked it by fireman+sam · 2003-04-19 20:02 · Score: 2, Funny

1. Design a search engine
2. Let everyone else fill it
3. Profit

The second step is finally found!!! YAY

--
it is only after a long journey that you know the strength of the horse.

Grub does NOT look for robots.txt by MythosTraecer · 2003-04-19 20:06 · Score: 1

I'm sure grub will indeed build a larger database than most other search engines, since grub (or grub-client, or whatever it's calling itself) has never, not even once bothered to look at a robots.txt file on any web site I've ever administered. This is what webmasters call a misbehaved robot, and it is not something to be looked at with respect.

--

--Mythos

Re:Grub does NOT look for robots.txt by Anonymous Coward · 2003-04-20 01:44 · Score: 3, Informative

Here it is on mine requesting it:

64.241.242.18 - - [18/Mar/2003:17:25:30 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.242.18 - - [19/Mar/2003:19:41:05 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.243.81 - - [30/Mar/2003:22:10:41 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.243.81 - - [01/Apr/2003:23:11:21 -0700] "GET /robots.txt HTTP/1.1" 200 223 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"

Notice those are LookSmart owned ip's and not just normal user crawlers. They seem to centrally crawl for robots.txt. They do know, however, that they need to crawl for robots.txt more often.
Re:Grub does NOT look for robots.txt by Kentrosaurus · 2003-04-20 05:17 · Score: 1

I haven't seen them hit my robots.txt in the last logrotate term, but it's been on the disallow / for at least a month and I'm still flooded by their mindless drones.
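
For comparison, here is roughly what a well-behaved crawler would do with a "disallow /" rule before fetching anything. It is a sketch using Python's standard `urllib.robotparser`; the `allowed` helper is a hypothetical name, and the `grub-client` token is an assumption taken from the User-Agent strings in the logs, not a confirmed robots.txt token.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from text, no network access
    return parser.can_fetch(user_agent, url)

# A blanket rule like the one described above: nothing may be crawled.
rules = "User-agent: *\nDisallow: /\n"
print(allowed(rules, "grub-client", "http://example.com/index.html"))
```

With a blanket `Disallow: /` this check fails for every URL on the site, so a crawler that keeps requesting pages is either not running the check or working from a stale copy of the file.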

CPU cycles are NOT wasted or "available" by pe1chl · 2003-04-19 20:52 · Score: 2, Insightful

The common point made by these "distributed" software authors is that there are "wasted" CPU cycles in your computer that you could donate to a project for free.
However, that is not true at all! CPU cycles are not wasted. When the CPU has nothing to do, it sleeps. At least in a modern operating system (i.e. about everything after Windows 95).

By "donating your wasted CPU cycles" you will actually increase the power consumption of your computer. This will be very noticeable in a laptop, but when you watch the CPU temperature in your home system you will also see a noticeable increase in temperature between an idle system and a system running a computationally intensive background task.

Probably the effect will be worse for things like keysearches, prime number searches, SETI etc than for this GRUB bot, because that probably also spends time waiting for the network (and thus returns the CPU to idle).

So before you "donate your wasted CPU cycles", please realize that this will actually cost you money.

Re:CPU cycles are NOT wasted or "available" by The_Big_Red_Dog · 2003-04-21 09:25 · Score: 1

But it isn't so much CPU cycles with Grub as much as it is bandwidth. Many users don't understand that many of them share bandwidth and don't really have extra to spare.

My first Grub hit coming over to my site by presroi · 2003-04-19 20:55 · Score: 1

$IP - - [05/Apr/2002:12:27:55 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 404 218 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"

So, this was last year.... Is this a dupe?

Re:My first Grub hit coming over to my site by caluml · 2003-04-19 22:40 · Score: 1

So, this was last year.
Warning, your system clock is 1 year out of date.
[root@presroi.de root]# ntpdate ntp.demon.co.uk
20 Apr 11:39:09 ntpdate[23473]: adjust time server 158.152.1.76 offset 1284989826352.108067 sec
Thankyou ;)

--
Get your own free personal location tracker
Re:My first Grub hit coming over to my site by presroi · 2003-04-20 00:16 · Score: 1

My system clock is *not* 1 year out of date.

This is a grep over my mylogfile.txt for 'grub.org' from Feb 2002 to Feb 2003.

natlb4.webmailer.de - - [05/Apr/2002:12:27:55 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 404 218 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
192.67.198.230 - - [14/Apr/2002:21:05:36 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb4.webmailer.de - - [19/Apr/2002:06:49:57 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb7.webmailer.de - - [23/Apr/2002:02:15:47 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
192.67.198.227 - - [27/Apr/2002:04:46:29 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
192.67.198.228 - - [02/May/2002:15:19:01 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb8.webmailer.de - - [03/May/2002:21:16:14 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb4.webmailer.de - - [13/May/2002:16:31:32 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb5.webmailer.de - - [22/May/2002:09:57:57 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb7.webmailer.de - - [30/May/2002:05:48:19 +0200] "GET /methoden/hanf/ HTTP/1.0" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb4.webmailer.de - - [17/Jun/2002:21:09:39 +0200] "GET /methoden/hanf/robots.txt HTTP/1.0" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb4.webmailer.de - - [01/Jul/2002:19:19:34 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb3.webmailer.de - - [12/Jul/2002:01:07:25 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51766 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
192.67.198.231 - - [28/Jul/2002:15:33:37 +0200] "GET /methoden/hanf/robots.txt HTTP/1.1" 200 23 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb4.webmailer.de - - [29/Jul/2002:15:19:33 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
192.67.198.227 - - [14/Aug/2002:18:22:12 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb7.webmailer.de - - [31/Aug/2002:01:26:59 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb8.webmailer.de - - [14/Sep/2002:07:46:27 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)"
natlb7.webmailer.de - - [29/Sep/2002:01:39:25 +0200] "GET /methoden/hanf/ HTTP/1.1" 200 51836 "-" "Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with h
Re:My first Grub hit coming over to my site by caluml · 2003-04-20 02:19 · Score: 1

Lol, relax guy, I was making a joke :)

--
Get your own free personal location tracker

Re:How about picking the types of content to crawl by wheany · 2003-04-19 20:56 · Score: 1

I like the idea. But it should benefit the community as well, so it should crawl something like 80% community assigned pages, and 20% "my" pages. That would still benefit the user much more than he deserves.

Sig...Tony Blair by knowledgepeacewi · 2003-04-19 21:06 · Score: 1

Holy fuck Tony Blair, what the HELL are you doing?
Ensuring that American Dollars and Popular Opinion flow toward Britain. Not to mention military toys and training for British troops.

Brilliant of him to pick the winning side. Now he can reap the rewards for his people.

Re:Whatever by wheany · 2003-04-19 21:10 · Score: 1

And I was ready to try the client, but it wouldn't accept my email-address, because it has a "+" sign in it, and I couldn't find a contact address where I could have reported the problem.

Re:Unlimited Use? Try Wishful Thinking. by dmoynihan · 2003-04-19 21:11 · Score: 1

You're certainly right that every business should have other sources of income (I do worry about my own site's single source). But I think google's raking it in on the click-through ads.

Typically, where I advertise, there are eight or nine other people trying for the same keyword. I've got the green-shifted look despite paying the minimum because I'm allowed to include "free" in my description, but there's usually five people above me, meaning they're paying at least six cents; often as much as .40 cents per click, on keywords that generate around 500,000 impressions a month.

That number really starts to add up when you think of all the web businesses, and all the keywords, and all the searches, and all the clicks, but I guess we won't have a better idea until google files with the SEC prior to their IPO...

One thought, however, is the way google text ads are now showing at places like Metafilter or a number of the PDA news sites. Google's out to score more impressions any way they can... must be worth something to them.

Re:search.msn.com is the future by pafrusurewa · 2003-04-19 21:43 · Score: 2, Funny

The Austrian version of MSN is even better. If you search for Linux, the first two results are WinXP ads on the Microsoft site. And, while you're at it, try searching for google or yahoo. This will produce a popup saying "Why look for a search engine when you've already found one?".

Re:Unlimited Use? Try Wishful Thinking. by NoOneInParticular · 2003-04-19 22:15 · Score: 1

Where do I pay these license fees? The only thing I can find is this.

In any case, a collaborative search engine API using distributed computing might still be a nice thing for not-for-profit purposes. One of the applications I wanted to use this API for was a plagiarism search that lets teachers quickly scan student papers to see if they were simply pulled off the net. This was bombed by the 1000 query limit of Google's API, since doing the search properly would require a few tens of queries for each paper. If you have to check tens of these papers, the limit can be reached fairly soon.

For this purpose speed wouldn't be so much of an issue, so maybe a distributed cataloguing (sp) and search system might be something interesting?

Not so much worse than Kazaa by jmping · 2003-04-19 22:28 · Score: 1

When you download Kazaa, you authorize the corporation to utilize any unused processor or disk space -- this doesn't seem that much more dangerous than all those Kazaa users out there. As a non-Kazaa subscriber, I think I will also skip on grub -- I paid for my computing space and power thank you, and I don't plan on just giving it away to all of these corporations looking to further themselves.

--
**When craziness is bliss, 'tis folly to be sane**

But you get to hide your surfing habits by Wee · 2003-04-19 23:25 · Score: 1

Grub gives you something else: they hide your surfing habits.

The only way I'd run grub is on a low-bid DMZ host (like that old P133 I have laying around), with the adult content searching filters disabled. Then I'd let it do whatever it wanted to do as long as it wanted to do it and I'd forget about it. Who cares about the search results? Just use Google like before. They aren't going to make a good search engine anyway.

But if I ever got a subpoena which included information about my web browsing and online history, I could tell the judge that I couldn't honestly say if that particular bit of outbound traffic was me or that grub thing doing its searching. So as long as I was running it, I'd be free to look at "subversive" literature, pr0n, Arab websites, the Cato Institute's homepage, whatever I wanted. If I got on a list and they tried to PATRIOT ACT me, I'd use grub as my get out of (Ashcroft's mystery) jail free card. Hell, I'd throw grub and freenet on the same box and cover every base.

That's if I was paranoid. And wanted to surf Arab web sites or pr0n. Which I'm not. And I don't. :-)

-B

--

Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

Distributed Crawling From Browsers by txtger · 2003-04-20 01:08 · Score: 2, Interesting

It would be interesting to see a database that is connected to browsers, so that whenever I looked at a page, the page data would be processed and sent to whatever search engine. Then, those sites that are updated frequently and get a lot of traffic would be more easily searched.

Just a thought.

Re:Distributed Crawling From Browsers by denny_d · 2003-04-20 01:57 · Score: 1

Sounds like a fine RFE for Mozilla. They'd be the ones to do it right without planting some nasty stuff inside. I think I'll go do that now...

Some hacking required... by SharpFang · 2003-04-20 01:21 · Score: 1

Ok, so I'll just hack it a bit, and all my websites will FINALLY make it to #1 in search engines on ANY keyword! Doh, I need to subscribe to a few click-to-pay banner sites...

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2

Re:Search engine software and lack of A . I . by jafiwam · 2003-04-20 01:38 · Score: 1

What? Are you part of the "Yahoo Publicity Spread FUD department strike team" or something?

Here's a hint for ya;

1) Go to Google

2) Click on the Fourth Link from the left in the bar. (The green one that says "Directory" on it.)

3) Enjoy!

Or, if you are particularly patient, just visit http://www.dmoz.org/ directly.

Built by humans, edited by humans, unpaid volunteers that know something and care about the directories they edit. You too can even volunteer to help!

Yahoo sucks PRECISELY BECAUSE they tried to pay people to get sites in their directory, found out they could not keep up, and then started making site owners pay to get in. Obviously, GRUB won't do what you want either, but what you are complaining about lacking already exists.

read the grub forums by denny_d · 2003-04-20 01:54 · Score: 1

The idea is cool and I imagine it won't be long before an org. without links (unverified) to M$, will do the same thing. There's at least a couple of people on the grub forum who are figuring out some of the shadier sides of this code: potential spyware? security hole? And the licensing is vague (no links).
Note the tone of their pitch as well: you are participating in a competitive group effort akin to Seti@home and Distributed Net? I don't think so... caveat emptor.

Diminishing marginal returns by squashed · 2003-04-20 02:29 · Score: 1

Updating a search engine of general web material is an important objective, but there are diminishing marginal returns to immediacy. Google News is an example of a subset of web material -- news sites -- for which immediacy is a more important goal. It's no surprise that Google offers a very fast refresh there. A distributed system that would do that for the entire net is interesting, but not necessarily worthwhile.

searching for porn by maluke · 2003-04-20 02:54 · Score: 1

i don't know about you.
i don't search for porn, it looks more like porn searches for me.

--
Practical Semantic Web Log

So THAT'S What It Is... by suwain_2 · 2003-04-20 03:13 · Score: 1

I've been noticing some hits from my website mentioning something called "grub," but never knew what it was.

For the webmasters out there, this is what the UserAgent string shows up as on my site:

Mozilla/4.0 (compatible; grub-client-1.2.1; Crawl your own stuff with http://grub.org)

(There are variations on the grub-client-1.2.1 version number, so if you for some reason decide to search, you may want to do grub-client-*.)
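
The grub-client-* search can be written as a regular expression that also pulls out the version. A small sketch, assuming the User-Agent always has the shape shown above; the `GRUB_UA` pattern and the `grub_version` helper are hypothetical names.

```python
import re

# Matches "grub-client-" followed by a dotted version such as 1.2.1, 0.3.0 or 1.07.
GRUB_UA = re.compile(r"grub-client-(\d+(?:\.\d+)*)")

def grub_version(user_agent):
    """Return the grub-client version found in user_agent, or None if absent."""
    match = GRUB_UA.search(user_agent)
    return match.group(1) if match else None
```

Run over the User-Agent field of each log line, this tells you both whether a hit came from a Grub client and which client version made it.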

--
________________________________________________
suwain_2 :: quality slashdot p

Re:So THAT'S What It Is... by Anonymous Coward · 2003-04-20 04:42 · Score: 1, Insightful

Was it really so hard to go to the url in the user agent to see what it was?

Gurb does not follow robots.txt correctly! by sharph · 2003-04-20 06:00 · Score: 1

Why is it looking for robots.txt in a subdirectory?

Re:Gurb does not follow robots.txt correctly! by presroi · 2003-04-20 07:58 · Score: 1

/methoden/hanf equals www.hanfbroschuere.de

the posted IP is not the IP of the grub client but of the somewhat strange ISP tool :)

Re:search.msn.com is the future by muzthe42nd · 2003-04-20 06:18 · Score: 1

i couldn't believe that, so i tried. While my german isn't that great, i worked it out. that is hilarious

--
Pfft - Sorry, what?

hair is raising on the back of my neck by malia8888 · 2003-04-20 07:35 · Score: 2, Interesting

Uh huh, Grub is going to "run in the background" ?
No thanks!!. It just doesn't feel right. It is sort of like lending a firearm to an untrustworthy neighbor. What is in it for the lender other than potential problems?

Spyware "runs in the background" and slows down people's machines. What really happens to one's machine performance with Grub? And, more importantly, where is my check?

--
Harpo Tunnel Syndrome--my wrist feels funny.

Re:Unlimited Use? Try Wishful Thinking. by NeoMoose · 2003-04-20 16:01 · Score: 1

You can develop any application you want, but you must abide by the Google Web APIs terms of service. One condition is you cannot create a commercial service using Google Web APIs without first obtaining written consent from Google. Another is that you can only create one account for your personal use.

Do what it says - obtain written permission. That written permission will be in the form of a commercial contract/license.

Cool, but I won't get my hopes up by whereiswaldo · 2003-04-20 16:33 · Score: 1

If this becomes popular, legal issues will crop up and it will be shut down and banned through mega-corporations' legal clout. I hope not, but I wouldn't be surprised at all. Today's net kinda sucks.

It's an Important Step by serutan · 2003-04-20 18:59 · Score: 1

So Grub is commercial. Big deal. Any large-scale project like this furthers our knowledge of distributed computing and helps pave the way to other things, like on-demand mirroring of popular content.

Re:Yeah what's the corporation bashing? by hesiod · 2003-04-21 01:29 · Score: 1

> As to whats wrong with Corporations and Big Business: in one word: Enron

Wow, try using a little more thought next time. There is nothing wrong with business. What was wrong with Enron was the people running it (or, not running it). There are plenty of big businesses out there that are not corrupted like Enron. Stop beating your chest with stupid remarks that don't hold up.
