Building a Bigger Search Engine

Will Grub take off or be smashed? by Blaine Hilton · 2003-04-19 14:17 · Score: 4, Insightful

I started to use grub, but then questions started cropping up. First we are using this to further a commercial organization. This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine. There is not even any potential reward such as with distributed.net.

Also the grub engine crawls everything, including adult content and other questionable content. They have a setting to turn it off, but it does not block it. With the current questioning of international law relating to accessing illegal websites this could have major consequences for the average user.

So for the time being I have stopped using the grub client until some serious questions are answered. It's an interesting concept, and if it were being used in more of an academic setting it could be interesting. However, I believe that search engines like Google are doing pretty well by themselves.

Go calculate something

Re:Will Grub take off or be smashed? by bcrowell · 2003-04-19 15:59 · Score: 3, Insightful

This is not research such as SETI or Folding At Home; this is doing the dirty work of a large commercial search engine.
Actually, if I had a gun to my head, I'd choose to run Grub, because the client is open-source. I used to run SETI@home, but then the news came out that they'd been sitting on a potential root vulnerability for a long time. That really brought home to me the risks of running someone else's closed-source app on my box.

--
Find free books.
Re:Will Grub take off or be smashed? by kaden · 2003-04-19 16:00 · Score: 5, Insightful

Um, I think you're missing the point. This client could download highly illegal files, and make it look like I'm knowingly downloading them. Say I run it, and it downloads anything from kiddy porn to some Al Qaida webpage from an FBI sting server. I would quite possibly be arrested and charged, and while I wouldn't be convicted, it's quite an ordeal, and there's an ugly social stigma to even being charged with Kiddy Porn or conspiring with a terrorist. So that's a serious question that's posed by running Grub.
Re:Will Grub take off or be smashed? by bcrowell · 2003-04-19 16:48 · Score: 4, Informative

Do you have any references? Please back up your claims.
here, and here
Actually I think the hole potentially gave the ability to run arbitrary code, which isn't the same as a root vulnerability.

--
Find free books.
Re:Will Grub take off or be smashed? by dtfinch · 2003-04-19 16:52 · Score: 5, Interesting

There are many ways to look at this. The idea is to install the client, set Opera to use the same useragent string, visit some of those sites, then blame it on Grub if the FBI comes busting through your door.

If you're a criminal, installing the Grub client might be a great idea.
Re:Will Grub take off or be smashed? by Moonwick · 2003-04-19 16:55 · Score: 2, Insightful

Yeah, god forbid you help a commercial organization, especially when the results could stand to benefit you.

God knows that Google, by virtue of being a commercial entity, has absolutely nothing to offer you.

Anti-capitalist fucktard.

--
Only on slashdot can a posting be rated "Score -1, Insightful".
Re:Will Grub take off or be smashed? by Jugalator · 2003-04-19 21:19 · Score: 2, Interesting

There is not even any potential reward such as with distributed.net.

How about improving existing search engines with more accurate databases? Commercial organizations like Google might be involved and that's another matter. There might still be a reward to the public.

--
Beware: In C++, your friends can see your privates!

Great idea, but will it pan out? by dtolton · 2003-04-19 14:17 · Score: 5, Insightful

LookSmart hopes to tap the altruistic nature of many Internet users.

That unfortunately seems like a naively optimistic hope. While the
vast majority of people may be altruistic, it only takes a few
unscrupulous individuals to completely undermine a fair result.

It's interesting that this idea is an extension to Google's model in
many ways. Essentially Google is able to index so much of the
internet by having 50,000+ servers. I don't think that's what makes
Google such a useful search tool, rather I think it's accuracy and
relevancy. If my search results started getting polluted with bogus
hits, I would stop using it almost immediately.

Unfortunately, by letting people run the client on their machine and
having it send the results back to the server, I think spoofed
results are inevitable. I don't think it will be possible to
safeguard the results either. It will be interesting to see how well
this project survives *when* people start spoofing results. It's
been a problem for SETI@home, and it's something that undermined some
people's faith in the project as a whole. If the spoofed results are
more widespread and have a larger impact as they would in a system
like this, it may ultimately prove fatal to the project.

One factor that has been absolutely critical to Google's success has
been their ability to remain resistant to spoofing attempts. It's
still a question mark how well grub will perform in that context.

--

Doug Tolton

"The destruction of a value which is, will not bring value to that which isn't." -John Galt

Re:Great idea, but will it pan out? by Nickilo · 2003-04-19 18:42 · Score: 5, Interesting

"The General's Dilemma" would solve this problem. The story goes something like this: The general needs to get urgent information to one of his officers, however, he suspects saboteurs are present among his messengers. In order to insure the information gets through accurately, he sends the same message with several men. The officer on the other end collects all the messages and goes with the majority. (And, presumably, kills the others.)

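The "go with the majority" idea above can be sketched as redundant assignment: hand the same URL to several clients and accept a result only when most of them agree. This is a minimal sketch under assumptions, not Grub's actual protocol; the function name `majority_result` and the `quorum` parameter are made up for illustration.

```python
from collections import Counter

def majority_result(reports, quorum=0.5):
    """Return the value reported by more than `quorum` of the clients.

    `reports` is a list of values (for example page checksums) sent back
    by different clients for the same URL. Returns None when no single
    value has a strict majority, so the server can re-crawl the URL itself.
    """
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    return value if count / len(reports) > quorum else None

# Three honest clients and one spoofer reporting on the same page.
reports = ["abc123", "abc123", "ffff00", "abc123"]
print(majority_result(reports))
```

Note that this only helps while spoofers are a minority of the clients assigned to a given URL, and it multiplies the bandwidth spent per page by the number of duplicate assignments.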
Biiig questions to answer by andy@petdance.com · 2003-04-19 14:20 · Score: 5, Interesting

So Grub goes out, uses bandwidth, and then returns some results to the home base. It's really distributed bandwidth more than distributed computation.

I bet one of the big successes in Folding and distributed.net is that many people run the clients on work boxes, knowing that there's little actual overhead incurred to their work. How different that is for a URL sucker.

I wonder what broadband ISPs think of Grub.

Re:Biiig questions to answer by friedegg · 2003-04-19 14:49 · Score: 4, Interesting

I wonder what broadband ISPs think of Grub.

If it becomes a problem, I imagine ISPs will declare it a commercial bandwidth usage, and order users to stop or move to a business class plan for more money.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.

Haiku :-) by Ignorant Aardvark · 2003-04-19 14:20 · Score: 4, Funny

Grub searches the web
Sniffing out all the good porn
Not just bootloader

I love being a Slashdot subscriber - it gives me fifteen minutes to figure out a good joke before anyone has a chance to post!

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ... and I have a suggestion. Has anyone written a program called "E-Coli" yet? No? I can just imagine my mom ...

"Agh! You have E-Coli on your computer!"

--
Cyde Weys Musings - Scrutinizing the inscrutable

Re:Haiku :-) by Anonymous Coward · 2003-04-19 14:48 · Score: 3, Funny

How about 'SARS'? Four letters, indicates something that spreads quickly...
Re:Haiku :-) by Anonymous Coward · 2003-04-19 15:01 · Score: 4, Funny

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
I'm wondering if the Grub bootloader developers will throw a tantrum and flood the Grub crawler developers' e-mail addresses, claiming that this will confuse people and harm the bootloader project.

Hee hee.
Re:Haiku :-) by Unoriginal Nick · 2003-04-19 15:09 · Score: 5, Funny

Seriously though, shouldn't they change the name? "GRUB" is already a bootloader. They should change the name ...
How about Firebird? I'm sure that won't cause any problems :-)
Re:Haiku :-) by Chester K · 2003-04-19 16:19 · Score: 4, Funny

As time approaches infinity, the number of software projects named Firebird also approaches infinity.

It's ok though because they'll all still be different projects, so nobody will get confused.

--

NO CARRIER

Business Plan? by Anonymous Coward · 2003-04-19 14:22 · Score: 2, Insightful

What are sensible business plans for this type of endeavour?

Should we expect to see many commercial efforts focussed on providing similar "crawl" or "index" capabilities, but each honed to a specific niche market? A scientific crawler? A retail links database?

One could argue that similar efforts targeting music resources have resorted to less automated techniques, i.e. human-driven sharing.

Thoughts?

Hrmm, I wonder how long... by bergeron76 · 2003-04-19 14:22 · Score: 3, Insightful

until someone figures out a way to compromise their local client's results and "escalate" their fave URLs.

It still sounds like a really cool idea though.

--
Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.

Re:Hrmm, I wonder how long... by CaptainMunchies · 2003-04-19 14:38 · Score: 3, Insightful

Grub's clients don't come up with a ranking for each website they crawl; rather, they check to see if this website has changed since the last time it was crawled. For any website that has changed, the client notifies the server. The search engine asks the server which sites in its index need to be updated, and the server gleefully replies.

Clients artificially increasing their ranking isn't an issue, since the client has nothing to do with a site's ranking.

--
Spam removed for the Internet's pleasure ...

grub is already taken by stock · 2003-04-19 14:23 · Score: 2, Insightful

Grub is the GRand Unified Bootloader, a GNU project, so the name is already taken.

Hmm searchengine eh? Why don't you call it grab ?

Robert

If previous results are any guide by carl67lp · 2003-04-19 14:23 · Score: 5, Funny

1. Tech-savvy people will install this.
2. Tech-savvy people tend to be loners.
3. Loners most often search for porn.

C1. Tech-savvy people search for porn.

4. Items searched for most often reach the top of the list.
5. Porn is searched for often by tech-savvy people.

C2. Porn will be easier to find with this new search engine.

Count me in!

Re:If previous results are any guide by anon*127.0.0.1 · 2003-04-19 16:43 · Score: 4, Funny

You're having trouble finding porn now?

--
I am NOT a man!
I am a free number!

great news! API? by The-Perl-CD-Bookshel · 2003-04-19 14:24 · Score: 2, Interesting

This is going to challenge Google's search, which will entice them to cut loose some of those really cool google labs concepts. Froogle, Google News, and all of the other cool things that they are working on are great services and are going to be the focus of innovation over at Google.

Also, Looksmart needs to develop and release an API for this system. You can only use the Google API for 2,000 searches per day. If they allowed unlimited usage, it would get a lot of developer backing.

--
I don't keep a lid on my coffee so when I walk around I look busy -me

Grub by squiggleslash · 2003-04-19 14:27 · Score: 3, Funny

Ok, so how are they going to store this giant search engine in the boot sector of an ordinary hard drive?

Oh wait, you mean it's not related to GRUB, the Linux/etc boot loader. *slaps forehead* But I guess this solves everything - we can call Phoenix "Grub" too, and just treat it as the generic name to call everything we're having problems thinking up a name for...

--
You are not alone. This is not normal. None of this is normal.

Firewalls? by adam_megacz · 2003-04-19 14:28 · Score: 5, Insightful

So if I choose to run this client, how do I know that it won't accidentally index content that is only accessible from behind my firewall?

Re:Firewalls? by friedegg · 2003-04-19 14:40 · Score: 3, Informative

You can always put an entry in your robots.txt to block it.

Actually, the robots.txt issue is one they're still working on. Right now it doesn't check the file very often, which upsets some webmasters.
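For reference, a robots.txt check of the kind discussed here can be sketched with Python's standard `urllib.robotparser`. The user-agent token `grub-client` is an assumption for illustration; check your own logs for the string the crawler actually sends.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt that shuts out one crawler entirely and keeps
# everyone else out of /private/. The "grub-client" token is assumed.
robots_txt = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler asks before every fetch.
print(parser.can_fetch("grub-client", "http://example.com/index.html"))
print(parser.can_fetch("SomeOtherBot", "http://example.com/index.html"))
```

The rules only take effect once the crawler re-reads the file, which is exactly the complaint above: a client that caches robots.txt for a long time keeps crawling under the old rules.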

They're open to suggestions, so maybe you could suggest a list of blacklisted IPs/hostnames. I suggested they look into supporting gzip-compressed web pages, and they said they'd look into it.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
Re:Firewalls? by GigsVT · 2003-04-19 14:42 · Score: 2, Interesting

If you knowingly run a program that openly spies on every page you go to, you get what you deserve.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:Firewalls? by friedegg · 2003-04-19 15:04 · Score: 2, Informative

Well, if you're getting into "What if"'s, she could also email someone outside the company anything from inside the firewall. Or set up a file sharing client like Kazaa and share things on local and network drives.

If you wanted to forbid the client from working, network admins could block port 3136 (I think it is), which would prohibit communication with the central server.

My understanding is that grub does not just crawl away randomly, rather it's given a list of things to crawl by the central server. So, assuming it hasn't crawled your intranet before, and you don't give it a local site to crawl, it shouldn't normally find them. But, like I said, they're open to suggestions, so if you have some, offer them.
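One suggestion along those lines, sketched here as an assumption rather than anything the Grub client is known to do: have the client refuse any URL whose host is a private, loopback, or link-local address before it fetches anything. The function name `is_crawlable` and the suffix list are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical hostname suffixes that usually mean "intranet".
BLOCKED_SUFFIXES = (".local", ".internal", ".lan")

def is_crawlable(url):
    """Return False for URLs that point behind the firewall."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so fall back to simple hostname checks.
        return host != "localhost" and not host.endswith(BLOCKED_SUFFIXES)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

for url in ("http://example.com/", "http://192.168.1.5/wiki",
            "http://intranet.lan/payroll"):
    print(url, is_crawlable(url))
```

This does not catch a public-looking hostname that resolves to a private address; a stricter version would resolve the name first and test the resulting IP.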

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.

Google Toolbar by petree · 2003-04-19 14:30 · Score: 5, Interesting

Couldn't Google do this anyway with the Google Toolbar? With the advanced features version, it tracks every page you visit. If they offered some incentive to install the toolbar, Google could just beat them at this game. I actually use the Google Toolbar already by choice (it makes my web searching more productive) every day. All they have to do is get lots of people using it, and wouldn't that work just as well or better?

Re:Google Toolbar by Kelerain · 2003-04-19 14:54 · Score: 5, Interesting

This tracking is actually how a lot of important information leaks out. Security through obscurity has always been a poor man's system, and this busts it wide open. I won't post them here, but there are several interesting searches you can do that give personal results for things that REALLY have NO place on a publicly accessible page. On a more positive note, Google already uses distributed computing through their googlebar http://toolbar.google.com/dc/offerdc.html However, they donate the cycles to various worthy causes like Folding at Home (currently their only beneficiary), but it is conceivable that if they came up with some secure and useful search-related thing to do with the cycles they could put it to use almost instantaneously. I think that there aren't significant benefits (plenty of discussion elsewhere here) for them to want to use it, however.

Hardly distributed crawling by Herbst · 2003-04-19 14:41 · Score: 2, Interesting

...rather a crawl with a distributed component.

They use the screensaver grub clients to check if a web page has been modified since the last time it was crawled (by the centralized crawl done by Looksmart). They probably use some smart MD5 checksum of the pages and send that with the urls to be crawled to the clients. If the checksum of what the grub client crawled doesn't match then the centralized crawl is instructed to re-fetch that url.
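The checksum comparison guessed at above might look roughly like this. It is a sketch of the idea only; whether Grub really uses MD5, and the names `page_checksum` and `needs_refetch`, are assumptions.

```python
import hashlib

def page_checksum(body: bytes) -> str:
    """Checksum the server would store after its own central crawl."""
    return hashlib.md5(body).hexdigest()

def needs_refetch(known_checksum: str, fetched_body: bytes) -> bool:
    """Client side: has the page changed since the stored checksum?"""
    return page_checksum(fetched_body) != known_checksum

stored = page_checksum(b"<html>version 1</html>")
print(needs_refetch(stored, b"<html>version 1</html>"))
print(needs_refetch(stored, b"<html>version 2</html>"))
```

The client still has to download the whole page to compute the checksum; what the central crawler saves is fetching pages that turn out to be unchanged.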

They go this route because the If-Modified-Since HTTP 1.1 request header is not supported by many webservers (and even if it is, you can't really trust it). This is especially true for dynamically generated web pages. If If-Modified-Since worked reliably, it would be a simple operation to check whether a previously crawled page has changed. Since that's not the case, they are outsourcing the expensive refetching of whole pages.
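For comparison, this is roughly what the cheap check would look like if servers could be trusted with it. The sketch only builds the request and sends nothing; the helper name `conditional_request` is made up.

```python
from email.utils import formatdate
from urllib.request import Request

def conditional_request(url, last_crawl_epoch):
    """Build a GET that asks the server to skip the body if unchanged.

    `last_crawl_epoch` is the time of the previous crawl as seconds
    since the Unix epoch; HTTP wants it as an RFC 1123 date in GMT.
    """
    stamp = formatdate(last_crawl_epoch, usegmt=True)
    return Request(url, headers={"If-Modified-Since": stamp})

req = conditional_request("http://example.com/", 0)
print(req.get_header("If-modified-since"))
```

A server that honors the header answers 304 Not Modified with no body, which `urllib.request.urlopen` reports as an `HTTPError` with code 304. A dynamically generated page typically ignores the header and sends a full 200 response every time, which is the problem described above.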

It will be interesting to see how this pans out. I think they could run into trouble with ISPs if this really takes off (because bandwidth consumption per user would increase and make flatrate deals less profitable for some ISPs).

Re:Hardly distributed crawling by myov · 2003-04-19 15:19 · Score: 2, Insightful

Not the greatest way of doing this. On one of the sites I maintain, the date shows up at the top of the page. The other content changes very infrequently in most cases (a few pages hit a news&events database but that's about it). But the new date would be enough to change the checksum (unless they're allowing for it somehow).
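One way a crawler could "allow for it" is to strip volatile bits such as dates and times before hashing. This is purely a hypothetical sketch; nothing in the thread says Grub does this, and the patterns below are illustrative, not complete.

```python
import hashlib
import re

# Hypothetical patterns for content that changes on every request.
VOLATILE_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),       # ISO dates like 2003-04-19
    re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b"),  # clock times like 14:17
]

def stable_checksum(html: str) -> str:
    """Hash a page after removing dates, times, and whitespace noise."""
    for pattern in VOLATILE_PATTERNS:
        html = pattern.sub("", html)
    html = " ".join(html.split())
    return hashlib.md5(html.encode("utf-8")).hexdigest()

saturday = "<p>Today is 2003-04-19</p><p>Club news</p>"
sunday = "<p>Today is 2003-04-20</p><p>Club news</p>"
print(stable_checksum(saturday) == stable_checksum(sunday))
```

The trade-off is that a page whose only real change is a date (an events calendar, say) would be wrongly treated as unchanged.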

Grub hits us quite often. I've seen the same URL hit multiple times in one day by different hosts. It's ignoring the "revisit-after" meta tag (7 days), but then, so are most of the other search engines. While I haven't banned it, I am watching the amount of bandwidth it uses.

--
I use Macs to up my productivity, so up yours Microsoft!

The Distributed Search Engine by deadfishhotmail.com · 2003-04-19 14:42 · Score: 2, Interesting

It's kind of funny and a bit ironic that search engines are generally used to search information from a central repository and Grub uses a distributed network to index pages. It's almost like having a distributed Google cache (that's updated more frequently). Perhaps a better idea would be to invent a crawling daemon that runs on each server with a standard protocol that reports to a central server the relevance of search terms (hey it's DNS for search terms!!) - too bad it would be heavily abused (mostly by Buy Now, Free Money and Pron avenues I suppose).

Ok now tell me that it's already been done, 'cause I'm pretty sure it has (and probably by Microsoft for ad money).

Well it's an idea that might be more efficient and updatable than Grub anyway.

--

Who is this "Poster" guy and why does he own all of my comments?!?

Google's technology is superior... by eidechse · 2003-04-19 14:42 · Score: 4, Funny

...those pigeons can't be beat.

My Take on Grub by Anonymous Coward · 2003-04-19 14:44 · Score: 2, Informative

Looksmart is only using Grub to save on their bandwidth. Essentially Grub just compresses web pages before sending them to Looksmart's indexer thus reducing the bandwidth they have to pay for by a factor of 5 or so. The same thing could be accomplished through a proxy which compresses web pages. Eventually, once the HTTP mime standard for requesting compressed web pages is better supported by web servers, Grub will not be necessary.
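To get a feel for the bandwidth claim, here is a small sketch that measures how much gzip shrinks a page before it would be sent upstream. The sample markup is invented and far more repetitive than a real page, so the ratio it prints says nothing about the "factor of 5" estimate above.

```python
import gzip

def compression_ratio(body: bytes) -> float:
    """Original size divided by gzip-compressed size."""
    return len(body) / len(gzip.compress(body))

# Invented, highly repetitive markup standing in for a crawled page.
page = b"<tr><td class='row'>some repetitive markup</td></tr>\n" * 200
print(round(compression_ratio(page), 1))
```

On the server side, the same saving is available when a web server honors the `Accept-Encoding: gzip` request header, which is presumably the mechanism the comment has in mind.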

What about the RIAA? by One Louder · 2003-04-19 14:51 · Score: 3, Insightful

So...let's say my instance of Grub crawls over a repository of .mp3s and supplies that information to the combined index.

What's the difference between my machine indexing them and the university students recently being hauled into court for indexing open shares? Why would I not be held liable for contributory copyright infringement?

No thanks.

Re:What about the RIAA? by SmartGamer · 2003-04-19 15:46 · Score: 2, Interesting

Here's the catch: it's going for scare tactics.

The Church of Scientology has already threatened Google and gotten results moved; I can, in all honesty, see the RIAA going for it.

It would be an earthshattering case, but here's the thing: the RIAA stands a disturbingly good chance of winning.

I hope, I pray they don't, were they to try it. And try they most certainly will, because they think they can get money out of the lawsuit and they want money. That's very likely a major motive.

Oh, and to mods-for-a-day: mod the parent of this post up. It's thoroughly underrated at zero.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.

Re:Not news for us webmasters by Redwing · 2003-04-19 14:52 · Score: 5, Interesting

Here is what slashdotters were saying about grub almost 2 years ago.

--
Raisinettes are my raison d'etre

They realize they aren't the REAL GRUB by anagama · 2003-04-19 14:55 · Score: 5, Informative

From the readme in the Linux version - no idea what the other readmes might say. However, it appears that they are sensitive to the fact that bootloader grub pre-existed their program. They are requesting catchy names. Here is an excerpt:

Notice
======
The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

--
What changed under Obama? Nothing Good

Re:They realize they aren't the REAL GRUB by Saeger · 2003-04-19 22:37 · Score: 2, Interesting

Oh please! There's 6+ billion people on the planet now, and not enough unique namespace for everyone or every business to have that one 'cool' short name, so why don't they do what we humans have done? GET A LAST NAME.
Grub The SearchEngine
Grub The Bootloader
FireBird von Browser
FireBird von Database
Gentoo el Distro
Gentoo el FileManager
Apple Computer
Apple Records

I'm serious. Nobody should feel entitled to an exclusive piece of namespace just because they think they had it first or are bigger & badder and more deserving than some newbie treading on their turf. (trademark `this!')
--
Power to the Peaceful

A better use for my screensaver time by Call Me Black Cloud · 2003-04-19 14:57 · Score: 5, Insightful

I prefer grid.org to grub.org. There the cycles are going to cancer or smallpox research. Currently over 2 million machines are participating.

Altruism has its place, but since I'm more likely to die of cancer than of not having the complete www indexed I think I'll be selfish and work towards a cure for something that may affect me.

curious. by toothfish · 2003-04-19 14:58 · Score: 2

i wonder if google has already seen this coming (i've seen that grub fellow in my logs a number of times and sort of wondered about it), and is going to use their own distributed search engine once they get the bugs hammered out...

Indexor or Search Engine? by digitect · 2003-04-19 15:02 · Score: 4, Interesting

I expected some way to search... this looks more like a project to index the web rather than make the results available for public use via web interface. Did it strike anyone else odd that there was no web form on the home page with which to search?!

It seems like a good concept, but the availability of the information collected needs to be accessible without installing the client. I'm not game to install distributed computing apps without some freely available benefit. The "for the good of the world" motivation went out the window for me about a day after my first Seti At Home experience. (But now BitTorrent, there was appreciable benefit. I had RedHat 9 isos within 8 hours of their initial release!)

--
There is no need to use a SlashDot sig for SEO...

Re:search.msn.com is the future by shibbydude · 2003-04-19 15:04 · Score: 5, Interesting

In particular, the company has its own team of editors that monitors the most popular searches being performed and then hand-picks sites that are believed to be the most relevant.

You have to be kidding or working for Microsoft, or both! Have you ever searched for Linux on MSN? Try it - here.

Notice the third result? "Learn about the Microsoft alternatives and how to move to them from open source products." I shit you not! I don't think Google would ever use this kind of dirty, underhanded trick. Great "hand-picking", mate.

--
We're only gonna die from our own arrogance, that's why we might as well take our time...

You can run both by friedegg · 2003-04-19 15:08 · Score: 3, Informative

Grub isn't a heavy CPU user. Right now, on my Athlon (~2400+), it's using between 0-2% of the CPU at any given time. Grub is mainly interested in your excess bandwidth.

--
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.

Re:You can run both by rabidcow · 2003-04-19 15:45 · Score: 5, Funny

Grub is mainly interested in your excess bandwidth.

Unfortunately, so is my ISP. In fact, they've already sold it to other customers.

Looksmart by Ark42 · 2003-04-19 15:27 · Score: 3, Interesting

Isn't Looksmart/Sprinks a big pay-per-listing deal? The looksmart logo in the upper right corner was enough to make me just close that page right away without any second thought.

--
Morphing Software

Re:Search engine software and lack of A . I . by zymano · 2003-04-19 15:31 · Score: 3, Insightful

I didn't know that.

But it still kind of irks me that people think that a computerized 'dumb' search result could compete with a human rating system that filters spam, porn, and other garbage results. Google should hire some REAL PEOPLE who can do some sort of categorized, intelligent directory so we can have QUALITY at the beginning of a search result. Some sort of HUMAN RATING system is needed to sort. The software is not up to par.

Re:search.msn.com is the future by velkro · 2003-04-19 15:35 · Score: 2, Funny

Not to mention:

Results 1-15 of about 609 containing "linux"

I seem to remember there being more than 609 websites with Linux information on them...

Flood Control by SmartGamer · 2003-04-19 15:43 · Score: 2, Interesting

According to the Grub FAQ, it respects robots.txt although not the META tags. Although it takes a week or two for it to listen to the robots.txt, it does eventually...

The sheer volume of this project concerns me, however. The very fact that it got Slashdotted may cause it to be a bit heavier than expected!

It sounds like a good use of spare bandwidth, but if it's going to wind up a superscanner, it's going to send a hell of a lot of requests.

I tried it and deleted it as quickly: it's not very good at being a bottom feeder, it redlined my system resources immediately and slowed everything down. Duration between installation and uninstallation: twenty-nine seconds.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.

Web searching will only get harder... by Sancho · 2003-04-19 15:44 · Score: 2, Insightful

...as the web gets larger and more cluttered.

I've already discovered this with comic books turned into movies. Finding synopses of the comic book X-Men is nigh impossible. Finding synopses of the movies is much, much easier. Damn near every site online about X-Men, Spiderman, The Hulk, Batman, etc. deals with the movies, and sifting through the cruft is not easy. And that's just comic books. Other topics can be just as hard to find, and this doesn't even touch upon fake search results that only turn up porn or worse, a blank page (happens frequently).

Searching for MORE stuff isn't going to help. Searching better is the key. Google goes a long way towards this, but even it has the same problems of finding too much crud.

Altruistic? by sulli · 2003-04-19 15:44 · Score: 5, Funny

That's the dumbest thing I've heard in ages. Why should I help out a for-profit company for free?

(Oh, I can't remember. Have I MetaModerated Recently?)

--

sulli
RTFJ.

Re:Altruistic? by eversunsoft · 2003-04-19 18:36 · Score: 4, Insightful

Well, because web searching has, to this day, been a free service. Supposing that the index is built as the result of donated searches, it would be ethically in very bad taste to act against this trend.
Of course, I am the first one to question this trend. Has anyone else considered the possibility that one day we'll wake up, and notice that Google is charging for access to its basic searching services?
I for one, would probably pay. I have become so dependent on it. What price? That's a good question...
Re:Altruistic? by R0 · 2003-04-19 22:19 · Score: 5, Funny

Notice
======
The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, who's executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.

I nominate "parasite".

Good Idea, Bad Implementation by oaf357 · 2003-04-19 15:52 · Score: 3, Insightful

Yea. If you help Grub, Grub gives your web site a preferential listing. Building the biggest search engine, sure. Building good search results, not so sure.

Re:Good Idea, Bad Implementation by Anonymous Coward · 2003-04-19 15:55 · Score: 2, Insightful

It doesn't give you a preference in listings, simply a preference in crawling. You offer some work to guarantee your site has fresh indexing. It's not much different than the search engines that sell frequent crawling for extra. A fresh non-relevant listing won't help you much more than an older listing.

What _is_ a good project? by bcrowell · 2003-04-19 16:11 · Score: 3, Interesting

I have a FreeBSD server that wastes the vast majority of its CPU cycles (and most of its bandwidth, too). So what is a good distributed computing project to donate those cycles to? I'd like to find something that

1. makes me feel warm and fuzzy about my altruism
2. can run in the background on a Unix box
3. is open-source (so I don't have to run someone's closed-source app on my box and trust their security through obscurity)

Well, #1 rules out Grub, #2 rules out Folding@Home, and #3 rules out both SETI@Home and Folding@Home.

So what worthy causes are out there?

--
Find free books.

Re:What _is_ a good project? by metlin · 2003-04-19 19:43 · Score: 2, Interesting

How about helping with some cool math prime search?

ars Team Prime Rib - cool prime searching stuff.

A mix of misc science stuff.

dc projects - some Opensource, some not.

And all projects at distributed.net come with source too.

DDoS by karlm · 2003-04-19 16:14 · Score: 3, Interesting

So the idea is to DDoS the entire web? :-)

If this thing gets too popular without proper throttling, they could cause real havoc.
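"Proper throttling" could be as simple as enforcing a minimum gap between requests to the same host. The sketch below is an assumption about how one might do it, not a description of Grub; the class name `HostThrottle` and the 30-second default are invented.

```python
class HostThrottle:
    """Allow at most one request per host every `min_interval` seconds."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval
        self.last_hit = {}  # host -> time of the last allowed request

    def allow(self, host, now):
        """Return True and record the hit if `host` may be fetched at `now`."""
        last = self.last_hit.get(host)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_hit[host] = now
        return True

throttle = HostThrottle(min_interval=10)
print(throttle.allow("example.com", now=0))   # first hit
print(throttle.allow("example.com", now=5))   # too soon
print(throttle.allow("example.com", now=10))  # gap has elapsed
```

In a distributed crawl this bookkeeping would have to live on the coordinating server when it hands out URLs, since individual clients cannot see each other's requests.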

--
Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.

Legalities? by cheshiremackat · 2003-04-19 16:17 · Score: 4, Interesting

Alright, I have 3 major problems with this...

1) How different is this than the Princeton kiddies' system? I don't know about you, but I don't want a 95 billion dollar bill arriving in the mail...

2) What if your local (cache?) contains a few links to kiddie porn? Not your fault, right? The software does its own thing, which you cannot control, BUT what will the FBI think? The FBI, Scotland Yard, and the RCMP are currently heavily investigating kiddie porn cases (good work IMHO), but what if you're the unlucky sap who gets stuck with a few sketchy URLs? Or worse yet, what if this GRUB keeps a cache of the website like Google does? Then what?

3) What about material that is legal locally, but illegal somewhere else... e.g. Nazi stuff in Germany, Falun Gong in China, etc... The last thing I want is to be refused a travel visa cuz my PC has an illegal cache...

Good idea in principle, but with sketchy content on the web, I don't think I will be the one keeping track of it all. If there is a way to filter out the questionable stuff then maybe, but since the purpose is to be as inclusive as possible, it seems incompatible.

_CMK

--
Bad spellers of the world untie!

Re:Legalities? by SmartGamer · 2003-04-19 17:51 · Score: 2, Interesting

It does, however, download a buffer of URLS to scan. If your buffer was less than clean when your computer gets searched, oops, you're in trouble...

Not to mention the fact that it still goes and hits all those sites, and with the government trying to smash that little thing we call "privacy," anything questionable will likely go on your permanent record- the one that doesn't exist, but they somehow have anyway.

--
Warning: Poster of this comment is a nerd. Just like everybody else here.
Re:Legalities? by amoe · 2003-04-19 22:58 · Score: 2, Interesting

text is still illegal...

Text child pornography is illegal? How does that work? I thought the rationale for video child porn being illegal was that an illegal act had been committed in its creation - how do they justify making something illegal that is purely the product of an author's imagination?

Disclaimer: I have never read a child porn story, but I have seen them around the seedier places on the net.

--
You look beautiful! Incidentally, my favourite artist is Picasso.

Unlimited Use? Try Wishful Thinking. by NeoMoose · 2003-04-19 16:37 · Score: 3, Insightful

You can always use the Google API for more than 2,000 searches per day if you pay licensing fees for it. That's just Google ensuring that it can remain a viable company. Little text-box advertisements just don't cut it in this day and age where blatant pop-ups and colorful banner ads don't even have much turn-around. That's not the point though.

The point is that I wouldn't look anytime soon for LookSmart to allow unlimited usage of this API. It's too large of a project for them to just let people use it. It's simple economics. They may not be investing the computing resources into this project's web spidering software, but it's still using TONS of resources to keep this data catalogued and readily accessible.

The open faucet, not the blown dam by SmartGamer · 2003-04-19 16:47 · Score: 2, Informative

A DDoS is only effective because it's a whole bunch of messages all at once to one target- in the 100,000,000 range for a full-scale attack, to always cover all the positions.

The database of "check-me"s is randomized rather evenly. Even if this takes off, I don't see how it could really do serious damage to any but the truly dinky servers: the hits will not come in all at once and flood the whole connection. While it very well could end up a constant stream, it's unlikely to be the massive stream that makes a DDoS.

It does have the potential to slow servers across the world, but that's okay- it will slow home users' connections across the world by using 1/4 of them, too, so nobody will actually notice.
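To put rough numbers on the faucet-versus-dam point, here is a minimal sketch. It assumes each pending crawl of a server lands in a uniformly random one-second slot, which is my simplification and not how Grub's scheduler is documented to work. The server count, URL count, and window length are made-up values, and `peak_load` is a hypothetical helper.

```python
import random
from collections import Counter

def peak_load(num_servers, urls_per_server, num_slots, seed=0):
    """Return the most hits any single server receives in any single time slot.

    Every URL of every server is assigned a uniformly random slot
    (an assumption for illustration, not Grub's real scheduling).
    """
    rng = random.Random(seed)
    hits = Counter()
    for server in range(num_servers):
        for _ in range(urls_per_server):
            slot = rng.randrange(num_slots)
            hits[(server, slot)] += 1
    return max(hits.values())

# Same total traffic, spread over an hour of one-second slots...
spread = peak_load(num_servers=100, urls_per_server=500, num_slots=3600)
# ...versus everything arriving in the same second, DDoS style.
burst = peak_load(num_servers=100, urls_per_server=500, num_slots=1)

print("peak hits per second, spread out:", spread)
print("peak hits per second, all at once:", burst)
```

The total number of requests is identical in both runs; only the timing differs, and the timing is what decides whether a small server falls over.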

--
Warning: Poster of this comment is a nerd. Just like everybody else here.

Re:search.msn.com is the future by lamber45 · 2003-04-19 16:50 · Score: 2, Interesting

I followed one of these links and looked at the MSDN article. It's full of generalizations taken from 20-year-old UNIX textbooks, although Linux and X windows are mentioned here and there. Apparently recent versions of some level of Windows have an "Interix" subsystem. I've used Cygwin32 on Win95, WinME, Win2k and WinNT, and Borland C++, and Visual C++ .NET, but I don't think I've ever used the Microsoft native POSIX layer. The article gives a lot of questions that should be asked before starting a migration like this. One possible reason to migrate is to decrease the Total Cost of Ownership; another is to increase hardware options and move away from proprietary systems!

Another quote I like is, "Windows operating systems do not provide X Windows. For X Windows connectivity, developers need a third-party X Windows server.". Of course Microsoft would never be anticompetitive by competing with third-party suppliers of implementations of an open standard, right?

Re:search.msn.com is the future by Anonymous Coward · 2003-04-19 16:50 · Score: 2, Insightful

It's not as bad as you make it out to be. They do point out (in fine print) that it is a "featured" site. They list the "featured" sites first, then the sponsored links, and then general web hits. And they mark each category. I guess that the only difference between featured and sponsored is in the price. All this was far from obvious to me when I saw the results at first (being used to Google), but I imagine that if you used them on a daily basis you would quickly become used to skipping down to the real results.

Read the fine print by anon*127.0.0.1 · 2003-04-19 16:52 · Score: 2, Insightful

It's a "featured site". Meaning it's a site from Microsoft, a Microsoft partner, or someone who paid some money to Microsoft for the privilege.

Nothing that other search sites don't do. They just mark their paid adverts a little more obviously.

--
I am NOT a man!
I am a free number!

Re:Not news for us webmasters by hswerdfe · 2003-04-19 19:11 · Score: 2, Insightful

dude, get over yourself....

I never heard tell of Grub.org before.

I found it interesting....

not every link on slashdot is going to directly relate to you....

--
--meh--

The approach is inherently flawed by oren · 2003-04-19 19:22 · Score: 3, Interesting

It is too easy to send corrupted information into the database. They have *no choice* but to trust the clients. Sure they could run spot checks on the results, but they would be very partial and it would be easy enough to fake responses for those as well.
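A spot check of the kind mentioned above could look roughly like the sketch below. It is only an illustration: nothing here is taken from Grub's code, the page data is invented, and `digest` and `spot_check` are hypothetical helpers. The server re-crawls a random sample of submitted URLs and compares content digests.

```python
import hashlib
import random

def digest(text):
    """Content fingerprint a client would submit for a crawled page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def spot_check(submissions, recrawl, sample_rate, rng):
    """Re-crawl a random sample and return URLs whose digest does not match."""
    sample_size = max(1, int(len(submissions) * sample_rate))
    checked = rng.sample(sorted(submissions), sample_size)
    return [url for url in checked if submissions[url] != digest(recrawl(url))]

# Hypothetical data: 1000 honest results plus one faked entry.
real_pages = {f"http://example.com/{i}": f"page body {i}" for i in range(1000)}
submissions = {url: digest(body) for url, body in real_pages.items()}
spam_url = "http://example.com/7"
submissions[spam_url] = digest("buy cheap stuff")

rng = random.Random(1)
caught = spot_check(submissions, real_pages.__getitem__, 0.01, rng)  # 1% sample
full = spot_check(submissions, real_pages.__getitem__, 1.0, rng)     # re-crawl everything

print("1% sample flagged:", caught)
print("full re-crawl flagged:", full)
```

A 1% sample only looks at 10 of the 1000 URLs, so a single faked entry usually slips through, while checking everything means the server is doing the whole crawl itself. The sketch also assumes the client cannot tell which requests are checks, which is exactly what the comment doubts.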

So the more popular it gets, the more incentive people will have to promote their sites by feeding it fake index information. If this magically got to be very popular, within weeks search results would become meaningless and it would drop back into obscurity. The more likely result would be that it will never become popular in the first place.

Besides, who wants to donate his CPU and bandwidth resources for a commercial company, anyway?

They have cracked it by fireman+sam · 2003-04-19 20:02 · Score: 2, Funny

1. Design a search engine
2. Let everyone else fill it
3. Profit

The second step is finally found!!! YAY

--
it is only after a long journey that you know the strength of the horse.

CPU cycles are NOT wasted or "available" by pe1chl · 2003-04-19 20:52 · Score: 2, Insightful

The common point made by these "distributed" software authors is that there are "wasted" CPU cycles in your computer that you could donate to a project for free.
However, that is not true at all! CPU cycles are not wasted. When the CPU has nothing to do, it sleeps. At least in a modern operating system (i.e. about everything after Windows 95).

By "donating your wasted CPU cycles" you will actually increase the power consumption of your computer. This will be very noticeable in a laptop, but when you watch the CPU temperature in your home system you will also see a noticeable increase in temperature between an idle system and a system running a computationally intensive background task.

Probably the effect will be worse for things like keysearches, prime number searches, SETI etc than for this GRUB bot, because that probably also spends time waiting for the network (and thus returns the CPU to idle).

So before you "donate your wasted CPU cycles", please realize that this will actually cost you money.
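For a back-of-the-envelope figure, the arithmetic is simple enough to sketch. The 30 W of extra draw and the 0.10 per kWh price below are assumptions picked for illustration, not measurements; plug in your own numbers. `yearly_cost` is a hypothetical helper.

```python
def yearly_cost(extra_watts, hours_per_day, price_per_kwh):
    """Cost per year of drawing extra_watts more than an idle machine would."""
    kwh_per_year = extra_watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh

# Assumed values: 30 W above idle, running around the clock, 0.10 per kWh.
cost = yearly_cost(extra_watts=30, hours_per_day=24, price_per_kwh=0.10)
print(f"{cost:.2f} per year")  # 262.8 kWh at 0.10 each, about 26.28
```

A client that spends most of its time waiting on the network would sit well below the assumed extra draw, which matches the point made above about the GRUB bot.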

Re:search.msn.com is the future by pafrusurewa · 2003-04-19 21:43 · Score: 2, Funny

The Austrian version of MSN is even better. If you search for Linux, the first two results are WinXP ads on the Microsoft site. And, while you're at it, try searching for google or yahoo. This will produce a popup saying "Why look for a search engine when you've already found one?".

Distributed Crawling From Browsers by txtger · 2003-04-20 01:08 · Score: 2, Interesting

It would be interesting to just see a database that is connected to browsers, so that whenever I were to look at a page, the page data would be processed and sent to whatever search engine. Then, those sites that are updated frequently and get a lot of traffic would be more easily searched.

Just a thought.

Re:Grub does NOT look for robots.txt by Anonymous Coward · 2003-04-20 01:44 · Score: 3, Informative

Here it is on mine requesting it:

64.241.242.18 - - [18/Mar/2003:17:25:30 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.242.18 - - [19/Mar/2003:19:41:05 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.243.81 - - [30/Mar/2003:22:10:41 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
64.241.243.81 - - [01/Apr/2003:23:11:21 -0700] "GET /robots.txt HTTP/1.1" 200 223 "-" "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"

Notice those are LookSmart-owned IPs and not just normal user crawlers. They seem to centrally crawl for robots.txt. They do know, however, that they need to crawl for robots.txt more often.
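If you want to check your own logs for the same thing, something like the following sketch would do. It assumes the usual Apache combined log format, which is what the lines above appear to be. Two of the sample lines are copied from above, the third is a made-up non-Grub request, and `robots_fetches` is a hypothetical helper.

```python
import re

# Combined log format: ip, ident, user, [time], "request", status, size, "referer", "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def robots_fetches(lines):
    """Return (ip, time) for every robots.txt request made by a grub-client agent."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path") == "/robots.txt" and "grub-client" in m.group("agent"):
            found.append((m.group("ip"), m.group("time")))
    return found

GRUB_AGENT = "Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)"
sample = [
    f'64.241.242.18 - - [18/Mar/2003:17:25:30 -0700] "GET /robots.txt HTTP/1.1" 200 222 "-" "{GRUB_AGENT}"',
    # Hypothetical non-Grub request, included only to show the filter working.
    '192.0.2.1 - - [02/Apr/2003:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 1024 "-" "ExampleBrowser/1.0"',
    f'64.241.243.81 - - [01/Apr/2003:23:11:21 -0700] "GET /robots.txt HTTP/1.1" 200 223 "-" "{GRUB_AGENT}"',
]

result = robots_fetches(sample)
for ip, when in result:
    print(ip, when)
```

Grouping the result by IP would show whether the robots.txt requests come from a handful of central addresses or from many different client machines.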

hair is raising on the back of my neck by malia8888 · 2003-04-20 07:35 · Score: 2, Interesting

Uh huh, Grub is going to "run in the background" ?
No thanks!! It just doesn't feel right. It is sort of like lending a firearm to an untrustworthy neighbor. What is in it for the lender other than potential problems?

Spyware "runs in the background" and slows up people's machines. What really happens to one's machine performance with Grub? And, more importantly, where is my check?

--
Harpo Tunnel Syndrome--my wrist feels funny.
