Domain: grub.org
Stories and comments across the archive that link to grub.org.
Comments · 54
-
Distributed search engines failed
We've tried this before with GRUB, but it didn't really take off for a multitude of reasons.
-
Woah
Wikia Search is open source, it's based off of Grub (which we have already talked about before). Here's the source code to the grub Windows client, and there's a dev site too. The current scoring algorithm is over here. If you want to talk with Jimbo and the developers, hop on to the mailing list and let's talk.
Anyway, it looks like there's the opportunity here to *improve* this search engine -- programmers, I know you are reading, and at least check out the code. There's been talk about running some competitions for improving the search results (the scoring algorithms), how many of us would like to form a team? Maybe I'll do one. Who's with me?
(Btw, these guys need help. I just found all of this after the recent news articles.) Screw my mod points. -
Re:Challenging Google?
It looks like you've entered some sort of partnership with Grub http://www.grub.org/.
If so, kudos... Grub's been languishing in not-ready-for-primetime land for far too long, and the ability to crawl your own site to keep results current is a bonus, too. -
Translated Press Release
FTFA:
Wikia has acquired the Grub source code from LookSmart. We will be posting the complete, current codebase as soon as possible, here on Grub.org. In the meantime, sign up and stay tuned to developments regarding getting Grub going again.
Translation: Development had slowed to a SNAIL's pace(*), but now casting off its SHELL, we bring you a new and improved (TM) GRUB!
(*) From: Member Statistics (as of 20070730 at 14:47 EDT)
Members Overview (see all)
Total members: 1,049
Oldest member: 14,016 days
Active this month: 2
Let's see here:
dc --expression="4k 14016 365.25 / f"
38.3737 YEARS! -
Re:How do we know Goog isn't giving up info alread
Just as an aside, it's high time there was a serious effort at producing a decent open source search. Personally, I think a distributed network with anonymizing services makes the most sense. I know there are projects in existence already, but more people will have to become aware of them. Some Open Source search projects are:
http://www.majestic12.co.uk/projects/dsearch//
http://www.aspseek.org/about.html//
http://sourceforge.net/projects/ebiness//
http://www.grub.org/html/documents.php//
http://lucene.apache.org/nutch/bot.html//
I really want to see one of these projects take off, I'd tap a vein at the local plasma center to donate funds
:> -
Re:OSS Google Killer?
What, like grub? Or any of the myriad of p2p search engines?
-
Re:Formula
Yes, profit. If you recall, Google was completely privately held from 1998 until their recent IPO. Why? Because that's when the owners decided the value was fully generated, and that it could no longer grow, or (gasp) might even fall. Time to dump and get cash while it's hot. For all the wonder that Google is - and many thanks to its precious inventors, you are forever in our hearts - its core technology is severely limited, because it's based on a centralized system, and there is something better on the horizon. The real answer is distributed computing, where you can do the indexing locally and only send up the index, but this means giving up control, and thus giving up share value. I wonder how long it will be possible for this next wonder-genie to be kept tight in a bottle. It could be quite some time until the cork is pulled - a few thousand years? - but sooner or later it happens.
-
Big name != "real"
I see that there appear to be real, legitimate, search engines that do not follow robots.txt rules.
No, you rather see some well-known search engines that generate illegitimate traffic instead of behaving properly. I note a number of them in this highly-documented robots.txt file. I'm personally most offended by idiots running this shit, since there is no single IP block to blacklist.
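For anyone wondering what "behaving properly" amounts to in code, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the bot names (politebot, examplebot) are made up for illustration; a real crawler would fetch the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not taken from any real host).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """A polite crawler calls this before every request and skips the URL on False."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
```

A crawler that skips this check, or ignores the answer, is the kind being complained about here.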
-
Re:Let me guess...
RoadRunner will also complain loudly if you run the Grub crawler. (It's sort of a distributed web spider.) Same reason, too many DNS lookups. I set up a local BIND9 server forwarding to Verizon's free server on 4.2.2.4, and pointed all my machines to use the local BIND server.
Verizon isn't bitching about it and I'm not even paying them =) -
DIY Search
Google, Yahoo, MSN etc... why do we actually rely on a commercial entity for searches? Wasn't there a distributed search project called Grub? Can't we just set up something similar that would be totally independent of any entity that would always be susceptible to *cough* influence *cough*?
Something similar to the Linux movement, but with even more impact on the general Internet population? C'mon, we can do it, can't we? We're also using bittorrent to get more independence from central ftp servers. Distributed search would be just the same.
-
There already is distributed crawling
It's called grub.
-
Re:P2P?
The closest thing to what you're talking about is Grub, which is run by Looksmart as a dead-link checker and also feeds to WiseNut. While it doesn't allow you to crawl sites that you don't have control over, it does allow you to crawl your own site.
Personally, I've wanted a Google toolbar that indexes the sites that you surf, and adds additional positive weight to the sites that you linger on. It may not know what you liked there, but it knows that you liked it.
Completely offtopic, but does anyone know of a screensaver on Windows that displays random (or spidered) web pages? I've been looking for an equivalent to the XWindows version for years.
-
Re:Let's roll our own distributed search engine
-
Re:A few nits to pick.
If distributed search and trust prove to be better, won't Google simply adopt it as a model? It's hard to imagine a company as innovative and powerful as Google letting itself be displaced.
Are there any other major distributed search projects going on right now besides Grub? -
Re:I probably would have done this differently...
For the first problem, though, perhaps a SETI-ish distributed "Heritrix" could help make regularly archiving all of these sites a manageable affair. IA sends marching orders out to the distributed volunteer network, each client downloads the pages, compares MD5s of the pages with other clients, compresses them, and sends them back to a master archive. Sounds great in theory, at least at first, to me...
There actually is a program out there called Grub that tries to follow this concept. I had contributed to the project in its infancy, but once it was bought out by LookSmart, I kinda moved away from it. A lot of people were complaining about Grub's utter lack of respect for no crawl sections of sites and robots.txt. It might have changed a little bit since then to actually support robots.txt, so it might be worth your try.
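The "compare MD5s with other clients, compress, send back" step quoted above could look roughly like the sketch below. This is not Grub's or Heritrix's actual protocol; the function names and the quorum rule (at least two clients must agree) are assumptions made for illustration.

```python
import hashlib
import zlib
from collections import Counter


def md5_hex(body):
    """MD5 digest of a downloaded page body, as a hex string."""
    return hashlib.md5(body).hexdigest()


def agree_and_pack(fetches, quorum=2):
    """fetches maps a (hypothetical) client id to the bytes that client downloaded.

    Returns (digest, compressed_body) when at least `quorum` clients fetched
    identical bytes, otherwise None so the page can be re-queued.
    """
    if not fetches:
        return None
    digests = {client: md5_hex(body) for client, body in fetches.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes < quorum:
        return None
    # Take the body from any client whose digest matched the majority.
    body = next(b for c, b in fetches.items() if digests[c] == digest)
    return digest, zlib.compress(body)
```

Pages that change on every request (ads, timestamps) would never reach quorum under this rule, which is one reason the idea sounds better in theory than it works in practice.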
-
Yes There is a Distributed Search Engine...
Bah, more like anonymous lazy person. You might want to look at Grub which is actually a distributed search engine. The spidering results still go to a single, monolithic organization though, LookSmart I think. Getting a truly decentralized search engine would be difficult, especially with the marked delays that decentralized networks suffer from, but I see no reason why not. Are you going to program it?
Starling -
What About Distributed Search Engines?
Are there any non-commercial projects afoot to build a distributed web search engine? I found Grub, a SETI@home-like web crawler, but it seems to me that any commercial venture under sufficient financial pressure will eventually resort to paid listings.
-
read the grub forums
The idea is cool and I imagine it won't be long before an org. without links (unverified) to M$, will do the same thing. There's at least a couple of people on the grub forum who are figuring out some of the shadier sides of this code: potential spyware? security hole? And the licensing is vague (no links).
Note the tone of their pitch as well: "you are participating in a competitive group effort akin to Seti@home and Distributed Net"? I don't think so... caveat emptor. -
Re:Altruistic?
Notice
======
The main executable has been renamed to "grubclient" out of respect for the GNU Grub bootloader, whose executable is named "grub". They were out first, so we decided to pick another name. If you have a catchy suggestion for a new name, please let us know.
I nominate "parasite". -
Help Grub crawl the web
The Grub project is a distributed method of crawling the internet. You download the client and you help Looksmart (their search engine, WiseNut, is pretty good but not the best) crawl the web.
In my opinion it is better to contribute your spare bandwidth and CPU to help make sure more of the internet is crawled, and crawled more frequently, instead of something more pie in the sky like SETI. Grub has a more down-to-earth use. Help make sure all of cyberspace can be crawled.
Download the grub client:
http://www.grub.org/html/downloads.php?PHPSESSID=aa2b3b639ab6f4b92965e132a1418df9
There is a linux version. Get crawling, forget Seti and that distributed.net crap, helping crawl all the internet is more of an attainable goal.
-
Forget Seti and Distributed.net
The Grub project is a distributed method of crawling the internet. You download the client and you help Looksmart (their search engine, WiseNut, is pretty good but not the best) crawl the web.
In my opinion it is better to contribute your spare bandwidth and CPU to help make sure more of the internet is crawled, and crawled more frequently, instead of something more pie in the sky like SETI. Grub has a more down-to-earth use. Help make sure all of cyberspace can be crawled.
Download the grub client:
http://www.grub.org/html/downloads.php?PHPSESSID=aa2b3b639ab6f4b92965e132a1418df9
There is a linux version. Get crawling, forget seti, helping crawl all the internet is more of an attainable goal.
-
Something more terrestrial
The Grub project is a distributed method of crawling the internet. You download the client and you help Looksmart (their search engine, WiseNut, is pretty good but not the best) crawl the web.
In my opinion it is better to contribute your spare bandwidth and CPU to help make sure more of the internet is crawled, and crawled more frequently, instead of something more pie in the sky like SETI. Grub has a more down-to-earth use. Help make sure all of cyberspace can be crawled.
Download the grub client:
http://www.grub.org/html/downloads.php?PHPSESSID=aa2b3b639ab6f4b92965e132a1418df9
There is a linux version. Get crawling, forget seti, helping crawl all the internet is more of an attainable goal. -
Re:Other uses for Distributed Computing
"I'm surprised google hasn't come out with a spider at home client which goes out and searches the web caching sites as it goes. Sure distributed computing could help their venture and who doesn't love google?"
Perhaps you are thinking about grub?
-
Re:Other uses for Distributed Computing
This grub project is on the way to doing just that.
-
Glad
Awesome for Seti@Home. I have been active with them for years now. I enjoy distributed projects. I also enjoy Grub, a distributed search engine project.
-
Jesus' number was 7, so....
-
Re:English please!
I'll try, but I get WAY out of my league when you talk about anything bigger than n=3.
n represents dimensions. i.e. n=1 is one dimension, n=2 is two dimensions, n=3 is three dimensions, etc.
"simply connected" just means that the boundary surrounding something is connected. For example, in n=2 space (a piece of paper for example), if you drew a line around a bunch of ants, and connected the ends, it would be simply connected. If your line was actually two lines, and weren't connected (you had two groups of ants) then you have a multiply connected boundary.
A manifold (sorry, had to use it) is just an object without a boundary. The earth is a manifold, as is any other n=3 space (3d) object that is connected to itself. In n=3 space, the only way you can have a boundary is to have two different objects, in two different locations.
Homeomorphic just means one object is like another.
They generalize the whole idea to one where all the objects are compact. That just means that the object's "surface" area is as small as it can get for a given internal volume. For example in n=3 space, you can minimize an area (like the material of a balloon) in relation to the volume inside (the helium). Circles are compact for n=2 space, and spheres are compact for n=3 space. BTW, even though I state this as if it were a fact, we don't know about all the compact spaces where n > 2. It would *seem* to make sense that a sphere is the only compact object to 3 space, but stating that as a truth, as of today, isn't possible. Maybe we can do that after they win their million bucks....
So, the whole thing boils down to showing that a compact 3d object is the same as a sphere.
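For comparison, the statement as it is usually written in textbooks is a bit more precise than the plain-English version above. The notation below is the standard one and is not taken from the post: "simply connected" means every loop can be shrunk to a point (trivial fundamental group), and "closed" means compact with no boundary.

```latex
\textbf{Poincar\'e conjecture.} Every connected, simply connected, closed
$3$-manifold $M$ is homeomorphic to the $3$-sphere:
\[
  \pi_1(M) = 0 \;\Longrightarrow\; M \cong S^3,
  \qquad
  S^3 = \{\, x \in \mathbb{R}^4 : \lVert x \rVert = 1 \,\}.
\]
```

Note that the 3-sphere here is the three-dimensional surface of a ball in four dimensions, not the ordinary sphere of a globe, which is $S^2$.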
Kord
Shameless plug, check out Grub! -
Insects Rock!
Grub uses names of insects (bugs if you will) to name its computers. Among the names are ant, roach, termite, muva (Macedonian for fly), beetle, and brainbug (ok that's not a bug, but you get it).
Naming your computers something fun should be a requirement!
Shameless plug, check out Grub! -
Doh! Watch where you point that thing...
"This also makes it a lot safer to work with, i.e. you won't burn your eyes out if you accidently look into it."
It looks as if the author has learned this first hand if the font size on the instructions is any indication.
Check out Grub! -
Can't keep up CRAWLING, or can't keep up INDEXING?
There's a big difference. We here at grub.org have been working on a distributed crawler, whose whole purpose is to crawl the net AND crawl local content with the sole purpose of looking for changed content, compressing it, and sending it back to us. We've also put together a hacked up install of MnogoSearch on our site (see here), that indexes what we get back from our crawlers. (Please keep in mind that we flush the Mnogo database fairly often right now if you look.)
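The "look for changed content, compress it, send it back" loop might be sketched as follows. How the real client detects changes is not stated here, so the SHA-1 comparison and the in-memory `seen` dictionary are assumptions, and `changed_payload` is a hypothetical name.

```python
import hashlib
import zlib


def changed_payload(url, body, seen):
    """Return the compressed body if the page changed since the last crawl, else None.

    `seen` maps a URL to the hex digest recorded on the previous visit; it stands
    in for whatever persistent store a real client would use.
    """
    digest = hashlib.sha1(body).hexdigest()
    if seen.get(url) == digest:
        return None  # unchanged: nothing to send upstream
    seen[url] = digest
    return zlib.compress(body)
```

Only pages that return a payload cost any upstream bandwidth, which is the whole point of crawling close to the content.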
Even though we don't crawl beans compared to Altavista, Mnogo still chokes up at just under a million URL inserts a day (at its best), and that's a problem because we are already crawling 3x that, and have a new client ready for release that does 3x the crawling the old one did. The short of it is that a single Mnogo INDEXER can't keep up with our CRAWLERS by an order of magnitude.
Knowing all this makes me wonder which problem Altavista is experiencing. Or maybe it's something entirely different holding them up. On the crawling side, if they are only crawling every 60 days, and have 600 million URLs in the database, then they are crawling 10 million URLs a day. If you crawl 10 million a day, at 15K apiece, then you are using about 15Mbps of bandwidth, which surely they have, right?
On the indexing side, not all your pages will update every time you visit them, which means you can just index the ones that changed. We see an update rate of about 60% (the new client shows this to the user, BTW) on pages that we crawled a few weeks ago. Given that they needed to insert/reinsert those pages, their database/engine would need to insert about 500 words per page, or 3 billion inserts a day total for all the pages that they indexed. Keep in mind that they may not be able to delete words located on a particular URL, because you'd need an index on URLs on the word table which can slow things down and get REALLY big.
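The back-of-envelope numbers in the last two paragraphs can be checked with a few lines of Python. The inputs are the ones quoted above; treating "15K" as 15,000 bytes is an assumption, and the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_rate_per_day(total_urls, cycle_days):
    """URLs that must be fetched daily to revisit everything once per cycle."""
    return total_urls / cycle_days


def bandwidth_mbps(urls_per_day, bytes_per_page):
    """Average sustained bandwidth in megabits per second."""
    bits_per_day = urls_per_day * bytes_per_page * 8
    return bits_per_day / SECONDS_PER_DAY / 1_000_000


def word_inserts_per_day(urls_per_day, update_rate, words_per_page):
    """Word-level index inserts needed for the pages that actually changed."""
    return urls_per_day * update_rate * words_per_page


urls_per_day = crawl_rate_per_day(600_000_000, 60)        # 10 million URLs a day
mbps = bandwidth_mbps(urls_per_day, 15_000)               # a little under 14 Mbps
inserts = word_inserts_per_day(urls_per_day, 0.60, 500)   # about 3 billion
```

So the crawl side comes out slightly under the "about 15Mbps" figure, and the insert count matches the 3 billion quoted.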
If it's a crawling problem, then I suggest calling us. ;) If it's a matter of deletes/inserts, then I guarantee that they're reworking their schema right about now.
I'd really hate to see Altavista go, they were my first choice until Google came along and started kicking ass.
Shameless plug, check out Grub! -
Re:Cisco Support
Actually I have a couple of puny little 2500s and no contract and they still helped me with an IOS bug a year after I purchased them - eventually giving me a free upgrade of the IOS. If for ANY reason (IOS/hardware) a Cisco is having problems I guarantee that Cisco will help you.
About a year and a half ago I bought a used 3600 and then found out via Cisco tech support that that particular run of 3600s had a hardware bug and they RMA'd the damn thing, shipping me a BRAND NEW unit before I returned the other one. Past that, I called later with a problem of my own causing, and they still had a couple of their techs help me out.
All in all, Cisco does a great job of supporting their hardware.
Shameless plug: Check out Grub! -
The editors have the REAL power over dmoz
dmoz, much like us, is reliant on its contributors to build its directory. Without the contributor's/editor's/client's blessing, and continual contributions, you have a database that is pretty much worthless. Gracenote has a bigger advantage than dmoz or grub.org does over its users/contributors in that it has already built the bulk of its database, and only needs occasional updates to it to keep it current.
Someone like Musicbrainz could just as easily restrict access to their database at a later date, even though it's currently licensed under OpenContent. (I really doubt they would do this, BTW).
Look, if Netscape chose to screw the community by closing or limiting access to the database, it would surely piss off the editors, which would then cause them to stop doing submissions. No submissions = No database. I suspect that projects like dmoz and grub, which rely on a constant influx of information to stay current, will be kept honest by default. That said, I think that dmoz has taken a step in the right direction trying to address these issues.
Shameless Plug: Check out Grub! -
An admirable project!
This is an excellent project. I don't know of any other free software projects to index content on the Internet. Far too many companies develop client/server applications and make the mistake of keeping them proprietary, and even if free replacements get written they often only replace the proprietary clients. This is bad because it actually increases the use of the proprietary server!
This project is entirely free. Thus it is much better. People should go to the project homepage on sourceforge and help out. The current goal is only to index content, and a later stage will implement intelligent search functionality. See the project overview here. I am sure that they would love to have more people who are able to do that helping out.
Hackers, get involved with this project that can replace one of the most used pieces of proprietary software, the Internet search engine!
-
Re:What's the license on the database?
From the FAQ: (emphasis mine)
- Q: What exactly does grub.org do?
A: grub.org is a company with a single purpose -
...We will make all software written during the project Open Source as well as all the hows and whats of setting up the network and database. If there is someone we can help by sharing what we've got, we'll share it.
Q: That's insane, what will Grub's revenue be if it doesn't charge for the software?
A: Open Source is not synonymous with NOT making money! We have come up with a hybrid business model that uses four distinct methods for generating revenue...
By placing the crawler closer to the data (i.e. on the web server itself) our client will be able to analyze and index the data local to the system on which it is running.
Q: So if I were a system admin or a website author I'd want to run the client?
A: Yes! Anyone that provides web hosting/authoring services will have a use for running our client. In addition to crawling a portion of the Internet, the client can index the admin's/author's entire site each and every night, and then submit that summary to grub's servers for incorporation into the database. Running the client will allow them to provide an added value for their clients - having their web pages updated to the biggest index, each and every day.
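As a rough picture of what "index the entire site locally, then submit that summary" could mean, here is a small sketch that builds an inverted index (word to URLs) on the web server and serializes it. The FAQ does not describe the actual summary format, so the JSON layout, the tokenizer, and the function names are all assumptions.

```python
import json
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(pages):
    """pages maps a URL (or path) to its text; returns word -> sorted list of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def summary(pages):
    """Serialize the local index into one payload that could be sent upstream."""
    return json.dumps(build_index(pages), sort_keys=True)
```

The central server then only has to merge summaries instead of fetching and parsing every page itself.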
-- -
Re:What's the license on the database?
Read their Investor Page - they absolutely plan on charging the search engines to use the data AND to sell top result spots to the highest bidder. Open source or no open source - this is a joke - they won't get a sliver of my bandwidth.
Here is the section outlining what they plan to do with all this free data 'volunteers' give them:
The first revenue stream will come from selling URL status information to companies like Google and Altavista. This status information will enable existing crawlers to target the crawls for a particular day, based on the highly up-to-date information contained in our database. These status updates are similar in nature to the service provided by someone like NetMind, in which a change on a website triggers an action. Grub's database will be much vaster by comparison however, enabling it to provide services directly to wholesale search engines.
Second, Grub will begin selling "wholesale searches" to other search engines and companies. Grub will make strategic alliances with other search engines much in the same way that Google has done with Yahoo and Inktomi has done with Hotbot. Grub will also provide one-shot search results for a large search query, delivering the data in a database format (like XML) instead of a web format.
Third, Grub will begin charging website customers for content control. Content control consists of indexing updated information on a regular basis and controlling link placement in search results. Large sites whose revenue depends on sustained inbound web traffic will be charged based on the amount of data that they submit into Grub's database, and on what placement they get in Grub's search result sets.
Fourth, Grub will provide consulting services for companies wanting to set up their own Grub networks. Large corporate intranets could be quickly and efficiently indexed into a central database with the Grub client/server model. Consulting and coding for these proprietary installations is a common model in Open Source oriented businesses like Sendmail, MySQL and Apache.
Guess they thought we were really that stupid!
--
-
Neat idea - but I'm gonna pass...
So it sounds like they want to provide the info they gather to other existing search engines. Hey - now Grub crawling the internet and sending its data to Google to make Google even better - I'm all over that. Of course, if they send data to Excite, I'll stop running the client. I cannot believe how Excite (and all the affiliated search engines they have now purchased) pretty much requires payment to get added, and if you use the free form, 'the site will be reviewed and there is no assurance it will be added. Process may take 4 to 6 weeks.'
Thank goodness for Google!
But again - this brings up a question similar to what happened with CDDB. Here you have internet volunteers providing free CPU power and bandwidth to provide raw material to for-profit companies. Now granted - it is slightly different since you can still Google for free :)
I'm not that selfish, but obviously there are some companies whose data set I'd be HAPPY to play a small part in improving (Google), and others that, given recent developments with URL submission and monetary sorting of search results, I wouldn't want to give data to unless they paid for it :)
Which, now that I read the site more, is their business plan. Read their Investor Page. I get a squirrely feeling about this. I don't care if the client is open source or not. Why should I use up my precious bandwidth to supply content to a for-profit company to sell to other for-profit companies? Yes, they give the data away to non-profits, but heck - most of them use Google anyway :)
And of course they are following the lead of the other greedy search sites - adjusting search result order for money, which I can't stand. Google is the one search engine that got it right - sort data by relevance and popularity.
I'll read more about it - but I think I'm gonna pass on this one - I just don't see the benefit for the volunteers who run this, both on a selfish individual scale and a broader Internet community scale.
--
-
How about....
do
{
write_software(to_block_ads);
advertisers_come_up_with(new_ad_method);
} while (ads_still_exist);
Real-time indexing of the Internet coming soon!
Kord -
This isn't a surprise...
I've been on IRC for years and all sorts of stuff trades hands there. Anything from MP3s, pictures, and warez can be had in plenty. The problem with IRC has always been that it's too damn hard to figure out how to get that stuff - at least for the casual user.
Interfaces that rely on IRC (and DCC) make it easier for the average Joe to use.
Real-time indexing of the Internet coming soon!
Kord -
P2P vs. Distributed Computing
Over on O'Reilly's site I noticed Dave Sims discussing whether or not distributed computing software should be considered the same as P2P software. I have to agree with him somewhat on this issue, as the project that I'm working on right now is not exactly 100% P2P (in fact some argue it is 0% P2P). However, I think it's important to understand that a lot of the same framework has to be coded up in either P2P or distributed computing products, and maybe that's justification enough to mix the terms.
Kord
Realtime Indexing of the Internet. Coming soon!