A Search Engine For The Slower Net
Makarand writes "According to this BBC News
article researchers at MIT
are developing a search engine for people
using the web on slower net connections.
The software will e-mail queries to a central server and receive the most relevant
webpages from the search results by e-mail in a compressed form. Since the program is too big to download over a poor net connection
it will be mailed on CDs to libraries for people to borrow and install. They are also considering trying to persuade computer sellers
in developing countries to install the program on machines."
About them Modem Linkers,
ain't they kinda odd?
Goin' on the net,
with they little baud.
Look at all those Modem Linkers,
what a thing to see.
Web sites come up really slow,
gets lousy Voice/IP.
Internet at low bit rates,
what a dawgon mess.
Load a web site, take a break,
while 'pache mods compress.
How to be a Modem Linker,
don't need a ticket.
Get a local ISP,
dial up and link it.
A programmer is a machine for converting coffee into code.
Use lynx or w3m and search on google like the rest of us, ya Nancy-boys..
Trolling is a art,
I know the Internet is complicated - but there's no need to pick on slow people.
Maybe we could have all webpages categorized by a number, something like 800 for science or whatever, and then we could have a filing cabinet with index cards in it. Then, people could open the filing cabinet, see a number for the page they want and then go directly to the page.
Works fine on my dial-up at home.
I mean really. I use dial up occasionally, and I can get my search results in 20 seconds instead of 2. What point does it serve emailing your search query off and waiting much longer for the results?
"Ask not for whom the bone bones. It bones for thee." --Bender
I've been on a 28.8 modem and it wasn't that bad. This is just a way to get publicity.
But if you're on something like a .5kbps connection then it might be worth it.
Gotta love how software catches up about 10 years too late...
[FromTheMorning]
Could anyone else figure out why this requires a program on the user's end that is too large to be downloaded? Seems like all you need is an e-mail client, and instructions on how to format the information request.
This post is dedicated to all of those
...still surfing the internet with their Commodore 64s and 300 baud modems!
what a really really dumb idea. really. i mean, WTF? what's the point?
http://www.mshiltonj.com/sr/
I believe it is the pages themselves, not Google, that this is an attempt to deliver.
Might be a nice way to preserve searches for later perusal. Unlike bookmarking, the returned search results are stored in an email.
This would be a good way to preserve stuff that may be the subject of removal due to court order, like xenu.net and other similar de-Googlings.
MIT guys! Why don't you put your brain into better compression technology? So we can deliver higher bandwidth to those still on crappy 56K lines?
And don't say it isn't doable... If I had the time, I could do it, and I'm a mere highschool graduate...
---
Programming is like sex... Make one mistake and support it the rest of your life.
I think the point isn't that Google's returned index is too slow but actually clicking through and loading the pages themselves are prohibitively slow.
I think this would be more appropriate in cases where access was unreliable and intermittent. If you could get on long enough to submit a query, you'd get results back (albeit slowly) even if your access was cut off.
or...
Why download a zip of multiple results when in many cases just the first result is needed?
if(!cool) exit(-1);
I had the same initial reaction, but after RTFA (I know, shame on me), it seems that the limitation isn't so much time, but continuous time hogging the phone line accessing Google, checking out pages, etc.
Instead, this service would package together selected results of the search, for overnight download into the PC's cache. The user can then browse through the material at their leisure without needing to use the internet connection (which is the scarce resource).
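A minimal sketch of that packaging step, assuming the results travel as a single zip archive (the article does not say which format is used, and the helper names `bundle_pages` and `unpack_bundle` are made up for illustration):

```python
import io
import zipfile


def bundle_pages(pages: dict[str, bytes]) -> bytes:
    """Pack fetched pages into one compressed archive, held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, body in pages.items():
            zf.writestr(name, body)
    return buf.getvalue()


def unpack_bundle(blob: bytes) -> dict[str, bytes]:
    """Unpack the archive on the user's PC so pages can be read offline."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical search results; real pages would come from the server's crawl.
pages = {
    "result1.html": b"<p>Malaria prevention basics</p>" * 40,
    "result2.html": b"<p>Treated bed nets and where to get them</p>" * 40,
}
blob = bundle_pages(pages)
print(f"{sum(len(b) for b in pages.values())} bytes of pages -> {len(blob)} byte bundle")
```

The bundle is downloaded once during the short dial-up window, and everything after that happens off the phone line.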
Stop by my site where I write about ERP systems & more
Isn't mod_gzip already used with popular search engines? ...Am I missing something?
It's nice that MIT has the processing power and bandwidth to receive data from search engines, uncompress them, REcompress them, and send back to the "queryer".
Sounds like more overhead and more trouble than what's already in place that does the same damn thing!
with my 300 baud modem.
I agree 100%, I just moved to a new place where I am forced to use dial-up because I am in an area that does not have access to DSL, cable, or anything else that is still decently priced... anyway when I run a search on google it takes at most 5 seconds to get the listing back of the results. If this program needs to send an email to the user letting them know the search results this will take at least 2 maybe 3 times longer (in the fastest instance) for the user to get the results. I think that MIT needs to re-think their ideas and come up with something more useful rather than coming up with something that will just cause more headaches to the users forced to use the slow connections like myself.
"The two most abundant elements in the universe are hydrogen and stupidity." -Harlan Ellison
Buy Steampunk Clothing Online!
I read this story a couple of days ago, and thought it was rather a strange idea. I don't know where access is that slow. At the time, I thought maybe it might be used with the ham radio internet they're going to get in Laos as part of an "empower the farmers" program.
Put identity in the browser.
for my cable Internet connection at home.
Yes, I am dead serious... Let's just say Charter's cable Internet in my area lately really stinks. I would almost rather be on a 14.4k modem - no joke. I am not the only user... I get lag spikes of over 3000ms when not doing anything, and almost dropped connections. Good thing DSL recently became available in my area =D. One less Charter Pipeline subscriber.
They are developing the program which will replace web forums - you post a message to a predetermined mail account and everybody subscribed will receive it very soon (patent pending).
File transfers and weather forecasts are planned in 2006.
This will make a difference.
- Arwen, I'm your father, Agent Smith.
- Well, you're just Smith, but my father is Aerosmith!
So it's a Google API... and a program that zips the pages? Wow, heavy development.
Who needs the internet when we have a perfectly functional postal service
Sounds exactly like the agora email services from way back in the day.
/ 00 26.html
http://www.bellanet.org/email.htm
http://scout.wisc.edu/addserv/NH/95-11/95-11-21
Rocks baby! Pet rocks! Get it? Get it?
.....
Or how 'bout this? Seven minute abs? Get it? Get it?
Or, or, or,
.sig
It explains. Honestly.
For those of you wondering why someone would do this, how about reading the damn article?
The program doesn't e-mail back with a mere mirror of a google / yahoo results page. It actually filters through the individual results compressing the entire page. e.g. my search turns up a CNN page and a blurb on MSNBC and I get, e-mailed to me, compressed versions of those actual sites, not just links to them.
As for the "my 28.8 modem is just fast enough" crowd -- read the article! Some of these locations the software is being developed for don't even have access to a phone line on a regular basis. And the lines they do have access to are more likely than not to be noisy as hell and not able to support a 28.8 connection.
They are also considering trying to persuade computer sellers in developing countries to install the program on machines.
They are going to develop countries to install the program on people's machines?
TODAY
I am reminded of the Prepaid Legal system of doing business. You call up and ask a question, and the next day, an attorney familiar with the area you are asking about calls you back to answer your questions and advise you. So maybe this isn't all that outdated of an idea after
IN REGARD TO THE SYSTEM IN THE ARTICLE:
To have this capability back in 1973 would have been unbelievable. In 1983, to have this available to every library in the US would have been an unbelievable achievement. To have it now is so slow that I start to go google eyed even thinking about it.
BUT
This is great for countries that are 20-30 years behind in technology. It will revolutionize the search for information for areas that are not as connected as the US.
is not slow connections, but connections that are unreliable
Using the phone in a country like Malawi can be a real adventure. It's not like the US at all.
(Score: -1, Didn't Read the Article)
It's designed for computers that don't have a full-time internet connection. The program dials up at night and sends off the queries, so then the next day after the dial up/fetch/retrieve, the results are in.
in 1988
So why is it that the answer to all of my searches is either "wet teens," "Generic Viagra," or "I am a banker from Nigeria?"
* Please do not read my signature.
Coincidentally (?) it is also very useful to circumvent the Great Firewall. Way to go, but it would also be nice to optionally have the cached content (ala google) e-mailed as well. That would send the last standing wall crumbling.
Code poet, espresso fiend, starter upper.
"Let us assume you are in Malawi," explained Prof Amarasinghe, "and the computer lab does not have access to the telephone line all the time." "If you want to find some new information about malaria, you are prompted with a message that says 'we are going to send a query through e-mail, it is OK?'. "At night, when the phone line is available, the teacher can dial out and send the queries." The request is sent to computers at MIT in Boston, which then search the internet and gather webpages.
RTFA.
Why don't you just scream "HI I'M FROM 'WESTERN' CIVILIZATION AND HAVE NO IDEA HOW THESE THINGS WORK IN LESS PRIVILEGED PLACES"
Google is too slow when your school has one phone line that is used for _everything_, including net access. Not to mention the cost of using the phone anyway. This allows all the students to submit their searches to a teacher one day; the teacher then submits all the searches with only a couple of minutes of dialing up. He can retrieve the compressed results a few days later with only a few minutes of dialing up. Now go read the article. Someone needs to mod that post down; hopefully the poster can redeem themselves later in the thread with something insightful.
Maybe these poor 14.4-ers should finally get the memo - their human rights are being violated! Everybody, including the Estonians, knows that it is inhumane for them to be stuck piping information through their small RJ11. Starvin' Marvin is weeping for their poor souls.
nt
Looking for a problem.
I recall 'back in the day(tm)' several email services which you could send an email with a URL in the subject and it would return the web page to you in a reply (a -la a lynx like format). That was -- what? 10 years ago?
I've used 19k, 14.4k and even 4800 baud modems to connect to the net and browsed comfortably. Something as simple as unchecking the box [DOWNLOAD GRAPHICS] (or whatever setting your browser uses) will accomplish this.
What kind of 'slowness' are we talking here? 110 baud?
-jhon
Under capitalism man exploits man. Under communism it's the other way around.
Woman loves woman? Isn't that called Lesbianism?
And for those people with no internet connection, you can mail your search requests to MIT (Please include self-addressed stamped envelope). MIT will then process your search request within 5 business days, and mail you back the results. You can then peruse the results and marvel at the wealth of information you'd be able to find... if only you had internet access.
Shameless plug for my photos on Flickr
I tried to RTFA but MIT hasn't emailed it to me yet :(.
my blog
I think _you_ need to RTFA.
I saved the html for google and it was 3405 bytes. If my math is correct that would be about 92 seconds on a 300bps connection. It will probably only be useful if you have a really slow connection.
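The arithmetic can be checked with a short sketch. The 3405-byte page size and the 300bps rate come from the comment; the 10-bits-per-byte figure is an assumption about asynchronous modem framing (one start bit, eight data bits, one stop bit), and `transfer_seconds` is just an illustrative helper.

```python
def transfer_seconds(size_bytes: int, bits_per_second: int, bits_per_byte: int = 8) -> float:
    """Seconds needed to move size_bytes over a link of the given raw bit rate."""
    return size_bytes * bits_per_byte / bits_per_second


PAGE_BYTES = 3405  # size of the saved Google page quoted above

for rate in (300, 14_400, 28_800):
    raw = transfer_seconds(PAGE_BYTES, rate)
    # Assumption: async serial framing sends 10 bits on the wire per data byte.
    framed = transfer_seconds(PAGE_BYTES, rate, bits_per_byte=10)
    print(f"{rate:>6} bps: {raw:6.1f} s raw, {framed:6.1f} s with start/stop bits")
```

At 300bps the raw figure is 3405 * 8 / 300, or about 91 seconds, so the estimate of roughly 92 seconds holds up; with framing overhead it is closer to two minutes.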
I'm gonna get modded down, but fuck it, I have karma to burn and a soapbox to stand on.
... their computer... oops... WHAT FUCKING COMPUTER?
These people from mit are getting research money to spend on this ridiculous piece of shit project that NO ONE will ever use. Can you imagine, a farmer in Ghana or something and he gets malaria. His first thought is "Hmm, I wonder if there's any information on the internet about this?"
I mean come on.
I mean fucking really.
But ok, I'll even give them that... let's pretend that farmers in countries where the internet connection sucks so bad that google, a site that is optimized to work on pretty much anything, is too slow... even know about the internet and have any use for it... are they going to go to the library, borrow software, take it home and install it on... oh yeah... install it on
Asshole mit people with nothing better to spend money on.
I think_you_need to GAFL
...only webdesigners had not collaborated to turn the web into the graphics orgy it is today. I mean, have these kids coming out of graphics school even browsed the relevant w3c specifications?
News Flash !
CSLIP already compresses it, most modems made since 1994 compress data, compressing it again at the application level won't help. Nevermind that the mail program will uuencode the data anyway & severely bloat it.
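Both halves of that claim, that a second compression pass gains little and that mail-safe encoding adds bulk, are easy to try. The sketch below makes two assumptions: the sample text is a made-up stand-in for a page, and base64 is used in place of uuencode (both turn every 3 bytes into 4 characters, so the expansion is similar).

```python
import base64
import random
import zlib

# Hypothetical stand-in for a text-heavy page: 4000 words from a small vocabulary.
rng = random.Random(0)
vocab = ["malaria", "mosquito", "net", "clinic", "water", "village", "fever",
         "treatment", "school", "teacher", "search", "result", "page", "the", "and", "of"]
page = " ".join(rng.choice(vocab) for _ in range(4000)).encode("ascii")

once = zlib.compress(page, 9)     # one pass of DEFLATE
twice = zlib.compress(once, 9)    # a second pass over already-compressed bytes
mailed = base64.b64encode(once)   # text-safe encoding for mail transport

for label, blob in [("original", page), ("compressed once", once),
                    ("compressed twice", twice), ("once + base64", mailed)]:
    print(f"{label:>17}: {len(blob):6d} bytes")
```

On typical text the second pass saves next to nothing, and the encoded attachment is about a third larger than the compressed bytes, though usually still much smaller than the uncompressed page.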
HIV Crosses Species Barrier... into Muppets
Waiting a day to get an email from a search engine - that is like waiting for batch printouts in 1982 but worse.
Basically I don't see the point in this as it is being described. I suspect though that there is more to it as MIT is not full of dumb people.
Stopgle.
It's Christmas everyday with BitTorrent.
You neither have a particularly slow connection - the majority of the net is on dial-up - nor are you (apparently) in a situation where hogging a phone line is a problem. You've probably got unmetered short-distance calls, so whether you're on for 3 minutes downloading a 400k email or whether you're online 30 minutes reviewing those 400 kilobytes of information doesn't matter to you.
In short, RTFA. Sorry.
Not that I'd wager that this is some kind of brilliant, revolutionary idea, but really, the article doesn't even imply that anyway.
Switch back to Slashdot's D1 system.
That's what it sounds like to me.
I did RTFA.
So now, you're going to have all the pages downloaded to your PC for you, when it's quite likely that the very first link was the one you wanted anyway?
What about the bandwidth costs of doing that? And exactly how slow are these connections anyway? Google's search page is a few KBs - I can't imagine how downloading every possible hit (say top 20 hits) is feasible where downloading a single page of a few KBs is not.
Mmmm.. Donuts
For those of you who want to try it out at home, just use one of your several hundred AOL CDs, and voila, you'll have a line slow enough to try it out.
To try out this demo, please follow these simple steps:
1. Pick up the phone and call the automated voice search system at (650) 318-0165.
2. After the prompt Say your Search Keywords, say your query to the system.
3. Click this link and a new window will open with your voice search results.
4. Say another query, and the new window with the search results will be updated with the new results.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Why not just use Content-Encoding:gzip to reduce the amount of time (text) pages take to transfer over the connection? Or have your ISP's proxy server compress everything on its way over the pipe? Surely someone has thought of this before...
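For what that looks like in practice, here is a rough sketch of a server or proxy choosing gzip when the client advertises it. `encode_response` is a hypothetical helper, the page body is invented, and the `Accept-Encoding` check is deliberately simplified (it ignores quality values such as `gzip;q=0`).

```python
import gzip

# Hypothetical page body; real pages will compress differently.
html = ("<html><body>"
        + "<p>Search result: some descriptive text about malaria.</p>" * 50
        + "</body></html>").encode("utf-8")


def encode_response(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Return (headers, payload), gzipping only if the client asked for it."""
    if "gzip" in accept_encoding.lower():
        payload = gzip.compress(body)
        return {"Content-Encoding": "gzip", "Content-Length": str(len(payload))}, payload
    return {"Content-Length": str(len(body))}, body


headers, payload = encode_response(html, "gzip, deflate")
print(headers, f"({len(html)} bytes before compression)")
```

This shrinks each transfer but still needs the line to be up while the user browses, which is the constraint other comments in the thread point to.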
This is the first article I've read (in memory) that spawned five pop-ups upon following the primary link. A pox on greedy posters!
He can retrieve the compressed results a few days later with only a few minutes of dialing up.
Huhh?? How? If you download just the results page, then that is pretty useless since you then have to click on the links to see if it is the relevant link or not.
If on the other hand you download the actual contents of the top 20 pages then given how slow your connection is supposed to be, I don't see how you could do that in "only a few minutes".
Mmmm.. Donuts
Consider areas that only have access to the Internet via radio...at certain, limited times of the day.
Pop! (sound of a lightbulb turning on)
I live ze unknown. I love ze unknown. I am ze unknown.
> I think that MIT needs to re-think their ideas
To their credit, at least this isn't some lame wearable shit, or a stupid musical instrument which spins around or whatever. Jesus, they come up with some fucking crap!
A year ago I was in Moscow. After 6 days without internet I really wanted to check my e-mail(webmail).
That day we spent some time some kilometers outside Moscow, but still managed to find an internet cafe.
After waiting for 15 minutes (the place was crowded) I started "surfing".
Man that was slow.
25 computers, *sharing* a 64kb uplink. And all the locals (they had an arrangement: pay x number of rubles and "surf" as long as you want) were downloading with IRC, Kazaa, DC and ftp, which resulted in *heavy* packet loss.
I spent 8 minutes getting the Yahoo.com frontpage. And it took me almost 20 minutes before I could read the first mail.
Melius mori in libertate quam vivere in servitute.
This sounds as dreadful as my first experience with programming at school. Punch a set of cards, mail them to the data center, have them mail back the printout, or more likely, the error message.
(Then, I would go home and play with my Altair)
I suppose it's better than nothing, but not by much. I would just use the library.
First, for those of you saying 'Google is fast enough even on a 14.4K' - think school with one phone line, perhaps not even available during the day. Or how about connections via satellite phone at $$/min? Suddenly you want super efficient, when you only earn 5 bucks a day.
As to what else this needs, the search engine needs to strip out all the crap before emailing a web page to you (Java, Flash, etc) - should focus on mostly text, small pictures only. Particularly since 486's would be a common platform for people using this, so the search engine better work well on one. You also should be able to strip out all pictures as an option to maximise text info download - remember turning off pictures in Netscape 2.x to speed up your browsing? If you need something it stripped out, you should be able to query just for the bits you need later.
Also the ability to share your cache between computers would be huge if they can't have a server to do that for them. At any rate, means of transferring those precious pages you downloaded to another computer - on a floppy, unless you have local email.
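The page-stripping idea in this comment can be sketched with the standard library's HTML parser. This is only an illustration under stated assumptions: the `TextAndLinks` class, the `strip_page` helper, and the list of tags to skip are all made up here, and nothing in the article says MIT's software works this way.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets, skipping scripts, styles, and plugins."""

    SKIP = {"script", "style", "object", "applet"}  # assumed list of heavy content

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images carry no text data, so they drop out without special handling.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_page(html: str):
    """Return (plain_text, links) for one page."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.links


sample = """<html><head><style>p {color: red}</style></head><body>
<p>Malaria facts</p><img src="big.jpg">
<script>alert("hi")</script>
<applet code="Ad.class">Java ad</applet>
<a href="http://example.org/more">Read more</a></body></html>"""
text, links = strip_page(sample)
print(text)
print(links)
```

Keeping the link list separate is one way to let the user ask later for just the bits that were stripped out.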
W9x:Thanks for the make-work project Bill.
How are these people connecting to the internet? By mail? Google is fucking text how fucking long does it take to load, even if you're using IE? Fuck!
> I think_you_need to GAFL
Go around flicking lemons? You think that'd help, huh?
Since the program is too big to download over a poor net connection it will be mailed on CDs to libraries for people to borrow and install.
When will our library system adopt this policy? There are quite a few programs out there that I would like to "borrow and install"...
Google responses are so slow that you have to email them -- but then clicking on the links in the email ISN'T too slow? Pretty half-fast solution, isn't it?
"Freedom means freedom for everybody" -- Dick Cheney
Back in 1998 I used to use Juno for email and had no web access. There was a free service that you could use to get web pages via email that I often used to "surf the web". The way it worked was you would send an email to webmail@curia.ucc.ie (I don't think it works now) and put "GO http://websiteaddress.com" in the body and it would email you back the html of that page.
You could search yahoo by requesting a url like http://search.yahoo.com/bin/search?p=search+terms
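A request of that kind might be composed as below. The gateway address, the "GO" command, and the Yahoo URL pattern are taken from the comment; the subject line, the sender address, and the helper names are assumptions for illustration, and the message is only built, not sent, since the service reportedly no longer works.

```python
from email.message import EmailMessage
from urllib.parse import urlencode

GATEWAY = "webmail@curia.ucc.ie"  # address quoted in the comment; likely defunct


def yahoo_search_url(terms: str) -> str:
    """Build the search URL in the form quoted above (spaces become '+')."""
    return "http://search.yahoo.com/bin/search?" + urlencode({"p": terms})


def build_request(url: str, sender: str) -> EmailMessage:
    """Compose a web-by-email request with a single GO command in the body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = GATEWAY
    msg["Subject"] = "web request"  # assumption: the gateway only read the body
    msg.set_content(f"GO {url}\n")
    return msg


msg = build_request(yahoo_search_url("search terms"), "user@example.com")
print(msg)
```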
"the fax machine is nothing but a waffle iron with a phone attached to it." - Grandpa Simpson
Google is obviously very light. If you really can't support the bandwidth for graphics, use lynx or turn off images totally.
I realize that developing countries don't have fast connections, but google works fine across 24k connections (and really ought to work fine on anything).
email requests and getting responses later seems like a bit more of a kludge than it needs to be.
bcl
Remember Lexington Green!
I can imagine your brainwaves are somewhere along the lines of:
[_______________________________]
but I'll bite anyway. I still don't think you read the article or you would have noticed the part about compression.
"Someone using the software would e-mail a query to a central server in Boston. The program would search the net, choose the most suitable webpages, compress them and e-mail the results a day later."
It's very likely, since the target is to use this for information, that the pages would be _highly_ compressed, either reducing image quality or removing many images altogether.
On top of all that, if you had read the article, you would have noticed the part about schools not having net access round-the-clock. This is because the entire joint only has one phone line. Given this limitation, it is far more feasible to have 20 students submit their search to the teacher and have him submit it that night, rather than dial up for 30 minutes and have 20 people sharing a dialup line try to search all at the same time, which would cause many of the queries to time out.
" And exactly how slow are these connections anyway? " Again, something you would have found out if you had read the article. Maybe you should go read it again.
So they can use it to download pr0n during off-peak hours!
I liked this technology when it came the first time around. Archie via email? Anyone? Yay! MIT reinvented Archie! Only with a thick client instead of a small one! Way to go!
So what if it scans webpages instead of FTP sites. It's not that big of a leap.
"Love heals scars love left." -- Henry Rollins
I don't understand. E-mail gateways to Gopher and the WWW have been around for a very long time now. What is so special about this? Perhaps the article is incomplete.
In any case, there are still many applications for Gopher. The factors that made Gopher so appealing 10 years ago are still real-world factors today amongst third world countries - and cell phone users.
In the early days of the WWW a lot of effort was spent making sure Gopherspace was available to WWW users. It sounds like the opposite needs to happen.
E-mail is a tempting way to solve this problem - but what are you going to do? Send HTML pages via email? What does that save? Not to mention the 24 hour wait.
//iacovou
First, if you don't have enough bandwidth for even the most minimalistic google page with stripped graphics etc., you don't have enough bandwidth to view the sites attached to the results of your query. What good is a list of results by e-mail when you can't view the links to establish how accurate they are?
Also, the search process for the Internet is not only related to the quality of the search engine you use. It's an interactive process of finding what works best, refining your query based on results until you eventually find what you're looking for -- using this system would eliminate this aspect, at minimum making it painfully slow.
Can anyone give a useful application that many people could use and benefit from?
"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
They should develop a program that strips images, animation, java applets, ActiveX components and all HTML from Web pages, leaving only the text and the links. Then send it to the users. It could be called Gopher. Or Archie.
Google is mostly text. Pretty low bandwidth if you ask me. Plus, it works great with Lynx.
Being that I am now stuck on a 56K Modem... this really wouldn't be bad IF I could pick software to have mailed to me. If I was able to get the new Open Office, Mozilla and Redhat updates for a low fee it might be helpful.
Having webpages mailed to me seems stupid because I have high-speed internet at work and if there is a bandwidth intensive site I just load it at work the next day...
cor... this sounds oh so familiar... anyone remember ftp by email??? History repeating itself...
Query the ftp server by email and get the directory list emailed back to you. Then you could send the command via another email which would result in the file being emailed back to you overnight ready for you to retrieve it.
And then there was "trickle" where files could be sent/refreshed to your uni's mainframe's ftp server overnight and would be there for you to play with the next morning and you would always have the most recent version of the file as they'd have been synched via trickle
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
Why make a client-side program so big that it has to be mailed? Why make a client-side program at all?
Wouldn't it make more sense (and be much simpler) for MIT to write a program to run on their end that would receive and read the email containing the sender's query and simply reply with the results?
It seems kind of Rube Goldbergian to go about it the way they planned.
It's kind of funny to see your post at this time, i just spent half an hour waiting for charter to go back up, spent about 15 minutes before that trying to fix it myself before realizing the problem wasn't at my end.
"MIT Reinvents Archie service from the early 90's."
"Given the pace of technology, I propose we leave math to the machines and go play outside." -- Calvin
This doesn't make any sense to me. I'm on 28.8, and 20 results from Google still come up instantly. Bandwidth might be an issue for the linked pages, but certainly not the search results. Even when I was on 14.4, back when Yahoo! was the hot search engine, it was no problem.
So, what if these guys are on 300 baud and they get compressed search results via... e-mail??? The delay waiting for results to navigate e-mail systems probably negates the savings from the compression. Why not send compressed results over HTTP using a web-browser like application? Of course then you are still faced with bandwidth issues on the links you follow.
It just doesn't make sense to me, unless they write a server-side proxy that intelligently filters Flash, popups, Java, superfluous graphics, audio, and other useless stuff that "web designers" like to use. The proxy could present pages in such a way as to offer users the option of downloading blocked files when the AI fails. That just cries out for a Mozilla mod or some other kind of custom browser; certainly not an e-mail client.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Gastrocolic reflex: This is the reflex system that tells the colon to empty when food hits the stomach, or even in anticipation of a meal. This is why baby poops every time he nurses. It is also why kids with constipation complain of their belly aches right around mealtime.
Do you think I spent 8 minutes downloading the images on the yahoo frontpage?
Stupid AC...
Melius mori in libertate quam vivere in servitute.
Sheesh, I masturbated to pr0n 8 times before Yahoo.com loaded for you.
Oh yes:
Imagine a beowulf cluster of these!!!
All you get back is the search results (links), not the actual material. You still have to download that.
It's a shame that with the way the net is going all they will get as search results will be flash heavy sites that take 20 minutes to download on broadband, let alone dial up.
Where did all the sites go that you could use wget -r to grab overnight? How about the odd few that used to offer a .tar.gz for download and offline reading.
Content over presentation is a concept that needs to be reintroduced to the net, preferably with a stick.
Beep beep.
Disable graphics and google loads in no time flat. Realistically, if you can't use google with your existing tools then you can't use any links a search engine would get you.
They are also considering trying to persuade computer sellers in developing countries to install the program on machines."
Hell with that, if any software should be pre-installed, it should be stuff the bulk of the customers are asking for, not something the developer persuaded the sellers to install. We all know the problems that kind of thing causes, and such bloat would be worse here, since the program is clearly called large and systems for developing countries are likely to cut corners and have small disks.
If they really want to do something useful, they should build an e-mail based portal. That way someone with e-mail could get anything they want, not just a search, but then submit a link from the search and have the result e-mailed back. Or submit the link for the program (this or any program) and have it e-mailed to you, that way it wouldn't need to be pre-installed and could be easily updated. There are actually still people who have more access to e-mail than to the Web, such an e-mail portal would be of good use, and would better address what this project claims to be trying to address.
I'm an American. I love this country and the freedoms that we used to have.
It sounds like this software could help sites from being slashdotted.....
Looking for a job?
Want your resume written professionally?
DON'T USE TUNAREZ!!!
MIT never ceases to amaze me. Their lack of innovation is almost as staggering as the Patent Office. How's the AI coming guys? No seriously, who would honestly pay for this shit? They have way too much money and time on their hands. Do something that's never been done - go study at Berkeley or something. Bunch of over-paid, over-hyped, over-inundated, over-inflated ivory tower loving idiots.
Why is it that everytime someone at MIT beats-off it makes the front page?
I remember when they prototyped hardware so you could surf the net wearing a head-rig with LCD glasses. Who the fuck cares? For the last time, we're not impressed with MIT anymore!
I wonder how it is different from AGORA, Web-To-Email, Gopher, and such services? If you don't know about them, you might want to check the Accessing The Internet By E-mail -- Guide to Offline Internet Access and Fravia's "How to search the web" lesson 10.
Have fun.
-- search the web
Wouldn't it be much cheaper and easier to get those 5 people still using 56k DSL or am I missing something?
M.D. Inc.
WTF are you doing now that takes up so much of your time? Is it more important than possibly creating the greatest compression scheme yet and reaping the benefits?
I can imagine your brainwaves are somewhere along the lines of:
[_______________________________]
Ok, you are obviously someone incapable of making a point without resorting to childish insults. In my experience this usually correlates well with inferior mental capacity so I would encourage you to read the below slowly:
Downloading the contents of 20 pages when one page is the one you're looking for is vastly inefficient.
It's very likely, since the target is to use this for information, that the pages would be _highly_ compressed, either reducing image quality or removing many images altogether.
Ooh! High compression achieved by not downloading fancy images or code! That's absolute genius. We can only download the text and then the user can choose if they want a particular image to be downloaded. Someone should really inform the makers of lynx so they can put this feature in the next version. Maybe the makers of IE and Netscape can have an option where they don't download images by default.
And since you seem to be particularly dense, I will have to point out that the above paragraph is intended to be sarcasm since that functionality already exists.
Mmmm.. Donuts
If your connection is too slow or unstable to handle Google, how are you going to load the web page your search returns??
One hardly needs a search engine to find a slower net. My first ISP, Concentric, certainly had a slower net. They are either gone now or hiding behind a different name, but you can still get a slower net from some providers. AOL users seem amazed when they see other systems using the same modems.
I'm an American. I love this country and the freedoms that we used to have.
we'll e-mail you a CD daily with your selections that you make via phone
Maybe it should have an option to charge a few bucks and airmail you a CD-ROM's worth of the most relevant results....?
sadly, i had an idea to categorize sites by the dewey system back in the day. of course, once i realized what a "portal" was, that idea died a quick death, as books only have one dewey decimal number.
:>
the other thing that killed it was that the dewey system only treats of non-fiction and as we all know, most sites are definitely fiction.
ed
What utter, socialist crap. The internet is closing the information divide.
I also tend to think that they would better spend their effort trying to get reasonable-speed access.
Nice idea, free, useful, wish I still had my blackberry.
Saving random seed...
For me, I usually go through several revisions of my Google (or other) searches before I either hit on what I'm looking for or realize that it's not out there.
If I had to wait a day for each search query to come back, it'd take me a few weeks to do what I can accomplish in 10 minutes. Yeah, I know it's better than nothing, but "fine tuning" your query is a big part of what makes a search useful.
IAAL
It's right here.
This sig no verb.
Where else would you hide the Spyware.
That's cool how by sending the results via email you don't actually have to download the results (what?). Oh wait why would you ever do anything this way if you have Google?!
It's not like you'll be able to find your searches among all the "Sponsored email" you'll be getting. But hey these things take money.
I mean... it seems like you could whip this out in a 10-20k program (tops).
You quitting proves that the karma kap worked. The most annoying of the whores shut up. --CmdrTaco
The Internet Oracle always provides the best answer(s) to your questions.
Karma: Food Fight (Mostly affected by Date Plate).
Speak for yourself! I was never impressed with MIT.
Well, google.com is pretty fast even for slow connections.
For *even slower* connections (say wap over a 9600 baud connection), people can always use http://wap.google.com/wml which is an even lighter version of the already-light google!
It's primarily meant for WAP-enabled phones etc., but you can search the entire web (and this is the default option), so there shouldn't be a big difference from standard Google.
...NOT to reply to jokes seriously (this guy actually spent time to get a link!)
So as just about every browser supports it, what's wrong with it?
seems like we have a solution looking for a problem again
I have a 1.5 Mbps down / 768 Kbps up redundant connection with fiberoptic guy coming to hook me up to the Public Utility District network. Should I request such a CD from MIT, and would it help me?
IBM toaster, very expensive only 5 or so exist in the world. Users submit bread in batch for over-night toasting.
Anyone here remember how ListServ used to do something like this? Complete with a report on how much CPU was consumed in the search.
Actually sounds like a "neat thing" for people with laptops and such, when you're going to be disconnected for a time.
Email is perhaps one of the worst transfer system for binary data, why not use HTTP? post search (select email, jabber ID, ???) -> query ID is emailed/IM'd/etc.. to them, using supplied ID, they download the result.
If binary data is actually emailed... hmm.. think of the possibilities for sending "gifts" to your err "friends" email boxes.
You could submit searches from your wireless device and download them later, when you have bandwidth.
You could share the search result with other people, instead of saying "google for XXX" just point them at a result you've already searched on, saving the CPU of the search engine.
I think the only reason for such a program would be that the compression used by the modem isn't good enough, but it is an incredibly inefficient means for a small improvement in compression. I suspect that by the time you finished adding the overhead of email, you wouldn't really get that much more speed. I mean, really, how much more room are you going to be able to squeeze out of MP3s and JPGs?
I guess you could optimize your time a little better if you had a program that downloaded all of the pages from a particular search request. You could then view all of the downloaded pages on your local machine quickly. However such programs would create a ton of white noise, and would wreak havoc with all dynamic sites.
This sounds more like a technology without a cause.
As the article states, the plan is to receive the search query, bring up the most suitable pages, compress them, and mail them back. I'd assume that in general a graphically heavy or plugin-dependent page wouldn't be deemed 'suitable', so they'd just be receiving a zip of text pages - hardly a great burden on the line and, at the end of the day, a real improvement for, say, schools. This is technological development at the opposite end of the scale from pop-under web ads; great work, and good work.
If on the other hand you download the actual contents of the top 20 pages then given how slow your connection is supposed to be, I don't see how you could do that in "only a few minutes".
You download a compressed version of the contents of the results of the search. HTML pages compress very, very well, so I'd hazard a guess that it's pretty efficient.
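As a rough illustration of the claim above, here is a minimal Python sketch that gzips a made-up, repetitive HTML page and reports the ratio. The sample markup and the helper name `compression_ratio` are invented for illustration; real pages will vary.

```python
import gzip

def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)

# Hypothetical page: repeated markup, much like a list of search results.
rows = "".join(
    f'<li class="result"><a href="/page/{i}">Result number {i}</a></li>\n'
    for i in range(200)
)
page = f"<html><body><ul>\n{rows}</ul></body></html>".encode("utf-8")

ratio = compression_ratio(page)
print(f"{len(page)} bytes, compressed to {ratio:.0%} of original size")
```

Markup this repetitive is close to a best case; a page that is mostly prose would likely compress less dramatically.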
Go read the article. It explains a lot.
Presumably the slow user would have to dial up and download their compressed pages at some point, whether or not it's an attachment. Shouldn't the user be able to download the compressed files using a protocol designed for file transfer?
Build your own using the Google Web APIs, as done by CapeScience:
"Just email google@capeclear.com and put the text of your query in the "Subject" line. You'll receive your search results via email."
Search the web by mail! Have your results in 4-6 weeks!
I write code.
>> e-mail queries to a central server
"Three days after its launch, the central server became overloaded with queries for weight loss, porn, and refinancing. Sadly it became less than responsive to even those with 14.4 dialup."
This FAQ explains how to access most of the internet using only a standard email client.
The above document explains how to access:
FTP
ARCHIE (deprecated)
FTPSEARCH (deprecated)
GOPHER (deprecated)
VERONICA (deprecated)
JUGHEAD (deprecated)
USENET
WWW
WWW SEARCH (using standard search engine like altavista, yahoo or google)
FINGER
WHOIS
[...]
All these protocols can be accessed via email, according to the FAQ. The FAQ has been around for a long time, which explains why many (most) of the protocols involved are now deprecated. I used this FAQ in the early '90s and I don't know how it works now. At the time, it was great. The last update is 2002/04/16.
I guess I should first mention the obvious that putting a bunch of other people's copyrighted work on a CD Rom, is the type of thing that gets people hauled before courts.
As for compression, if you are using compression on the modem connection, then you don't really save any time by trying to compress the data again. You might save some space, but my experience is that a zip file full of jpgs isn't that much smaller than a directory of compressed images.
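The point about compressing twice can be checked with a small Python sketch. It is only an illustration: random bytes stand in for JPEG payload data, and `saved_fraction` is a made-up helper name.

```python
import os
import zlib

def saved_fraction(data: bytes) -> float:
    """Fraction of bytes saved by one pass of zlib compression."""
    return 1 - len(zlib.compress(data, 9)) / len(data)

text = b"the quick brown fox jumps over the lazy dog\n" * 500
once = zlib.compress(text, 9)   # stands in for an already-compressed file
noise = os.urandom(len(text))   # incompressible, like JPEG payload bytes

print(f"plain text:         {saved_fraction(text):.1%} saved")
print(f"already compressed: {saved_fraction(once):.1%} saved")
print(f"random bytes:       {saved_fraction(noise):.1%} saved")
```

A negative figure on the last two lines means the second pass made the data slightly larger, which is the usual outcome for input that is already compressed.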
The one area where the program has benefits is in time management. A program like Lotus Notes that replicates data sets over slow connections lets you use idle time on a network for data transfer.
You don't need to develop new technology here. Just find all the technology that was used to replicate data when phone connections ruled corporate America.
I could see some benefit to in having a better client side database for web pages. I can't see how the email layer in the application is making any contribution to the effort.
Unfortunately, to be frank, this particular project sounds more like useless posturing from self righteous grant writers who are wanting to get taxpayers money by simply claiming to be doing something for the poor.
The World Bank (et al.) have been very good about funding projects that claim they will help people but simply enrich the grant writers.
It sends you the actual compressed pages, you don't have to click the links.
Why do they need a program for this? Of course, it automatically unpacks stuff from the email, etc, but that's not something that you absolutely need.
I mean, there are already http-over-email services, that do not need any special program, you just send a mail to the service's address with the links to the stuff you want and it sends it back and you could use it perfectly this way: just send a mail with a line like 'hedgehog asia' to mail@search.com and it would send you back a reply with all the pages relating to asian hedgehogs.
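A mail-only exchange like the one described above could look roughly like this in Python. This is a sketch under assumptions: the query is taken from the Subject line, the page content is passed in rather than fetched, and `build_reply` and the addresses are hypothetical.

```python
import gzip
from email import message_from_string
from email.message import EmailMessage

def build_reply(raw_request: str, page_html: str, service_addr: str) -> EmailMessage:
    """Turn a query email into a reply carrying a gzipped page as an attachment."""
    request = message_from_string(raw_request)
    query = request["Subject"] or ""
    reply = EmailMessage()
    reply["From"] = service_addr
    reply["To"] = request["From"]
    reply["Subject"] = f"Results for: {query}"
    reply.set_content(f"Compressed results for '{query}' are attached.")
    reply.add_attachment(
        gzip.compress(page_html.encode("utf-8")),
        maintype="application",
        subtype="gzip",
        filename="results.html.gz",
    )
    return reply

raw = "From: user@example.org\nTo: mail@search.example\nSubject: hedgehog asia\n\n"
reply = build_reply(raw, "<p>Asian hedgehogs</p>", "mail@search.example")
```

Actually sending the reply (for instance with `smtplib`) and fetching the pages are left out on purpose; only the message handling is shown.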
The hassle with sending the CDs seems somewhat unnecessary (and it would require a CD drive and most probably MS Windows).
Real life is overrated.
As admirable as the idea behind this project is, I don't think it'll succeed. In a word: money. The programming and research aren't the problem -- someone's getting a thesis out of this, so MIT'll foot the bill. The problem comes with finding money for maintaining and improving the servers, handling abuse, support, etc.
It's a service that's only useful for poor third-world schools. Those organizations are probably running on a donated 486. They sure don't have money to pay, or even the money to pay to download ads. Charity-wise, "fund a search engine for poor third-worlders" is somewhat less compelling than "feed a starving child".
I see this idea living on research and enthusiasm for a year or two then dying a quiet, broke death.
Forward, retransmit, or republish anything I say here. Just don't misquote me.
There are a Lot Of Sites which the lords of Google have cut from their search engine. And I'm not talking porn. (The proliferation of sex-obsession is actually encouraged due to its weakening effect upon individuals.) I'm talking about anything which holds any real weight and thereby pisses off the wrong people. --Or anybody who needs to be punished, are chopped from what has become the Internet's de-facto public eye. (Google.)
Easily done, too. Make a remarkable product. (Google was amazing when it began). Then corporatize it and hand over the strings to the bad men.
This isn't a joke. There are plenty of examples. My own websites are among them; won't list them here, because I can't afford the backlash of losing anonymity, but on the 'lesser' search engines they show up fine and dandy at the tops of the lists. The same was true on Google, until I crossed a couple of lines back when Iraq was taken by Bush and his gang of psychopaths. Google has since chopped ALL my sites despite the fact that they have nothing to do with my personal political views and various beliefs about how reality works.
There IS a war being waged to suppress human awareness. Those of us who have the balls to try to lift the curtain for others to see DO take damage.
Anyway, people should use All The Web if they want a search engine with teeth.
-FL
Even if what you want is among those 10, you had to tie up that phone line for at least 5x as long.
Warning: Opinions known to be heavily biased.
...has an example of an SMTP-to-HTTP bridge.
Google and Amazon have published web APIs.
I see instant low-speed connectivity to two of the web's big consumer apps.
668: Neighbour of the Beast
Oops, sorry I was thrown by the last header "CDs in libraries." I thought the article said that they were putting the result sets in the library...not just the program. Mechanized ways of publishing result sets gets people in trouble.
The article still seems fishy because a great deal of the TCP/IP infrastructure has been optimized to handle the problems of slow connections; precompressing data isn't more efficient than compression in transit. There are as many IP collisions when sending e-mail as with HTTP requests.
Having a program that crawls the web and sends a lot of pages for a search engine request will increase, not decrease download times, but I misread the bottom section and apologize.
To "redeem myself," I'd like to make two points:
1. I was aiming for amusing with the Google thing. I decided to tack on the "real question" because I'm honest about my ignorance of the topic.
2. In what way will this search function highlight the control of relevance algorithms over the kind of knowledge folks using this search process will get? In a higher bandwidth society, I have the freedom to check out numerous searches and continually refine my search strings to find the best information. Folks using this service, however, will not be able to do so as readily.
3. I lied, third point. Ultimately, this just continues to reinforce the hierarchy of post-industrial nations over developing ones by giving them a quick fix for a dearth of wealth in the Information Age. For anyone to compete globally in knowledge or business, they have to have substantial information in a timely fashion. This provides neither, and while an admirable stopgap measure, fails to address the root problem.
Under capitalism man exploits man. Under communism it's the other way around.
What's next? TCP/IP over carrier pigeons?
+++ATH0
NO CARRIER
First a search engine for slow networks, now robotic snails http://www.theregister.co.uk/content/28/31783.html :-)
I'm curious about what the hard part of writing this program is. I'm a competent developer and I could put together a rough demo in a few days (times 2 for standard estimate buffering).
I'm sure there are good reasons, such as they are writing their own search engine tailored to this particular need, but using existing tools this could be cobbled together in short time.
We have a server. One program writes all emails sent to a certain email address to a certain directory. This program is probably already written and open source. We have another program running on the server, this one monitoring the directory. Each email contains a query which can be run against Google using their provided API making SOAP calls. The results are returned and used to create a list of URLs. Each URL is then crawled to a set depth (other more complex factors could be substituted later) and written to its own directory (or stuffed in a db), including all image files, etc. We now have a set of directories correlated to a query.
Now we just have to get the results back to the user. A third program runs through each set of directories, it will parse the HTML to update links (including images, etc.) to how they will be seen locally by the end user, strip out unneeded stuff, perhaps replace the existing images with more compressed versions (there are plenty of existing libraries to help here), and then the whole set of directories can be compressed using a standard compression scheme such as gz or zip.
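The final stage described above (strip what isn't needed, then compress the set) might be sketched in Python as follows. The regular expressions are deliberately crude stand-ins for a real HTML parser, and the function names and sample pages are made up.

```python
import io
import re
import zipfile

# Crude patterns; a real tool would use a proper HTML parser.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SCRIPT_BLOCK = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)

def slim_html(html: str) -> str:
    """Drop <script> blocks and <img> tags to shrink a page."""
    return IMG_TAG.sub("", SCRIPT_BLOCK.sub("", html))

def bundle(pages: dict) -> bytes:
    """Zip the slimmed pages into one in-memory archive, keyed by file name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, html in pages.items():
            zf.writestr(name, slim_html(html))
    return buf.getvalue()

pages = {
    "result1.html": '<html><script>track()</script><p>Hedgehogs</p><img src="a.jpg"></html>',
    "result2.html": "<html><p>More hedgehogs</p></html>",
}
archive = bundle(pages)
print(f"{len(archive)} bytes in archive")
```

Rewriting links to point at local copies and recompressing images, both mentioned above, are omitted to keep the sketch short.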
There's your demo. So what's the challenging part of improving this? Perhaps making sure the user gets the search results they want? I'm used to rapid fire searching, repeatedly zeroing in on what I'm looking for. That would take days for these users, so perhaps there is an intelligent agent in the client program that prompts the user in ways that would hopefully improve their results.
Perhaps the search engine component may rank pages differently than google based on information density or the correlation of information contained in the pages crawled from the original search result (so that one is more likely to get results where many pages talk in depth about the same subject).
Perhaps they are developing their own compression technique specific to this task. Since crawled pages will probably contain the same headers, nav, etc., they may put in a custom stage where special hooks and codes represent that shared content before going into a general compression stage. The client program would know how to reassemble this and would usually achieve better compression than using a general approach alone.
I'd like to see how what MIT is working on differs from the rather rough demo I've sketched out here.
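One existing mechanism in the same spirit is zlib's preset dictionary, where both ends hold the shared boilerplate and only the differences cost bytes. The Python sketch below is a guess at how that idea could be tried, not a description of what the project does; the boilerplate and page are invented.

```python
import zlib

def compress_with_dict(data: bytes, shared: bytes) -> bytes:
    """Deflate `data` using `shared` as a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=shared)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes, shared: bytes) -> bytes:
    """Inverse of compress_with_dict; the client must hold the same dictionary."""
    d = zlib.decompressobj(zdict=shared)
    return d.decompress(blob) + d.flush()

# Hypothetical boilerplate shared by every page of a site.
boilerplate = (
    b'<html><head><title>Example Site</title></head><body>'
    b'<div id="nav"><a href="/">Home</a><a href="/about">About</a></div>'
)
page = boilerplate + b"<p>Asian hedgehogs are small.</p></body></html>"

plain = zlib.compress(page, 9)
primed = compress_with_dict(page, boilerplate)
print(f"plain: {len(plain)} bytes, with shared dictionary: {len(primed)} bytes")
```

The catch is that the dictionary has to reach the client somehow first, for example bundled with the client program.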
Perhaps these folks would also like to get involved with this project.
We've (http://www.vh.org/) been approached by the widernet group to provide some content; issues such as updating the information and copyright abuse crop up right away.
Long term, I think it's a better idea to focus on getting better connectivity than band-aiding the problem with solutions such as these. Notice no one addresses the language barrier problem. I guess all interested parties are assumed to speak English?
Tough problem, no easy solution.
Anything is possible given time and money.
Really this idea sucks.. using a 1kb dialup modem
I can get results from google, etc..
And the 'too big to download' issue.. hahaha.. bwhahah
Fools
you might remember agora and getweb servers. Web pages by email, formatted via lynx. _Very_ cool for those who may have only had a mail connection (either a BBS->net, or, like one place I was at for a while, UUCP for mail downloading).
-- Is "Sig" copyrighted by www.sig.com?
Big deal, this is barely a step up from this: http://www.capescience.com/google/
Sounds like the project must be a bunch of IBMers who are nostalgic for ye good olde days of batch. Perhaps they will also create special perforated cards for submitting these searches.
Two wrongs don't make a right, but three lefts do.
Um, this is what Yahoo! is. Perhaps you've heard of it? Also, what is Google Directory if not "an ever changing directory"?
You are correct, email handles long latency and disconnected lines better than HTTP. What happened in the US is that companies made a tremendous effort to invest in technologies to handle the latency problem, but faster internet came around before there was a positive return on the investment.
The third world might benefit from using database replication technologies.
Ultimately, however, the real gains will come from building the true infrastructure in the third world. For example, an area with bad telephone service would do better to start with higher speed wireless networks.
As I used a modem for years and had no problem with regular search engines.. What do they mean by "slow" connections? 9600bps? Carrier pigeon?
Dear Google,
Can you please send me the Google Image Search for kittens?
Sincerely,
Computer Less
Hi, I'm a Ph.D. student working on the TEK Project. TEK does send the content of pages, not just links (although it also allows you to retrieve individual links, if desired). This allows you to get information back in a single query. TEK stores all returned results in a local cache on the client machine, so that users can search through the pages and refer to them at a later date. The software provides a local search utility that allows you to peruse previous results with a standard web browser; you do not need to keep the emails that are returned from the TEK Server. We hope that this is useful not just for taking a snapshot of a given page, but also for averting future searches if some content has already been downloaded before. More details are available on the TEK website: http://tek.sourceforge.net/
http://tek.sourceforge.net
We are also in the process of migrating our CVS source tree to SourceForge.
In fact, TEK does send the content of the pages it finds. In this sense, "search engine" is a bit of a misnomer -- it's more of an "information delivery tool".
Many more details about TEK are available from the TEK Homepage
Write a PHP script to do a query to Google, strip out the images, gzip the content, and feed it to a user.
Sure, gzip may not be as good as these compressed emails, but hey, gzip gets a google search page (for "linux") down from 23,859 bytes to only 5,724 bytes.
Now, even a 14.4k modem would download that in (theoretically) 3.2 seconds. They don't get much slower than 14.4k modems. But even if you insisted on a 300 baud modem, that's still only two and a half minutes to download!
Now, no matter how slow the connection, I would bet dollars to donuts it's faster than 300 baud even in a third world country. In other words, standard gzipped webpages are good enough, no need for silly programs distributed on CDs. (This is besides the fact that I can't figure out why a simple 4KB program to send/receive an email would require an entire CD)
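The arithmetic in the comment above can be reproduced with a few lines of Python. The sketch assumes an ideal link (8 bits per byte, no protocol overhead or latency) and treats 300 baud as 300 bits per second; `download_seconds` is a made-up helper.

```python
def download_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Ideal transfer time: 8 bits per byte, no protocol overhead or latency."""
    return size_bytes * 8 / bits_per_second

GZIPPED = 5_724  # bytes, the gzipped size quoted in the comment

for label, bps in [("14.4k", 14_400), ("300 baud", 300)]:
    print(f"{label}: {download_seconds(GZIPPED, bps):.1f} s")
```

This gives about 3.2 seconds at 14.4k and about 152.6 seconds (roughly two and a half minutes) at 300 bits per second, matching the figures quoted.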
It would make a lot more sense to you if you read the article.
What these guys are doing is actually pretty cool and makes a lot of sense. Here, I'll walk you through it:
Go get 'em Bob!
Dude! Get a faster modem. A 56k modem is like, what, $15?
It is your personal duty to fight for what is right on a daily basis. Ignoring injustice is identical to approving
I am a graduate student working on the TEK project, and we have never received funding directed for TEK. So far, the project has been carried out using general research funds (for example, a faculty startup package) available to the PI. We have been operating with a very low budget, mainly with undergraduate students. One of the researchers, Libby Levison, worked on TEK for an entire year without receiving any pay. Most of us also work on unrelated projects that are funded separately.
As a policy, we have never applied for funding from any organization where we will be in competition with developing nations for the same dollar. For the record, we submitted a proposal to the NSF ITR program that covers TEK, but the proposal was rejected.
But it's good to look at other things including using lynx or another text only browser.
"Dude! Get a faster modem. A 56k modem is like, what, $15?" There are a lot of places with shitty phone lines where you'd be lucky to get 28.8 no matter how fast your traditional modem is.
-- 'As it all washes away you know -- as it all is one, no one is alone.' -Cosmic Disorder
Q: Who are the intended users, and how slow is their connection?
A: The primary targets are communities where Internet access is expensive, unreliable, or completely unavailable. In developing nations, an email account is often significantly cheaper than full-fledged web access; for a few examples, see our last paper. Moreover, there are many cases where connectivity is intermittent, and it is cheaper and more reliable to send files in a batch mode during off-peak hours. Regardless of the modem speed, users in developing regions are often plagued by long latencies and low bandwidths due to congested infrastructure and inter-continental links. Many such users have expressed a lot of excitement about the TEK system.
Q: But your server takes 24 hours to reply. How will that speed things up?
A: Actually, our server replies immediately to each query, and processing takes less than a minute. The one-day wait in the article is just an example scenario that accounts for possible delays in the local network, as well as the night-time usage model.
Q: Still, how does this make web access more affordable?
A: The TEK system shortens the expensive connection time because it makes browsing an offline process. A set of pages can be downloaded from a local ISP during the cheapest and most reliable hours; users never have to pay for online time spent reading pages or waiting out inter-continental communication latencies. Moreover, the client-side cache of downloaded pages and the intelligent server processing could eliminate some searches altogether (see below).
Q: Google is fast, low-bandwidth, and even has an email interface. What's new here?
A: The TEK system is not really a "search engine"; rather, it is an end-to-end information retrieval tool with both a client and a server. In fact, the TEK Server queries Google for its candidate pages. The value added by the server is that it keeps track of the pages sent to each user, and avoids sending duplicate pages in future search results (unless, of course, a user requests an updated version of a page.) This ensures that the client's bandwidth will be used only to download material that is new and interesting. Note that the server also sends the actual content of pages rather than just a list of links; it does some basic filtering and compression of the content to reduce the bandwidth requirements.
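The duplicate-avoidance idea in this answer can be illustrated with a short Python sketch. It is an assumption-laden toy, not the TEK Server's code: `SentPageTracker`, the user name, and the URLs are all hypothetical, and state is kept in memory only.

```python
from collections import defaultdict

class SentPageTracker:
    """Remembers which URLs each user has already received."""

    def __init__(self):
        self._sent = defaultdict(set)

    def filter_new(self, user, urls, refresh=()):
        """Return URLs not yet sent to `user`, plus any explicitly refreshed."""
        seen = self._sent[user]
        fresh = [u for u in urls if u not in seen or u in refresh]
        seen.update(fresh)
        return fresh

tracker = SentPageTracker()
first = tracker.filter_new("school-kiosk", ["a.html", "b.html"])
second = tracker.filter_new("school-kiosk", ["b.html", "c.html"])
third = tracker.filter_new("school-kiosk", ["a.html"], refresh={"a.html"})
print(first, second, third)
```

The second query only yields `c.html` because `b.html` was already delivered; the third shows a page being sent again when an updated copy is requested.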
Q: Why do you need a program on the client side?
A: The TEK Client is a very important component of the system. It provides a web proxy that simulates an Internet connection so that users can view downloaded pages in their favorite browser. In addition, the proxy stores all pages in a local cache so that they can be searched and viewed at a later time. It also provides basic user management and query tracking so that many people can share a common machine and email account, perhaps on a public kiosk or school computer.
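The local cache and search described here could be pictured with the following Python sketch. It is purely illustrative: `PageCache` is an invented in-memory stand-in, whereas the answer above describes a proxy with persistent storage.

```python
class PageCache:
    """In-memory stand-in for a client-side store of downloaded pages."""

    def __init__(self):
        self._pages = {}

    def store(self, url, text):
        """Keep the latest copy of a page under its URL."""
        self._pages[url] = text

    def search(self, term):
        """Return URLs of cached pages containing `term`, case-insensitively."""
        needle = term.lower()
        return sorted(u for u, t in self._pages.items() if needle in t.lower())

cache = PageCache()
cache.store("http://example.org/hedgehogs", "Asian Hedgehogs are nocturnal.")
cache.store("http://example.org/snails", "Robotic snails are slow.")
hits = cache.search("hedgehogs")
print(hits)
```

Answering a repeat question from the cache costs no connection time at all, which is the benefit the answer points to.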
Q: Why is the client program so big as to require a CD?
A: The program itself is relatively small; the JAR file is 125 KB. When we add in third-party libraries and the installer package, the size is up to 2 MB. Including Java in the installer bumps the size to 10 MB. We implemented the first version of the TEK Client in Java for portability and ease of development, though we agree that a more compact distribution is possible, and we could be interested in exploring this in the future.
It's actually possible to access almost everything through email...
http://www.faqs.org/faqs/internet-services/access-via-email/
That was a 1990 RFC. It was updated in 1999. The more current "IP over Avian Carriers with Quality of Service" RFC can be found here:
ftp://ftp.rfc-editor.org/in-notes/rfc2549.txt
Pot Kettle Black.
The good old days :)
Q: Do you intend this as a permanent solution for low-connectivity areas?
A: No. In the long term, there needs to be better communications infrastructure in developing regions. This system provides an interim solution for delivering much-needed information. It also serves as a stepping-stone to full connectivity, as it simulates a web connection from the client machine. Once the infrastructure is available, many rural users will have developed familiarity with browsers, web pages, and search engines.
Q: How is the project funded?
A: We have never received funding directed for TEK. The project has been carried out on a low budget, using mostly undergraduates and funded by general research funds (e.g., a faculty startup package). One of the researchers, Libby Levison, worked on the project for an entire year without receiving any pay. As a policy, we have never applied for funding from any organization where we will be in competition with developing nations for the same dollar.
Q: Do you need any help with the project?
A: Yes! We are currently moving our CVS source tree to SourceForge, where we welcome open-source developers. We can also use help deploying the software in low-connectivity communities -- please find our contact information on our website. Thank you.
If I understand correctly, they invented this:
ssh remoteserver 'cd /tmp/; mkdir dl; cd dl; wget http://www.google.com; cd ..; tar cvzf file.tar.gz dl/*; cat file.tar.gz | uuencode file.tar.gz | mail -s "Your search" my@mail.domain'
;-)
Tune the wget URL and recursion depth and you are OK.
***Think*** I will patent this beautiful thing !
Is this not just another HTTP over UUCP?
It used to be quite common to surf in this way, or use a similar technique to request a file from an FTP server which would then mail it to you, the rationale being that you were likely to be able to get a faster connection onto your mailserver than onto the hosting server.
coldcity
code, life, art
Email is the one thing which is fast, because when downloading an email you are accessing a local server which operates at the full speed of your connection. So email searches (and/or content via email) is a brilliant idea. Pakistan's entire bandwidth is less than that of a typical college lab in one of the better universities in the US, and until that improves, things aren't going to get better soon. The internet divide is very much here: the more you can afford, the faster you can browse, with the majority not being able to afford any access.