A Search Engine For The Slower Net
Makarand writes "According to this BBC News
article researchers at MIT
are developing a search engine for people
using the web on slower net connections.
The software will e-mail queries to a central server and receive the most relevant
webpages from the search results by e-mail in a compressed form. Since the program is too big to download over a poor net connection,
it will be mailed on CDs to libraries for people to borrow and install. They are also considering trying to persuade computer sellers
in developing countries to install the program on machines."
About them Modem Linkers,
ain't they kinda odd?
Goin' on the net,
with they little baud.
Look at all those Modem Linkers,
what a thing to see.
Web sites come up really slow,
gets lousy Voice/IP.
Internet at low bit rates,
what a dawgon mess.
Load a web site, take a break,
while 'pache mods compress.
How to be a Modem Linker,
don't need a ticket.
Get a local ISP,
dial up and link it.
A programmer is a machine for converting coffee into code.
I know the Internet is complicated - but there's no need to pick on slow people.
Maybe we could have all webpages categorized by a number, something like 800 for science or whatever, and then we could have a filing cabinet with index cards in it. Then, people could open the filing cabinet, see a number for the page they want and then go directly to the page.
...still surfing the internet with their Commodore 64s and 300 baud modems!
http://www.mshiltonj.com/sr/
I believe it is the pages themselves, not Google, that this is an attempt to deliver.
Might be a nice way to preserve searches for later perusal. Unlike bookmarking, the returned search results are stored in an email.
This would be a good way to preserve stuff that may be the subject of removal due to court order, like xenu.net and other similar de-Googlings.
MIT guys! Why don't you put your brain into better compression technology? So we can deliver higher bandwidth to those still on crappy 56K lines?
And don't say it isn't doable... If I had the time, I could do it, and I'm a mere high school graduate...
---
Programming is like sex... Make one mistake and support it the rest of your life.
I had the same initial reaction, but after RTFA (I know, shame on me), it seems that the limitation isn't so much time, but continuous time hogging the phone line accessing Google, checking out pages, etc.
Instead, this service would package together selected results of the search, for overnight download into the PC's cache. The user can then browse through the material at their leisure without needing to use the internet connection (which is the scarce resource).
for my cable Internet connection at home.
Yes, I am dead serious... Let's just say Charter's cable Internet in my area lately really stinks. I would almost rather be on a 14.4k modem - no joke. I am not the only user... I get lag spikes of over 3000ms when not doing anything, and almost dropped connections. Good thing DSL recently became available in my area =D. One less Charter Pipeline subscriber.
They are developing the program which will replace web forums - you post a message to a predetermined mail account and everybody subscribed will receive it very soon (patent pending).
File transfers and weather forecasts are planned in 2006.
This will make a difference.
- Arwen, I'm your father, Agent Smith.
- Well, you're just Smith, but my father is Aerosmith!
For those of you wondering why someone would do this, how about reading the damn article?
The program doesn't e-mail back with a mere mirror of a google / yahoo results page. It actually filters through the individual results compressing the entire page. e.g. my search turns up a CNN page and a blurb on MSNBC and I get, e-mailed to me, compressed versions of those actual sites, not just links to them.
As for the "my 28.8 modem is just fast enough" crowd -- read the article! Some of the locations the software is being developed for don't even have access to a phone line on a regular basis. And the lines they do have access to are more likely than not to be noisy as hell and not able to support a 28.8 connection.
TODAY
I am reminded of the Prepaid Legal system of doing business. You call up and ask a question, and the next day, an attorney familiar with the area you are asking about calls you back to answer your questions and advise you. So maybe this isn't all that outdated of an idea after all.
IN REGARD TO THE SYSTEM IN THE ARTICLE:
To have this capability back in 1973 would have been unbelievable. In 1983, to have this available to every library in the US would have been an unbelievable achievement. To have it now is so slow that I start to go google eyed even thinking about it.
BUT
This is great for countries that are 20-30 years behind in technology. It will revolutionize the search for information for areas that are not as connected as the US.
is not slow connections, but connections that are unreliable
Using the phone in a country like Malawi can be a real adventure. It's not like the US at all.
So why is it that the answer to all of my searches is either "wet teens," "Generic Viagra," or "I am a banker from Nigeria?"
* Please do not read my signature.
Coincidentally (?) it is also very useful for circumventing the Great Firewall. Way to go, but it would also be nice to optionally have the cached content (à la Google) e-mailed as well. That would send the last standing wall crumbling.
Code poet, espresso fiend, starter upper.
RTFA.
Why don't you just scream "HI I'M FROM 'WESTERN' CIVILIZATION AND HAVE NO IDEA HOW THESE THINGS WORK IN LESS PRIVILEGED PLACES"?
Google is too slow when your school has one phone line that is used for _everything_, including net access. Not to mention the cost of using the phone anyway. This allows all the students to submit their searches to a teacher one day; the teacher then submits all the searches with only a couple of minutes of dialing up. He can retrieve the compressed results a few days later with only minutes of dialing up. Now go read the article. Someone needs to mod that post down; hopefully the poster can redeem themselves later in the thread with something insightful.
And for those people with no internet connection, you can mail your search requests to MIT (Please include self-addressed stamped envelope). MIT will then process your search request within 5 business days, and mail you back the results. You can then peruse the results and marvel at the wealth of information you'd be able to find... if only you had internet access.
I tried to RTFA but MIT hasn't emailed it to me yet :(.
...only webdesigners had not collaborated to turn the web into the graphics orgy it is today. I mean, have these kids coming out of graphics school even browsed the relevant w3c specifications?
News Flash !
For those of you who want to try it out at home, just use one of your several hundred AOL CDs, and voila, you'll have a line slow enough to try it out.
To try out this demo, please follow these simple steps:
1. Pick up the phone and call the automated voice search system at (650) 318-0165.
2. After the prompt "Say your Search Keywords", say your query to the system.
3. Click this link and a new window will open with your voice search results.
4. Say another query, and the new window with the search results will be updated with the new results.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
First, for those of you saying 'Google is fast enough even on a 14.4K' - think school with one phone line, perhaps not even available during the day. Or how about connections via satellite phone at $$/min? Suddenly you want super efficient, when you only earn 5 bucks a day.
As to what else this needs, the search engine needs to strip out all the crap before emailing a web page to you (Java, Flash, etc.) - it should focus on mostly text, small pictures only. Particularly since 486s would be a common platform for people using this, so the search engine had better work well on one. You also should be able to strip out all pictures as an option to maximise text info download - remember turning off pictures in Netscape 2.x to speed up your browsing? If you need something it stripped out, you should be able to query just for the bits you need later.
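For what it's worth, the stripping step described above is not much code. Here is a rough Python sketch using only the standard library. It is an illustration of the idea, not anything from the MIT project; the names `TextOnly`, `SKIP_TAGS`, and `strip_page` are made up for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped entirely (scripts, styles, plugins).
# Hypothetical list for illustration; a real filter would need more care.
SKIP_TAGS = {"script", "style", "object", "applet"}


class TextOnly(HTMLParser):
    """Collects visible text and link targets; images are simply ignored."""

    def __init__(self):
        super().__init__()
        self.parts = []       # text fragments in document order
        self.links = []       # href values of <a> tags
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_page(html):
    """Return (text, links) for a page, with scripts and plugins removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.links
```

Keeping the link list separately is what would let you "query just for the bits you need later".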
Also the ability to share your cache between computers would be huge if they can't have a server to do that for them. At any rate, means of transferring those precious pages you downloaded to another computer - on a floppy, unless you have local email.
W9x:Thanks for the make-work project Bill.
I liked this technology when it came the first time around. Archie via email? Anyone? Yay! MIT reinvented Archie! Only with a thick client instead of a small one! Way to go!
So what if it scans webpages instead of FTP sites. It's not that big of a leap.
"Love heals scars love left." -- Henry Rollins
They should develop a program that strips images, animation, Java applets, ActiveX components and all HTML from Web pages, leaving only the text and the links. Then send it to the users. It could be called Gopher. Or Archie.
cor... this sounds oh so familiar... anyone remember ftp by email??? History repeating itself...
Query the ftp server by email and get the directory list emailed back to you. Then you could send the command via another email which would result in the file being emailed back to you overnight ready for you to retrieve it.
And then there was "trickle", where files could be sent/refreshed to your uni's mainframe's ftp server overnight and would be there for you to play with the next morning, and you would always have the most recent version of the file as they'd have been synched via trickle.
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
"MIT Reinvents Archie service from the early 90's."
"Given the pace of technology, I propose we leave math to the machines and go play outside." -- Calvin
It's a shame that with the way the net is going all they will get as search results will be Flash-heavy sites that take 20 minutes to download on broadband, let alone dial-up.
Where did all the sites go that you could use wget -r to grab overnight? How about the odd few that used to offer a .tar.gz for download and offline reading?
Content over presentation is a concept that needs to be reintroduced to the net, preferably with a stick.
Beep beep.
I wonder how it differs from AGORA, Web-To-Email, Gopher, and similar services? If you don't know about them, you might want to check the Accessing The Internet By E-mail -- Guide to Offline Internet Access and Fravia's "How to search the web" lesson 10.
Have fun.
-- search the web
I can imagine your brainwaves are somewhere along the lines of:
[_______________________________]
Ok, you are obviously someone incapable of making a point without resorting to childish insults. In my experience this usually correlates well with inferior mental capacity so I would encourage you to read the below slowly:
Downloading the contents of 20 pages when one page is the one you're looking for is vastly inefficient.
It's very likely that, since the target is to use this for information, the pages would be _highly_ compressed, either reducing image quality or removing many images altogether.
Ooh! High compression achieved by not downloading fancy images or code! That's absolute genius. We can only download the text and then the user can choose if they want a particular image to be downloaded. Someone should really inform the makers of lynx so they can put this feature in the next version. Maybe the makers of IE and Netscape can have an option where they don't download images by default.
And since you seem to be particularly dense, I will have to point out that the above paragraph is intended to be sarcasm since that functionality already exists.
Mmmm.. Donuts
Yeah I thought the same thing, though this goes a step further and sends compressed copies of the resulting pages back to you, not just an index of the sites.
What I wonder is why the *client* needs any software? Why not just make an email addy that people send queries to (like you did with "archie") and get the results back in whatever mailer you've got already?
Chris
This FAQ explains how to access most of the internet using only a standard email client.
The above document explains how to access:
FTP
ARCHIE (deprecated)
FTPSEARCH (deprecated)
GOPHER (deprecated)
VERONICA (deprecated)
JUGHEAD (deprecated)
USENET
WWW
WWW SEARCH (using standard search engine like altavista, yahoo or google)
FINGER
WHOIS
[...]
All these protocols can be accessed via email, according to the FAQ. The FAQ has been around for a long time, which explains why many (most) of the protocols involved are now deprecated. I used this FAQ in the early '90s and I don't know how well it works now. At the time, it was great. The last update is 2002/04/16.
As admirable as the idea behind this project is, I don't think it'll succeed. In a word: money. The programming and research aren't the problem -- someone's getting a thesis out of this, so MIT'll foot the bill. The problem comes with finding money for maintaining and improving the servers, handling abuse, support, etc.
It's a service that's only useful for poor third-world schools. Those organizations are probably running on a donated 486. They sure don't have money to pay, or even the money to pay to download ads. Charity-wise, "fund a search engine for poor third-worlders" is somewhat less compelling than "feed a starving child".
I see this idea living on research and enthusiasm for a year or two then dying a quiet, broke death.
Forward, retransmit, or republish anything I say here. Just don't misquote me.
To "redeem myself," I'd like to make two points:
1. I was aiming for amusing with the Google thing. I decided to tack on the "real question" because I'm honest about my ignorance of the topic.
2. In what way will this search function highlight the control of relevance algorithms over the kind of knowledge folks using this search process will get? In a higher bandwidth society, I have the freedom to check out numerous searches and continually refine my search strings to find the best information. Folks using this service, however, will not be able to do so as readily.
3. I lied, third point. Ultimately, this just continues to reinforce the hierarchy of post-industrial nations over developing ones by giving them a quick fix for a dearth of wealth in the Information Age. For anyone to compete globally in knowledge or business, they have to have substantial information in a timely fashion. This provides neither, and while an admirable stopgap measure, fails to address the root problem.
Under capitalism man exploits man. Under communism it's the other way around.
Hi, I'm a Ph.D. student working on the TEK Project. TEK does send the content of pages, not just links (although it also allows you to retrieve individual links, if desired). This allows you to get information back in a single query. TEK stores all returned results in a local cache on the client machine, so that users can search through the pages and refer to them at a later date. The software provides a local search utility that allows you to peruse previous results with a standard web browser; you do not need to keep the emails that are returned from the TEK Server. We hope that this is useful not just for taking a snapshot of a given page, but also for averting future searches if some content has already been downloaded before. More details are available on the TEK website: http://tek.sourceforge.net/
There are several benefits of having a TEK Client program instead of just using email. But first off, the client isn't that big -- the JAR file with the TEK classes is 125 KB. When we package it up with third-party libraries and an installer, it comes to 2 MB, and with Java included, it's 10 MB. It would be interesting to try to prune down this distribution to the minimal size -- for the prototype version, we have focussed primarily on the software's functionality.
The TEK Client program is useful because it provides a seamless interface to browsing the downloaded pages. It operates as a web proxy: users adjust their browser to talk to TEK instead of the web, and then they can view pages just as if they were connected. The URLs appear as usual in the browser's "location" toolbar, and links on the page are functional. If a URL has been downloaded before, then it is loaded out of the local cache; if it has not yet been downloaded, then the user is queried to submit a request for that URL.
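To make the hit-or-ask behaviour concrete, here is a minimal Python sketch of the decision the proxy makes for each URL. It is only an illustration under simplifying assumptions (an in-memory dictionary instead of a disk cache, no HTTP handling), and the names `PageCache`, `fetch`, and `request_if_missing` are hypothetical, not taken from the TEK source.

```python
class PageCache:
    """Toy model of a proxy's cache: serve known pages, queue unknown ones."""

    def __init__(self):
        self.pages = {}    # url -> page content already downloaded
        self.pending = []  # urls waiting to go out with the next e-mail

    def store(self, url, content):
        """Record a page that arrived by e-mail and clear any pending request."""
        self.pages[url] = content
        if url in self.pending:
            self.pending.remove(url)

    def fetch(self, url, request_if_missing=False):
        """Return ("hit", content) or ("miss", None).

        request_if_missing models the user's answer when asked whether
        to submit a request for a page that is not in the cache.
        """
        if url in self.pages:
            return "hit", self.pages[url]
        if request_if_missing and url not in self.pending:
            self.pending.append(url)
        return "miss", None
```

A miss costs nothing until the user agrees to request the page, which matters when every minute on the line is expensive.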
The TEK Client includes a local search utility for searching the cache of downloaded pages. In this way, the user can build up a local library of information that is relevant to their community; for example, in a school setting, many searches could be satisfied using only the local cache due to overlapping interests of students.
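As a rough idea of what searching a local cache involves, here is a small Python sketch of a word index over downloaded pages. It assumes the pages are already reduced to plain text and that a query matches when every word appears; both are simplifications for illustration, and `build_index` and `search` are hypothetical names rather than TEK functions.

```python
import re
from collections import defaultdict


def build_index(pages):
    """pages: dict of url -> text. Returns dict of word -> set of urls."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the urls that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

In a school, a query answered this way never has to leave the building.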
Also, the TEK Client is useful for tracking searches. In settings where connectivity is intermittent, searches can be enqueued during the day and sent at night (or when a connection is available.) The client also provides basic user management so that multiple people can share a public installation (perhaps using a single email address, which they might not own themselves) and still keep track of their own queries.
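The queue-by-day, send-at-night pattern can be sketched in a few lines of Python. This is an illustration only: `QueryQueue` is a hypothetical name, and `send` is a placeholder for whatever actually dials up and mails the batch.

```python
from collections import deque


class QueryQueue:
    """Holds (user, query) pairs until a connection is available."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, user, query):
        # Tag each query with its user so results can be routed back,
        # even when everyone shares one e-mail address.
        self._queue.append((user, query))

    def flush(self, send):
        """Hand everything queued to send() in one batch.

        If send() raises (say the line drops), the queue is left intact
        so the same queries go out on the next attempt.
        """
        if not self._queue:
            return 0
        batch = list(self._queue)
        send(batch)
        self._queue.clear()  # only reached when send() succeeded
        return len(batch)
```

Sending one batch per connection is the point: the dial-up time is paid once, not once per student.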
In the future, we think there are a lot of features that could be added to the client. For instance, we could seed the client with other open-source resources, such as an atlas or encyclopedia, that could be used in conjunction with web searches. There could also be an "intelligent query builder" that helps construct Internet searches (for example, by checking spelling) before going through the time and expense of connecting and sending them off.
Many more details about TEK are available from the TEK Homepage. We are currently moving our CVS source tree to SourceForge, so if you're interested in helping to improve the software, it'd be great to hear from you!
We agree that you won't have too much to gain from zipping the content before sending it. The larger gains are from higher-level compression; for instance, the TEK Server keeps track of each page that it sends a given user, and it is careful not to send duplicate pages in replies to future search queries (unless the user specifically requests an updated version of a given page.) This can be especially useful in shared environments (such as a school) where there is a lot of overlap between queries.
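The duplicate-avoidance idea can be sketched as a per-user log of delivered URLs. The Python below is a simplified illustration, not the server's real bookkeeping; `SentLog`, `select`, and the `refresh` parameter are hypothetical names.

```python
class SentLog:
    """Remembers which urls each user has already been sent."""

    def __init__(self):
        self._sent = {}  # user -> set of urls already delivered

    def select(self, user, urls, refresh=()):
        """Return the urls worth sending for this reply.

        A url is sent if the user has never received it, or if it is
        listed in refresh (an explicit request for an updated copy).
        """
        seen = self._sent.setdefault(user, set())
        unique = list(dict.fromkeys(urls))  # drop repeats, keep order
        to_send = [u for u in unique if u not in seen or u in refresh]
        seen.update(to_send)
        return to_send
```

With a shared account in a school, the "user" key would be the installation, so one student's download saves bandwidth for the next.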
Also, there are some marginal gains to be made by zipping more content at once. The server sends ~20 pages at a time (or all the URLs requested in a given batch), which will compress better than if they were done separately.
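Anyone curious about the batching effect can measure it with a short Python sketch. Here `zlib` stands in for whatever compressor the server actually uses, and the sample pages are invented purely for the comparison; `compressed_sizes` and `sample_pages` are hypothetical names.

```python
import zlib


def compressed_sizes(pages):
    """Return (separate, together): total bytes when each page is
    compressed on its own versus all pages compressed as one stream."""
    separate = sum(len(zlib.compress(p)) for p in pages)
    together = len(zlib.compress(b"".join(pages)))
    return separate, together


def sample_pages(count=20):
    """Invented pages sharing the same markup, as pages from one site often do."""
    template = (b"<html><head><title>Village Health Notes</title></head>"
                b"<body><div class='nav'>Home | Topics | Contact</div>"
                b"<p>%s</p></body></html>")
    return [template % (b"Note %d: boil water before drinking" % i)
            for i in range(count)]
```

Calling `compressed_sizes(sample_pages())` shows the gap for pages that share a lot of markup; pages with little in common would benefit less.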
Your point about the bloat from the mail program is a great one, thanks. We should look into fixing this.
By the way, we see the primary benefit of TEK as being the email-based access rather than the compression. You can find many more details about the project on the TEK Homepage.
I am a graduate student working on the TEK project, and we have never received funding directed for TEK. So far, the project has been carried out using general research funds (for example, a faculty startup package) available to the PI. We have been operating with a very low budget, mainly with undergraduate students. One of the researchers, Libby Levison, worked on TEK for an entire year without receiving any pay. Most of us also work on unrelated projects that are funded separately.
As a policy, we have never applied for funding from any organization where we will be in competition with developing nations for the same dollar. For the record, we submitted a proposal to the NSF ITR program that covers TEK, but the proposal was rejected.
You're right that retrieving web pages over email has already been done. A present-day service that works as you describe is www4mail, and I know people that use it regularly from low-connectivity regions.
However, the TEK system (which I'm involved in) offers several benefits over a purely email-based solution. By having a web proxy on the client side, users can use their favorite browser to view downloaded pages, complete with color and formatting, which is often absent in text-only systems. Moreover, the client keeps a local, searchable cache of all downloaded pages, and the server keeps track of which pages have been sent to avoid wasting bandwidth on duplicate content. Finally, with a web-like user interface, many users can share a single e-mail account in a public kiosk or school.
Many more details about the TEK system are available from the TEK Homepage.