On my screen, the end of the first posting is cut off without a link. Here's the rest of it again, just in case:
Q: Do you intend this as a permanent solution for low-connectivity areas?
A: No. In the long term, there needs to be better communications
infrastructure in developing regions. This system provides an interim
solution for delivering much-needed information. It also serves as a
stepping-stone to full connectivity, as it simulates a web connection
from the client machine. Once the infrastructure is available, many
rural users will have developed familiarity with browsers, web pages,
and search engines.
Q: How is the project funded?
A: We have never received funding earmarked for TEK. The project
has been carried out on a low budget, using mostly undergraduates and
funded by general research funds (e.g., a faculty startup package).
One of the researchers, Libby Levison, worked on the project for an
entire year without receiving any pay. As a policy, we have never
applied for funding from any organization where we will be in
competition with developing nations for the same dollar.
Q: Do you need any help with the project?
A: Yes! We are currently moving our CVS source tree to SourceForge,
where we welcome open-source developers. We can also use help
deploying the software in low-connectivity communities -- please find
our contact information on our website. Thank you.
As a graduate student working on the TEK project, I can try to answer some of your questions.
In terms of the current version of the software, you're right -- it's mostly a lot of engineering. To get things working, we put together known techniques and tools into a single package.
However, there are two big pieces that are missing in your description. One is the client, which is a web proxy that simulates a web connection and interacts with the user. It's about 10,000 lines of Java code, plus 5,000 lines of HTML. The other aspect is that the server actually keeps track of all the pages that have been sent to a given client, to avoid sending duplicate content in the future. For this, we use a database, which complicates parts of the server code; in all, the server comes out at 16,000 lines of Java right now.
So, I'd be impressed if you could write this in two days. However, if you're interested, we could use your help improving the software! We're currently moving our CVS source tree to SourceForge (under project "TEK").
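To make the duplicate tracking mentioned above a bit more concrete, here is a minimal Java sketch. It is not the TEK Server code: the real server keeps this state in a database, while the sketch uses an in-memory map, and the names SentPageLog and selectNew are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: remembers which URLs each client has already received.
// An in-memory map stands in for the database the real server uses.
class SentPageLog {
    private final Map<String, Set<String>> sentByClient = new HashMap<>();

    // Returns the candidates worth sending: pages this client has not seen yet,
    // plus any page the client explicitly asked to have refreshed.
    List<String> selectNew(String clientId, List<String> candidates, Set<String> refreshRequested) {
        Set<String> sent = sentByClient.computeIfAbsent(clientId, id -> new HashSet<>());
        List<String> toSend = new ArrayList<>();
        for (String url : candidates) {
            boolean firstTime = sent.add(url); // add() returns false if already recorded
            if (firstTime || refreshRequested.contains(url)) {
                toSend.add(url);
            }
        }
        return toSend;
    }

    public static void main(String[] args) {
        SentPageLog log = new SentPageLog();
        Set<String> noRefresh = Collections.emptySet();
        System.out.println(log.selectNew("kiosk-1",
                Arrays.asList("http://a.example/", "http://b.example/"), noRefresh));
        // The second search overlaps with the first; only the unseen page is selected.
        System.out.println(log.selectNew("kiosk-1",
                Arrays.asList("http://a.example/", "http://c.example/"), noRefresh));
    }
}
```

Keying the log by client rather than by individual user is also an assumption here; it matches the shared-kiosk case, where one installation serves many people.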
In reality, the hardest problems have been dealing with OS-level issues. How do you send and receive mail in a general way? We're currently using SMTP to send, and attachments to receive. But this has some limitations... we experimented at length with the MAPI interface and other approaches, but always had some roadblocks.
Also, there are a number of "hard" research questions that we think you can consider once the basic infrastructure is in place. For example, what is the best procedure to select pages with a high "information density"? How does the selection procedure interact with the geographical and demographic characteristics of a given client? How can you gather as much information as possible at the client side to make the search successful? These are the kinds of things we'd like to consider eventually.
Many more details about the project are available on the TEK Homepage.
I am a graduate student working on the TEK project, and I would like
to clarify some of the issues raised in this discussion. Some brief
answers to frequently asked questions are below; for full details,
including publications and the software itself, please see the TEK Homepage.
Q: Who are the intended users, and how slow is their connection?
A: The primary targets are communities where Internet access is
expensive, unreliable, or completely unavailable. In developing
nations, an email account is often significantly cheaper than
full-fledged web access; for a few examples, see our last paper. Moreover,
there are many cases where connectivity is intermittent, and it is
cheaper and more reliable to send files in a batch mode during
off-peak hours. Regardless of the modem speed, users in developing
regions are often plagued by long latencies and low bandwidths due to
congested infrastructure and inter-continental links. Many such users
have expressed a lot of excitement about the TEK system.
Q: But your server takes 24 hours to reply. How will that speed things up?
A: Actually, our server replies immediately to each query, and
processing takes less than a minute. The one-day wait in the article
is just an example scenario that accounts for possible delays in the
local network, as well as the night-time usage model.
Q: Still, how does this make web access more affordable?
A: The TEK system shortens the expensive connection time because
it makes browsing an offline process. A set of pages can be
downloaded from a local ISP during the cheapest and most reliable
hours; users never have to pay for online time spent reading pages or
waiting out inter-continental communication latencies. Moreover, the
client-side cache of downloaded pages and the intelligent server
processing could eliminate some searches altogether (see below).
Q: Google is fast, low-bandwidth, and even has an email interface. What's new here?
A: The TEK system is not really a "search engine"; rather, it is
an end-to-end information retrieval tool with both a client and a
server. In fact, the TEK Server queries Google for its candidate
pages. The value added by the server is that it keeps track of the
pages sent to each user, and avoids sending duplicate pages in future
search results (unless, of course, a user requests an updated version
of a page). This ensures that the client's bandwidth will be used
only to download material that is new and interesting. Note that the
server also sends the actual content of pages rather than just a list
of links; it does some basic filtering and compression of the content
to reduce the bandwidth requirements.
Q: Why do you need a program on the client side?
A: The TEK Client is a very important component of the system. It
provides a web proxy that simulates an Internet connection so that
users can view downloaded pages in their favorite browser. In
addition, the proxy stores all pages in a local cache so that they can
be searched and viewed at a later time. It also provides basic user
management and query tracking so that many people can share a common
machine and email account, perhaps on a public kiosk or school
computer.
Q: Why is the client program so big as to require a CD?
A: The program itself is relatively small; the JAR file is 125 KB.
When we add in third-party libraries and the installer package, the
size is up to 2 MB. Including Java in the installer bumps the size to
10 MB. We implemented the first version of the TEK Client in Java for
portability and ease of development, though we agree that a more
compact distribution is possible, and we would be interested in
exploring this in the future.
Q: Do you intend this as a permanent solution for low-connectivity areas?
A: No. In the long term, there needs to be
You're right that retrieving web pages over email has already been done. A present-day service that works as you describe is www4mail, and I know people who use it regularly from low-connectivity regions.
However, the TEK system (which I'm involved in) offers several benefits over a purely email-based solution. By having a web proxy on the client side, users can use their favorite browser to view downloaded pages, complete with color and formatting, which is often absent in text-only systems. Moreover, the client keeps a local, searchable cache of all downloaded pages, and the server keeps track of which pages have been sent to avoid wasting bandwidth on duplicate content. Finally, with a web-like user interface, many users can share a single e-mail account in a public kiosk or school.
Many more details about the TEK system are available from the TEK Homepage.
... sounds more like useless posturing from self righteous grant writers who are wanting to get taxpayers money by simply claiming to be doing something for the poor.
I am a graduate student working on the TEK project, and we have never received funding earmarked for TEK. So far, the project has been carried out using general research funds (for example, a faculty startup package) available to the PI. We have been operating with a very low budget, mainly with undergraduate students. One of the researchers, Libby Levison, worked on TEK for an entire year without receiving any pay. Most of us also work on unrelated projects that are funded separately.
As a policy, we have never applied for funding from any organization where we will be in competition with developing nations for the same dollar. For the record, we submitted a proposal to the NSF ITR program that covers TEK, but the proposal was rejected.
Hi, I'm a graduate student working on the TEK project.
We agree that you won't have too much to gain from zipping the content before sending it. The larger gains are from higher-level compression; for instance, the TEK Server keeps track of each page that it sends a given user, and it is careful not to send duplicate pages in replies to future search queries (unless the user specifically requests an updated version of a given page). This can be especially useful in shared environments (such as a school) where there is a lot of overlap between queries.
Also, there are some marginal gains to be made by zipping more content at once. The server sends ~20 pages at a time (or all the URLs requested in a given batch), which will compress better than if they were done separately.
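If you want to see the batching effect for yourself, here is a small Java sketch that compares compressing pages one at a time against compressing them as a single batch. It uses java.util.zip.Deflater (the algorithm behind zip) rather than whatever the TEK Server actually does, and the sample pages are invented. They share a lot of boilerplate, so the gap here is larger than you should expect on real, unrelated pages.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical comparison of per-page versus batched compression.
class BatchCompression {
    // Size in bytes of the zlib/DEFLATE-compressed form of the input.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    // Invented sample pages that share navigation and markup boilerplate.
    static List<String> samplePages(int count) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pages.add("<html><head><title>Page " + i + "</title></head>"
                    + "<body><div class=\"nav\">Home | News | Contact</div>"
                    + "<p>Article number " + i + " about crop prices.</p></body></html>");
        }
        return pages;
    }

    // Total size when every page is compressed on its own.
    static int separateSize(List<String> pages) {
        int total = 0;
        for (String page : pages) {
            total += compressedSize(page.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Size when all pages are concatenated and compressed once.
    static int batchedSize(List<String> pages) {
        return compressedSize(String.join("\n", pages).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> pages = samplePages(20);
        System.out.println("separate: " + separateSize(pages) + " bytes");
        System.out.println("batched:  " + batchedSize(pages) + " bytes");
    }
}
```

The batch wins because the compressor can reuse strings it has already seen in earlier pages, and because the fixed per-stream overhead is paid once instead of twenty times.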
Your point about the bloat from the mail program is a great one, thanks. We should look into fixing this.
By the way, we see the primary benefit of TEK as being the email-based access rather than the compression. You can find many more details about the project on the TEK Homepage.
Hi, I'm a graduate student working on the TEK project.
Thank you for your post; it's an important point -- TEK is targeting users who might have no direct Internet connectivity. In some places, it can be cheaper to have an email-only account instead of full-fledged web access; for these users, TEK provides web content using only email.
In addition, there are cases where no connectivity is available, but emails can be sent in a store-and-forward fashion. For instance, we are working with First Mile Solutions, which provides store-and-forward services to rural communities using a mobile access point (such as a bus) that visits each kiosk during the day. Moreover, if the connections are unreliable by any measure, then email is a better medium than HTTP, as no end-to-end connection is needed at any time.
More information about the TEK project, including some statistics on Internet rates in the regions we are targeting, is available on the TEK Homepage.
Hi, I'm a graduate student working on the TEK project.
There are several benefits of having a TEK Client program instead of just using email. But first off, the client isn't that big -- the JAR file with the TEK classes is 125 KB. When we package it up with third-party libraries and an installer, it comes to 2 MB, and with Java included, it's 10 MB. It would be interesting to try to prune down this distribution to the minimal size -- for the prototype version, we have focussed primarily on the software's functionality.
The TEK Client program is useful because it provides a seamless interface to browsing the downloaded pages. It operates as a web proxy: users adjust their browser to talk to TEK instead of the web, and then they can view pages just as if they were connected. The URLs appear as usual in the browser's "location" toolbar, and links on the page are functional. If a URL has been downloaded before, then it is loaded out of the local cache; if it has not yet been downloaded, then the user is queried to submit a request for that URL.
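The hit-or-miss decision described above can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class ProxyCache, its methods, and the wording of the placeholder page are hypothetical, and the real client also has to speak HTTP to the browser, which is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proxy's decision for one requested URL.
class ProxyCache {
    private final Map<String, String> pagesByUrl = new HashMap<>();

    // Records a page that arrived in a reply from the server.
    void store(String url, String html) {
        pagesByUrl.put(url, html);
    }

    boolean isCached(String url) {
        return pagesByUrl.containsKey(url);
    }

    // Returns the cached page if there is one; otherwise returns a small
    // placeholder page asking whether to request the URL. A real proxy would
    // also HTML-escape the URL before embedding it in the placeholder.
    String respond(String url) {
        String cached = pagesByUrl.get(url);
        if (cached != null) {
            return cached;
        }
        return "<html><body><p>" + url + " has not been downloaded yet.</p>"
                + "<p>Submit a request for this page?</p></body></html>";
    }

    public static void main(String[] args) {
        ProxyCache cache = new ProxyCache();
        cache.store("http://a.example/", "<html><body>Cached copy</body></html>");
        System.out.println(cache.respond("http://a.example/"));
        System.out.println(cache.respond("http://b.example/"));
    }
}
```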
The TEK Client includes a local search utility for searching the cache of downloaded pages. In this way, the user can build up a local library of information that is relevant to their community; for example, in a school setting, many searches could be satisfied using only the local cache due to overlapping interests of students.
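As a rough picture of what a local search over the cache might involve, here is a deliberately simple Java sketch. It is not how the TEK Client implements search: the class CacheSearch is hypothetical, and it does plain case-insensitive substring matching, where a real utility would more likely build an index and rank results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: keyword search over the text of cached pages.
class CacheSearch {
    // Insertion-ordered so results come back in download order.
    private final Map<String, String> textByUrl = new LinkedHashMap<>();

    void add(String url, String text) {
        textByUrl.put(url, text.toLowerCase(Locale.ROOT));
    }

    // Returns the URLs of cached pages that contain every query term
    // as a case-insensitive substring.
    List<String> search(String query) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : textByUrl.entrySet()) {
            boolean matchesAll = true;
            for (String term : terms) {
                if (!entry.getValue().contains(term)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CacheSearch cache = new CacheSearch();
        cache.add("http://a.example/", "Malaria prevention with bed nets");
        cache.add("http://b.example/", "Maize prices in the regional market");
        System.out.println(cache.search("malaria nets"));
    }
}
```

In a school setting, a lookup like this would run before any query is queued for sending, which is where the savings from overlapping interests come from.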
Also, the TEK Client is useful for tracking searches. In settings where connectivity is intermittent, searches can be enqueued during the day and sent at night (or when a connection is available). The client also provides basic user management so that multiple people can share a public installation (perhaps using a single email address, which they might not own themselves) and still keep track of their own queries.
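The queue-now, send-later idea can be sketched as follows. Again, this is an illustration rather than the client's actual code: QueryQueue, PendingQuery, and drainForSending are names I made up, and the step that turns a batch into an outgoing email is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: searches are queued per user while offline and
// handed over as one batch when a connection becomes available.
class QueryQueue {
    static final class PendingQuery {
        final String user;
        final String text;

        PendingQuery(String user, String text) {
            this.user = user;
            this.text = text;
        }
    }

    private final Deque<PendingQuery> pending = new ArrayDeque<>();

    // Called during the day, whenever a user submits a search.
    void enqueue(String user, String text) {
        pending.addLast(new PendingQuery(user, text));
    }

    int pendingCount() {
        return pending.size();
    }

    // Called at night (or when a connection is available): returns all queued
    // queries in submission order and empties the queue.
    List<PendingQuery> drainForSending() {
        List<PendingQuery> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        QueryQueue queue = new QueryQueue();
        queue.enqueue("amina", "clean water filters");
        queue.enqueue("joseph", "maize storage");
        for (PendingQuery query : queue.drainForSending()) {
            System.out.println(query.user + ": " + query.text);
        }
    }
}
```

Keeping the user name on each queued query is what lets several people share one email address and still see only their own results when the replies come back.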
In the future, we think there are a lot of features that could be added to the client. For instance, we could seed the client with other open-source resources, such as an atlas or encyclopedia, that could be used in conjunction with web searches. There could also be an "intelligent query builder" that helps construct Internet searches (for example, by checking spelling) before going through the time and expense of connecting and sending them off.
Many more details about TEK are available from the TEK Homepage. We are currently moving our CVS source tree to SourceForge, so if you're interested in helping to improve the software, it'd be great to hear from you!
Hi, I'm a Ph.D. student working on the TEK project.
In fact, TEK does send the content of the pages it finds. In this sense, "search engine" is a bit of a misnomer -- it's more of an "information delivery tool".
Many more details about TEK are available from the TEK Homepage.
Hi, I'm a Ph.D. student working on the TEK Project.
TEK does send the content of pages, not just links (although it also allows you to retrieve individual links, if desired). This allows you to get information back in a single query.
TEK stores all returned results in a local cache on the client machine, so that users can search through the pages and refer to them at a later date. The software provides a local search utility that allows you to peruse previous results with a standard web browser; you do not need to keep the emails that are returned from the TEK Server. We hope that this is useful not just for taking a snapshot of a given page, but also for averting future searches if some content has already been downloaded before.
More details are available on the TEK website:
http://tek.sourceforge.net/
We are also in the process of migrating our CVS source tree to SourceForge.