Open Content Network (P2P meets Open Source)
Orasis writes "The creators of Swarmcast have announced a new peer-to-peer content delivery network called the Open Content Network. The OCN will allow users to download open source and public domain content from multiple peers and mirrors in parallel. The system is designed to augment the existing mirrors with bandwidth from the p2p network and should eliminate the "Slashdot Effect" for popular open source content."
The Open Content site just announces a list of intentions. Anyone can put this kind of info up. It looks to me like nothing has been achieved yet, making this not really news.
---- scrm
What about the OpenFT protocol? They've been working on that for a while at gift.sourceforge.net. They originally used the FastTrack protocol (KaZaA), but after KaZaA changed their specs, they decided to create their own protocol.
Maybe you didn't notice that these guys are the makers of Swarmcast. Or maybe you posted before figuring out what that meant.
Swarmcast is a (working!) program for parallel p2p file downloading. In other words, the technology IS implemented. They basically are just making a modified program to work with a somewhat different set of files. No biggie.
"Never, never suspect the dreams within the dreams of dreaming children." ~The Amazon Quartet
Content-Addressable Web
Content Distribution Networks (CDNs), such as Akamai, have shown that significant improvements can be made in throughput, latency, and scalability when content is distributed throughout the network and delivered from the edge. Likewise, peer-to-peer systems such as Napster and Gnutella have shown that normal desktop PCs can serve up enormous amounts of content with zero administration. And more recently, systems like Swarmcast have been introduced that combine the CDN and peer-to-peer concepts to gain the benefits of both. The goal of the Content-Addressable Web is to enable these advanced content location and distribution services with standard web servers, caches, and browsers.
The main benefits of the Content-Addressable Web are:
Throughput - Browsers will be able to download content from multiple sources in parallel
Bandwidth Savings - Browsers will automatically discover and select the closest mirror for a piece of content.
Fault Tolerance - Even if a site goes down in the middle of a download, browsers will automatically locate another mirror and continue downloading.
Scalability - Any number of machines may be added to the network, creating a CDN ad hoc, with very little administration.
Security - Browsers will be able to safely download content from untrusted mirrors without risk of corruption or viruses.
The full paper describing the "HTTP Extensions for a Content-Addressable Web" is available here.
The goal of the Content-Addressable Web (CAW) is to enable the creation of advanced content location and distribution services over HTTP. The use of content addressing allows advanced caching techniques to be employed, and sets the foundation for creating ad hoc Content Distribution Networks (CDNs). This document specifies HTTP extensions that bridge the current location-based Web with the Content-Addressable Web.
1. Introduction
There are a number of shortcomings in the current web architecture that the Content-Addressable Web aims to overcome. These include discovering optimal replicas, downloading from untrusted caches, and distributing content across the Transient Web.
1.1 Optimal Replicas
There is currently no mechanism within HTTP that allows a user-agent to discover an optimal replica for a piece of content. This problem is due to the fact that HTTP caching practice assumes a hierarchical caching structure where each user has a single parent cache. Thus while one can discover an object's source URI from a cached copy, there is no mechanism to discover a list of replica locations from the source. This problem is evidenced by the fact that users must manually select the closest mirrors when downloading from Tucows, FilePlanet, or the various Linux distributions. The CAW solves this problem by providing distributed URI resolvers that user-agents can query to find an optimal replica.
1.2 Untrusted Caches
It is currently unsafe to download web objects from an untrusted cache or mirror because they can modify/corrupt the content at will. This becomes particularly problematic when trying to create public cooperative caching systems. This isn't a problem for private CDNs, like Akamai, where all of their servers are under Akamai's control and are assumed to be secure. But for a public CDN, the goal is to allow user-agents to retrieve content from completely untrusted hosts but be assured that they are receiving the content intact. The CAW solves this problem by using content addressing that includes integrity checking information.
1.3 Transient Web
The Transient Web is a relatively new phenomenon that is growing in size and importance. It is embodied by peer-to-peer systems such as Gnutella, and is characterized by unreliable hosts with rapidly changing locations and content. These characteristics make location-based addresses within the Transient Web quite brittle. Even if traditional HTTP caching were widely leveraged within the Transient Web, the situation wouldn't be helped much. This is because a single piece of content will often be available under many different URIs, which creates disjoint and inefficient caching hierarchies.
This multiplicity of URIs occurs for a couple of reasons:
The original source for a piece of content will often cease to exist or the source's URI will change.
Multiple independent sources often introduce the same content into the network.
Most applications and file manipulation tools will tend to "forget" the source URI of a piece of content.
This URI multiplicity can also occur in the normal web, although it is RECOMMENDED that caching semantics be used when an authoritative source is known. The CAW solves the above problems by providing content-specific URIs that are location-independent and can be independently generated by any host. Additionally, various URI resolution services work in coordination to resolve issues associated with having multiple URIs for a web object.
2. Scope
The HTTP extensions for CAW are intended to be used in the above scenarios where HTTP is currently lacking. This technology is focused on mostly static content that can benefit from advanced content distribution services. The extensions are intended to be hidden under the hood of web servers, caches, and browsers and should change nothing as far as end users are concerned. So even though a new URN scheme is introduced, there are very few situations where a human will ever interact with those URNs.
One of the more interesting applications of the Content-Addressable Web is the creation of ad hoc Content Distribution Networks. In such networks, receivers can crawl across the network, searching for optimal replicas, and then downloading content from multiple replicas in parallel. After a host has downloaded the content, it then advertises itself as a replica, automatically becoming a part of the CDN.
3. Content Addressing
This specification introduces a URI scheme with many interesting capabilities for solving the problems discussed earlier. A particularly useful class of URI schemes is "Self-Verifiable URIs". These are URIs that can themselves be used to verify that the content has been received intact. We also want URIs that are content-specific and can be independently generated by any host with the content. Finally, to show the intent that these addresses are location-independent, a URN scheme will be used.
Cryptographic hashes of the content provide the capabilities that we are looking for. For example, we can take the SHA-1 hash of a piece of content and then encode it using Base32 to provide the following URN:
urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
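A minimal sketch of how such a URN could be produced, in Python. The helper name `sha1_urn` is an assumption made for illustration and is not part of the specification. It hashes the bytes with SHA-1 and Base32-encodes the 20-byte digest, which gives exactly 32 characters with no padding.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Build a urn:sha1 identifier for the given bytes (hypothetical helper)."""
    digest = hashlib.sha1(data).digest()  # 20 raw bytes (160 bits)
    # 160 bits / 5 bits per Base32 character = 32 characters, so no '=' padding.
    encoded = base64.b32encode(digest).decode("ascii")
    return "urn:sha1:" + encoded


if __name__ == "__main__":
    # URN of the empty byte string, useful as a quick sanity check.
    print(sha1_urn(b""))
```

Because the URN depends only on the bytes, any host holding the same content would compute the same identifier independently.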
Implementations MUST support SHA-1 URNs at a minimum. ([footnote] A future version of this document will also specify a URN format for performing streaming and random-access verification using Merkle Hash Trees.)
Receivers MUST verify self-verifiable URIs if any part of the content is retrieved from a potentially untrusted source.
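The verification step might look roughly like the following sketch. The function name `verify_sha1_urn` and the chunk size are assumptions for illustration. The content is hashed incrementally so that a large download does not need to fit in memory, and the result is compared with the Base32 value carried in the URN.

```python
import base64
import hashlib
from typing import BinaryIO


def verify_sha1_urn(urn: str, stream: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Return True if the bytes read from `stream` match the urn:sha1 value.

    Hypothetical helper; the chunk size is an arbitrary choice.
    """
    prefix = "urn:sha1:"
    if not urn.lower().startswith(prefix):
        raise ValueError("not a urn:sha1 identifier: %r" % urn)
    expected = urn[len(prefix):].upper()

    hasher = hashlib.sha1()
    # Read until the stream returns an empty bytes object (end of content).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)

    actual = base64.b32encode(hasher.digest()).decode("ascii")
    return actual == expected
```

A receiver would typically run this over the fully assembled file before handing it to the user, and discard the data if the check fails.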
4. HTTP Extensions
In order to provide a bridge between the location-based Web and the Content-Addressable Web, a few HTTP extensions must be introduced. The nature of these extensions is that they need not be widely deployed in order to be useful. They are specifically designed to allow for proxying for hosts that are not CAW-aware.
The following HTTP extensions are based on the conventions defined in RFC 2169. It is RECOMMENDED that implementers of this specification also implement RFC 2169.
The HTTP headers defined in this specification are all response headers. No additional request headers are specified by this document.
It is RECOMMENDED that implementers of this specification use an HTTP/1.1 implementation compliant with RFC 2616.
4.1 X-Content-URN
The X-Content-URN entity-header field provides one or more URNs that uniquely identify the entity-body. The URN is based on the content of the entity-body and any content-coding that has been applied, but not including any transfer-encoding applied to the message-body. For example:
X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
4.2 X-URI-RES
The X-URI-RES header is based on conventions defined in RFC 2169 and provides a number of flexible URI resolution services. These headers provide various ways of locating other content replicas, including additional sources for a multiple-source download. One can also build an application that crawls across the resolution services searching for an optimal replica. Many other uses can be imagined beyond those given in this specification. The general form of the header is as follows:
X-URI-RES: <service uri>; <service type> [; <target uri>]
The service URI specifies the URI of the resolution service. It is not necessary for the service URI to conform to the "/uri-res/<service>?<uri>" convention specified in RFC 2169.
The service type identifies what type of resolution is being performed and how to interpret the results from the service URI. The types are those defined in RFC 2169 and include "N2L", "N2Ls", "N2R", "N2Rs", "N2C", "N2Cs", "N2Ns", "L2Ns", "L2Ls", and "L2C".
The target URI is the URI upon which the resolution service will be performed. The target URI can be any URI and is specifically not limited to the URI specified by the X-Content-URN header. If there is only a single X-Content-URN value, the target URI can be left off to imply that the X-Content-URN value is to be resolved.
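As an illustration of the three fields described above, here is a minimal parsing sketch. The names `UriRes` and `parse_x_uri_res` are hypothetical, and splitting on ";" assumes that neither URI contains a semicolon, which the specification does not guarantee.

```python
from typing import NamedTuple, Optional, Sequence


class UriRes(NamedTuple):
    service_uri: str
    service_type: str
    target_uri: Optional[str]


def parse_x_uri_res(value: str, content_urns: Sequence[str] = ()) -> UriRes:
    """Split an X-URI-RES value into service URI, service type, target URI.

    Assumption: fields are separated by ';' and the URIs contain no ';'.
    """
    parts = [p.strip() for p in value.split(";")]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected '<service uri>; <service type>': %r" % value)

    service_uri, service_type = parts[0], parts[1]
    target = parts[2] if len(parts) > 2 and parts[2] else None

    # With exactly one X-Content-URN value, an omitted target implies that URN.
    if target is None and len(content_urns) == 1:
        target = content_urns[0]
    return UriRes(service_uri, service_type, target)
```

For example, `parse_x_uri_res("http://untrustedmirror.com/pub/file.zip; N2R", [urn])` would yield the mirror URL, the type "N2R", and the single content URN as the implied target.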
It is RECOMMENDED that receivers treat URI resolver services as potentially untrusted and verify all content retrieved using a resolver's services.
It is believed that N2R, N2L, and N2Ls will be the most useful services for the Content-Addressable Web, so we will cover examples of those explicitly.
4.3 N2R
The N2R URIs directly specify mirrors for the content addressed by the URN and can be useful for multi-source downloads. For example:
X-URI-RES: http://urnresolver.com/uri-res/N2R?urn:sha1:; N2R
or
X-URI-RES: http://untrustedmirror.com/pub/file.zip; N2R
The key difference between these headers and something like the Location header is that the URIs specified by this header should be assumed to be untrusted.
4.4 N2L and N2Ls
These headers are used when other hosts provide URLs where the content is mirrored. This is most useful in ad hoc CDNs where mirrors may maintain lists of other mirrors. Browsers can simply crawl across the networks, recursively dereferencing N2L(s). For example:
X-URI-RES: http://urnresolver.com/uri-res/N2L?urn:sha1:; N2L
and
X-URI-RES: http://untrustedmirror.com/pub/file-mirrors.list; N2Ls; urn:sha1
For the N2Ls service, it is RECOMMENDED that the result conform to the text/uri-list media type specified in RFC 2169.
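The crawl described in this section can be sketched as follows. A text/uri-list body holds one URI per line, with lines beginning with "#" treated as comments. The function names and the `resolve` callable are assumptions for illustration: `resolve` stands in for whatever code fetches a service URI and reports the mirrors it lists plus any further resolution services those mirrors advertise.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def parse_uri_list(body: str) -> List[str]:
    """Extract URIs from a text/uri-list body, skipping comments and blanks."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def crawl_mirrors(
    start_services: Iterable[str],
    resolve: Callable[[str], Tuple[List[str], List[str]]],
    max_services: int = 20,
) -> List[str]:
    """Breadth-first walk over resolution services, collecting mirror URLs.

    `resolve(service_uri)` is a hypothetical callable returning
    (mirror_urls, further_service_uris).
    """
    seen: Set[str] = set()
    mirrors: List[str] = []
    queue = deque(start_services)
    while queue and len(seen) < max_services:
        service = queue.popleft()
        if service in seen:
            continue  # mirrors often point back at each other
        seen.add(service)
        found_mirrors, found_services = resolve(service)
        for mirror in found_mirrors:
            if mirror not in mirrors:
                mirrors.append(mirror)
        queue.extend(found_services)
    return mirrors
```

The `max_services` cap is a design choice to keep a crawl over untrusted hosts bounded; every mirror found this way should still be treated as untrusted and its content verified.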
4.5 Proxies and Redirectors
It is useful to allow CAW-aware proxies that provide content-addressing information without modifying the original web server. This allows CAW-aware user-agents to take advantage of the headers, while simply redirecting user-agents that don't understand the Content-Addressable Web. It would be inappropriate to return an X-Content-URN header during a redirect, because HTTP 3xx responses often still include a message-body that explains that a redirect is taking place. Instead it is preferred to return a result of the text/uri-list media type that includes one or more URNs that would normally reside in the X-Content-URN header.
4.6 Example Application
The above HTTP extensions are deceptively simple and it may not be readily apparent how powerful they are. We will discuss an example application that will take advantage of a few of the features provided by the extensions.
In this example we will look at how the CAW could help at linuxiso.org where ISO CD-ROM images of the various Linux distributions are kept. The first step will be to issue a GET request for the content:
GET /pub/Redhat-7.1-i386-disc1.iso HTTP/1.1
Host: www.linuxiso.org
The abbreviated response:
HTTP/1.1 200 OK
Content-Type: Application/octet-stream
Content-Length: 662072345
X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
X-URI-RES: http://www.linuxmirrors.com/pub/Redhat-7.1i386-disc1.iso; N2R
X-URI-RES: http://123.24.24.21:8080/uri-res/N2R?urn:sha1:; N2R
X-URI-RES: http://123.24.24.21:8080/uri-res/N2Ls?urn:sha1:; N2Ls
With this response, a CAW-aware browser can immediately begin downloading the content from www.linuxiso.org, linuxmirrors.com, and 123.24.24.21 all in parallel. At the same time the browser can be dereferencing the N2Ls service at 123.24.24.21 to discover more mirrors for the content.
The existence of the 123...21 host is meant to represent a member of an ad hoc CDN, perhaps the personal computer of a Linux advocate who just downloaded the ISO and wants to share their bandwidth with others. By dereferencing the N2Ls, even more ad hoc nodes could be discovered.
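One way a browser might split such a download across sources is sketched below. This is an assumption on our part: the specification does not say how the parallel download is divided, and the sketch simply assigns fixed-size byte ranges to the sources round-robin, using standard HTTP/1.1 Range header values. The name `plan_ranges` and the default chunk size are hypothetical.

```python
from typing import List, Tuple


def plan_ranges(
    content_length: int, sources: List[str], chunk_size: int = 1 << 20
) -> List[Tuple[str, str]]:
    """Assign byte ranges to sources round-robin (hypothetical planner).

    Returns (source_url, range_header_value) pairs covering the whole content.
    """
    if content_length <= 0 or chunk_size <= 0 or not sources:
        raise ValueError("need a positive length, chunk size, and a source")

    plan = []
    for index, start in enumerate(range(0, content_length, chunk_size)):
        # The end offset in an HTTP Range header is inclusive.
        end = min(start + chunk_size, content_length) - 1
        source = sources[index % len(sources)]
        plan.append((source, "bytes=%d-%d" % (start, end)))
    return plan


if __name__ == "__main__":
    for source, byte_range in plan_ranges(10, ["mirror-a", "mirror-b"], 4):
        print(source, byte_range)
```

Once all ranges have been fetched and reassembled, the result would be checked against the X-Content-URN value, since any one of the sources may be untrusted.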
4.7 Replica Advertisement
The URI-RES framework provides a significant amount of flexibility in how replica advertisement and discovery can be implemented. One example implementation will be provided in a future specification.
4.8 Acknowledgements
Gordon Mohr (gojomo@bitzi.com), Tony Kimball (alk@pobox.com), Mark Baker (distobj@acm.org)
NAT alone is not an effective method of preventing people from using p2p programs. All it does is prevent incoming TCP connections, so as long as someone in the network (well, some reasonable minority of peers) can get incoming connections to bootstrap people into the network, everyone can still communicate despite the inability to get new incoming connections.
Good NAT bypassing is annoying to program (in the extreme case, it requires implementing something like TCP over UDP) but it's not a huge technical hurdle. The main reason it's not commonly done is that too few people have hostile NATs for it to be worth the effort.
--
Benjamin Coates
Right on target. Freenet accomplishes these goals, and actually works right now. Freenet is essentially an anonymous, distributed caching system into which anyone can insert data and retrieve it later. It supports locating information either by content hashes or by a human-readable redirect, as well as lots of really cool features like anonymous websites ("freesites"). So... what are you waiting for? Install Freenet today!
</plug>
Who pays for all that equipment and bandwidth? The idea here is not to solve problems by throwing resources at a problem, but rather to solve them by using existing resources as effectively as possible. The technology involved can be applied to any resource base. The technology-intensive approach using almost-zero-cost resources might well make significant headway against the Slashdot Effect, even if you still think your capital-intensive approach based on older technology is even better.
Another factor you seem to've overlooked is that software like CAW or BitTorrent is distributed for reasons beyond scalability. For example, consider the inherent attack-resistance characteristics of a highly distributed P2P network, vs. your centrally-administered servers. There are other goals as well, such as avoiding legal culpability or financial dependence on corporate benefactors to provide the systems and bandwidth. Whether you agree or disagree with those goals, the fact remains that many people believe in them. Networks like the one you describe are old hat, dozens have been deployed already, and yet a lot of people still want something different. You've proposed a solution to a different problem than the one Onion Networks et al seek to solve. There's a term for that; we call it missing the point.
Slashdot - News for Herds. Stuff that Splatters.
We've had several large deployments of files which are a couple hundred megabytes and up, getting sustained downloads of a couple hundred downloaders at once, serving off a dsl line, and it's worked well.
By the way, BitTorrent, Swarmcast, and OCN all check secure hashes under the hood, so data integrity isn't an issue.