Gnutella at One Year
transient writes: "Gnutella's first birthday passed quietly about 10 days ago. An OpenP2P article reflects on the Gnutella network as a transient extension of the Web, since Gnutella peers use HTTP for file transfer and are essentially Web servers. Seems the network keeps evolving; there's some discussion of the new BearShare Defender and more info on the recent Gnutella virus. If Gnutella peers are Web servers, wouldn't that make Gnutella users who share files equivalent to Web site publishers, with the same responsibilities?"
Multicast is pretty easy to program, not much harder than UDP. Or at least the system interface is almost exactly the same (you have to set the TTL manually; that's about the only difference I remember).
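To illustrate how close the interface is to plain UDP, here is a minimal Python sketch using only the standard `socket` and `struct` modules; the function names are hypothetical, chosen for illustration. Sending differs from ordinary UDP only in the `IP_MULTICAST_TTL` option, and receiving adds a group join via `IP_ADD_MEMBERSHIP`.

```python
import socket
import struct

def make_sender(ttl: int = 1) -> socket.socket:
    """UDP socket for multicast sending; the only extra step over plain UDP is the TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The TTL bounds how many router hops a datagram may cross (1 = local subnet only).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock

def make_receiver(group: str, port: int) -> socket.socket:
    """UDP socket bound to `port` and joined to the multicast `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address, then local interface (0.0.0.0 lets the kernel pick).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

Usage would look like `make_sender(ttl=16).sendto(b"hello", ("239.1.2.3", 5000))` on one host and `make_receiver("239.1.2.3", 5000).recvfrom(1500)` on another; the group and port are arbitrary examples from the administratively scoped 239.0.0.0/8 range. Reaching receivers beyond your own network still depends on multicast routing or a tunnel.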
Getting a multicast feed is harder, but not really harder than getting an NNTP feed: you find someone who has one and request a tunnel (unless your ISP magically gives you multicast, which is quite rare).
Mind you, this was the state of affairs about 8 years ago, when I did the multicast news software in 1993-94. Well, you also frequently needed kernel patches back then, but I don't think that is needed on modern Unix-like systems.
It is quite hard to do something with multicast that doesn't suffer congestion problems. It is like normal UDP work, where the protocol doesn't help you with packet loss or congestion, except that it is far harder to get replies from all receivers (in fact, if you want to scale forever, you can't ever accept replies from anyone). It's a big old pain, but people build UDP-based systems, and they could build multicast ones as well with more work.
Well, you can get a tunnel from anyone, not just your ISP. It is in the ISP's best interest to be the one to provide it, unless they charge you per bandwidth used.
Last I checked, UUNET gave free multicast to any leased-line customer. That was quite a while ago, though, and it may have changed. I also know they can do multicast to dial-ups (anyone using Ascend boxes there should be able to do it), but I don't know what the deal on that is.
I am working on a project that may satisfy a number of the intended features you mention.
For example...
As a straight peer-to-peer network grows, it becomes saturated with traffic. Requests are sent, propagated, and choke the entire network of peer-to-peer clients, usually at the lowest bandwidth level.
I saw this first-hand when using a modified Gnutella client to monitor the types and number of queries occurring on the network. The vast majority were crap or outright malicious, and they brought my 1.5 Mbps downstream DSL line to a crawl.
But it is possible to have a fully decentralized network that is bandwidth friendly. I am working on it now.
If you try to run this through an established client-server system, lawyers descend like flocks of carrion birds.
Another important aspect of this network is that searching and actual transfer are decoupled. When you find some hits for your query, you are returned a list of locators for that resource. These may be simple HTTP-style locators, or they may be Freenet SHA-1 hash keys. This means you can find the content you seek in an open, decentralized network, and then obtain it (if it is sensitive data in your country, etc.) in a secure, anonymous manner such as Freenet.
And finally, the most important aspect of this network is that it adapts to your preferences. A very large problem with Gnutella and other peer-based networks is spam and irrelevant results. With this network, you continually add peers who respond with relevant, quality information, and drop peers who provide no value.
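A minimal sketch of what decoupling search from transfer allows, assuming a hypothetical `sha1:<hex digest>` locator syntax (Freenet's real key format differs): a search hit carries several locators, and whichever transport fetches the bytes, the client can check them against the content hash before trusting them.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class SearchHit:
    """One query result: what matched, plus every place it can be fetched from."""
    title: str
    locators: list = field(default_factory=list)

def sha1_locator(data: bytes) -> str:
    # Hypothetical locator syntax: "sha1:" followed by the hex SHA-1 digest.
    return "sha1:" + hashlib.sha1(data).hexdigest()

def verify(data: bytes, locator: str) -> bool:
    """Check bytes fetched over any transport against a content-hash locator."""
    scheme, _, digest = locator.partition(":")
    if scheme != "sha1":
        raise ValueError(f"not a content-hash locator: {locator}")
    return hashlib.sha1(data).hexdigest() == digest.lower()

def hash_locators(hit: SearchHit) -> list:
    """The locators that identify content by hash rather than by location."""
    return [loc for loc in hit.locators if loc.startswith("sha1:")]
```

Because the hash names the content rather than a host, the same locator stays valid whether the bytes arrive over plain HTTP or an anonymizing network.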
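One way such adaptation could work, sketched with hypothetical names and thresholds rather than ALPINE's actual algorithm: keep an exponentially weighted relevance score per peer, update it each time a response is judged useful or not, and drop peers whose score stays low after enough samples.

```python
from dataclasses import dataclass

@dataclass
class PeerStats:
    score: float = 0.5   # neutral starting score for a newly added peer
    samples: int = 0     # how many responses have been judged so far

class PeerSet:
    """Hypothetical preference-driven peer list: useful peers rise, useless ones are dropped."""
    def __init__(self, alpha=0.2, drop_below=0.2, min_samples=5):
        self.alpha = alpha              # weight given to the newest judgement
        self.drop_below = drop_below    # score under which a peer is discarded...
        self.min_samples = min_samples  # ...once it has had this many chances
        self.peers = {}

    def add(self, peer):
        self.peers.setdefault(peer, PeerStats())

    def rate(self, peer, useful):
        """Record whether a response from `peer` was relevant."""
        st = self.peers.get(peer)
        if st is None:
            return                      # peer was already dropped
        st.score = (1 - self.alpha) * st.score + self.alpha * (1.0 if useful else 0.0)
        st.samples += 1
        if st.samples >= self.min_samples and st.score < self.drop_below:
            del self.peers[peer]

    def ranked(self):
        """Peers ordered best first, e.g. to decide whom to query first."""
        return sorted(self.peers, key=lambda p: self.peers[p].score, reverse=True)
```

The `min_samples` guard gives a new peer a few chances before one bad answer can get it dropped.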
At any rate, if you are interested, you can read more about this project. It is called the ALPINE Network and the main page is at http://cubicmetercrystal.com/alpine/
The biggest problem with multicast over the internet is that it is not supported.
If it were supported, then the biggest problem would be congestion avoidance.
The congestion avoidance algorithms built into TCP are the only saving grace for the Internet backbone as it exists today. With any kind of widely deployed multicast, it becomes critical that congestion avoidance be implemented and work efficiently.
There has been some progress in this area, but it is a very difficult problem. The IETF has a working group on multicast congestion control. Its work is available here:
http://www.ietf.org/internet-drafts/draft-ietf-rmt-bb-lcc-00.txt
Another great site is Peertal, at http://www.peertal.com/, for news about all sorts of peer-to-peer projects.
Ben Houston has a good page with ideas and links at http://www.exocortex.org/p2p/index.html
The Peer-to-Peer Working Group has its site at http://www.peer-to-peerwg.org/
You may also want to check out the O'Reilly OpenP2P page at http://www.oreillynet.com/p2p/
And of course, I need to shamelessly plug my open source decentralized searching network, the ALPINE Network
Because dropping packets does not ease congestion when the sender ignores the drops. If you are sending a flood of packets that continually overloads a given router, the TCP connections will back off until they starve, while the UDP multicast sender keeps going and the majority of its packets are dropped anyway.
This would be a horrible scenario!
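A toy simulation makes the starvation argument concrete. It is a deliberately crude model (one bottleneck, time in ticks, losses shared in proportion to offered load, no queueing) of a TCP-like additive-increase/multiplicative-decrease (AIMD) flow competing with a constant-rate flow that, like naive UDP multicast, ignores loss.

```python
def simulate(capacity, udp_rate, ticks=1000):
    """Share one bottleneck between an AIMD flow and an unresponsive
    constant-rate flow. Returns mean delivered units per tick for each."""
    cwnd = 1.0                      # TCP-like sending window, in units per tick
    tcp_total = udp_total = 0.0
    for _ in range(ticks):
        offered = cwnd + udp_rate
        if offered > capacity:
            # Overload: both flows lose packets in proportion to their share...
            scale = capacity / offered
            tcp_total += cwnd * scale
            udp_total += udp_rate * scale
            # ...but only the TCP-like flow reacts (multiplicative decrease).
            cwnd = max(1.0, cwnd / 2)
        else:
            tcp_total += cwnd
            udp_total += udp_rate
            cwnd += 1.0             # additive increase while there is room
    return tcp_total / ticks, udp_total / ticks
```

With `simulate(100, 0)` the AIMD flow alone averages roughly three quarters of the link (its sawtooth swings between about half and all of it). With `simulate(100, 100)` it is pinned at its minimum window and gets about one unit per tick, while the unresponsive flow keeps nearly all the capacity and still loses packets every tick.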
There is some good information on TCP friendliness and congestion avoidance algorithms here: http://www.psc.edu/networking/tcp_friendly.html
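The usual yardstick for "TCP-friendly" is the simplified steady-state TCP throughput model (Mathis et al.), rate ≈ (MSS/RTT)·C/√p with C = √(3/2) ≈ 1.22. A multicast sender that keeps its rate at or below this value, for the loss rate p and round-trip time it observes, is roughly no more aggressive than one TCP flow. A small helper, assuming that constant and bytes/seconds as units:

```python
import math

def tcp_friendly_rate(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Simplified TCP throughput model: rate ~ (MSS/RTT) * C / sqrt(p), in bytes/s."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes / rtt_s) * c / math.sqrt(loss_rate)
```

For a 1460-byte MSS, a 100 ms RTT and 1% loss this gives about 178,800 bytes/s (roughly 1.4 Mbit/s); quadrupling the loss rate halves the allowed rate.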
This really is incredibly important. Anything that starves TCP and introduces congestion at a wide level in the internet is going to wreak havoc.
OTOH, the bandwidth usage of gnutella searches vs. the total bandwidth available is a very small ratio.
This is probably true. Most gnutella clients would be on smaller DSL or modem links. These would have a hard time overwhelming bandwidth.
In most cases the problem occurs between different ISP level routers or at the client's link itself.
If the traffic on the multicast channel were consistently greater than 56 kbit/s, the modem clients' TCP connections would starve, which would not be good from an end-user perspective.
If the traffic were such that a small ISP was using most of its bandwidth (probably outgoing, for the multicast to destinations), then all clients of that ISP would be having problems as well.
It really does get pretty tricky quickly. However, there is good progress being made in this area, and perhaps with IPv6 we will begin to see multicast working on a larger scale (I hope so!).
The project I am working on uses unicast to multiple destinations in a way that behaves very much like multicast. However, I had to build some very elaborate mechanisms into the protocols to keep congestion and TCP starvation from occurring, and to let links of varied bandwidth communicate without the fast ones overwhelming the slower ones.
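A minimal sketch of one standard way to do unicast fan-out without fast links overwhelming slow ones: a bounded queue plus a token bucket per destination. This is a generic technique shown for illustration, not ALPINE's actual mechanism, and the class names and parameters are hypothetical.

```python
from collections import deque

class TokenBucket:
    """Allows at most `rate` bytes/s on average, with bursts up to `burst` bytes.
    Keep `burst` at least as large as the biggest message, or it never passes."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = None

    def allow(self, nbytes: int, now: float) -> bool:
        if self.last is not None:
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

class FanOut:
    """Unicast 'pseudo-multicast': one bounded queue and one token bucket per
    destination, so a slow peer only backs up (and eventually loses old
    messages from) its own queue instead of holding back the fast peers."""
    def __init__(self, max_queue: int = 100):
        self.max_queue = max_queue
        self.buckets = {}
        self.queues = {}

    def add_peer(self, peer, rate, burst):
        self.buckets[peer] = TokenBucket(rate, burst)
        self.queues[peer] = deque(maxlen=self.max_queue)  # oldest entries fall off when full

    def publish(self, msg: bytes):
        for q in self.queues.values():
            q.append(msg)

    def pump(self, now: float):
        """Send what each peer's bucket allows right now; returns {peer: [messages]}."""
        sent = {}
        for peer, q in self.queues.items():
            bucket = self.buckets[peer]
            out = []
            while q and bucket.allow(len(q[0]), now):
                out.append(q.popleft())
            sent[peer] = out
        return sent
```

Calling `pump` periodically (with a monotonic clock as `now`) drains each peer at its own pace.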
This is the ALPINE Network and the more extensive information about congestion avoidance is here
If it looks like a duck, and quacks like a duck, it's a duck. Gnutella's goal is to get a search packet to be seen by every node connected to the network. That sounds a lot like multicast to me.
At layers 2 and 3, multicast could be implemented by flooding every node on the network with your multicast packet, just like Gnutella does. So the flood goes over a bunch of TCP links instead of a bunch of point-to-point WAN links and broadcast Ethernet links; what's the essential difference here?
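To make the comparison concrete, here is a minimal sketch of Gnutella-style flooding over an overlay graph: each message has a TTL, and duplicates (recognized by message ID) are dropped on arrival. It idealizes delivery as breadth-first, which real asynchronous networks do not guarantee, and the graph representation is an assumption for illustration.

```python
from collections import deque

def flood(graph, origin, ttl):
    """Flood one message over `graph` (dict: node -> list of neighbours).

    A node forwards a message it has not seen before to every neighbour
    except the one it arrived from, with the TTL reduced by one. Returns
    the set of nodes reached and the number of link transmissions used."""
    seen = {origin}                      # stands in for each node's table of seen message IDs
    transmissions = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        node, came_from, ttl_left = queue.popleft()
        if ttl_left <= 0:
            continue                     # TTL exhausted: this node does not forward
        for nbr in graph[node]:
            if nbr == came_from:
                continue
            transmissions += 1           # the copy crosses the link even if it is a duplicate
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, node, ttl_left - 1))
    return seen, transmissions
```

On a four-node ring, one search reaches all four nodes but costs five transmissions, two of which carry copies the receiver already has; with a TTL of 1 it reaches only the origin's two neighbours.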
Her idea is a sort of automatic tunneling system that leverages IP routing to build the multicast tree out of multicast aware routers. There don't actually have to be any multicast aware routers for it to work. They just make the tree more efficient.
I thought of an idea for fixing Gnutella a while ago, which largely involved Gnutella nodes forming their own multicast trees, where the multicast packets traveled over TCP links instead of point-to-point WAN links. When I read that part of Internetworking, I was so struck by the similarity of our ideas that I made a point to talk to her during IETF 50. Hers is a lot better than mine because implementing it at the IP layer leverages existing IP routing to avoid duplication of packets on any given link.
Need a Python, C++, Unix, Linux develop
Why not just have routers drop packets like they do right now for TCP? Nobody ever claimed that multicast had to be reliable.
You haven't thought through the problem very well. Right now, links involved in a gnutella network often see every single search packet many times, along with all the associated TCP ack packets. How is this reducing the burden on routers?
Gnutella wants every single node that's connected to see every search request. By any definition I can think of, that's any-source multicast. I don't care what you think of the efficiency of multicast; any layer 3 multicast scheme is going to be more efficient than Gnutella currently is, because physical network topology can be taken into account at layer 3.
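A rough count on the overlay alone supports this. Assume a connected overlay of N nodes and E links, ideal duplicate suppression (each node forwards a query once, to every neighbour except the one it arrived from) and a TTL large enough to reach everyone. The source s sends deg(s) copies and every other node sends deg(v) − 1, while a distribution tree needs exactly one transmission per node reached:

```latex
T_{\mathrm{flood}} = \deg(s) + \sum_{v \neq s} \bigl(\deg(v) - 1\bigr) = 2E - (N - 1)
\qquad
T_{\mathrm{tree}} = N - 1
\qquad
T_{\mathrm{flood}} - T_{\mathrm{tree}} = 2\,(E - N + 1)
```

So any overlay with more links than a spanning tree (E > N − 1) pays 2(E − N + 1) redundant transmissions per query, and each overlay hop may itself cross several physical links, which is where layer 3 knowledge of the topology helps.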
Why don't you go read the chapter I was referring to before posting again? Better yet, please explain to me how what gnutella does isn't multicast, and how what gnutella does is better for any segment of the network than a good multicast implementation would be?
Hmmm... Yes, you're correct. With single-source multicast, this has the possibility of an easy solution. With any-source multicast, it's a lot harder.
OTOH, the bandwidth usage of gnutella searches vs. the total bandwidth available is a very small ratio. For this particular application, I don't think it'll be terribly important, but it's a good thing to think about.
Strangely enough, it was UUNET that I asked and I was quoted a pretty hefty price. I asked my ISP (visi.com) first, and they said they had dropped it due to lack of interest and wouldn't pick it back up again just for me. :-) So, I was kind of stuck.
I also think this is too much to go through for multicast to work. It should just work without having to call someone to get a tunnel, and without having to look up a tunnel on some website.
If easy-to-program, easy-to-implement multicast were available, Gnutella would've used it and wouldn't have been nearly as poor in the scalability department. Gnutella is basically a layer 5 implementation of any-source multicast that uses flooding to get its job done.
If anybody is interested, I talked to Radia Perlman at IETF 50 last week, and we would like to try to form a working group around turning the simple multicasting protocol she describes in the last chapter of her book 'Internetworking' into an RFC.
Yeah, but only MusicCity doesn't suck. At this moment, the biggest server after MusicCity listed on napigator.com has 240,607 files, compared to each MusicCity server's 3,960,000+ files, plus the fact that MusicCity servers are linked. So all they have to do is take down MC and OpenNap dies.
Switch the . and the @ to email me.
As a straight peer-to-peer network grows, it becomes saturated with traffic. Requests are sent, propagated, and choke the entire network of peer-to-peer clients, usually at the lowest bandwidth level. Since there is no central coordinating system to handle the search requests, you eventually get a network that is ass slow and unable to perform to expected levels. If you try to run this through an established client-server system, lawyers descend like flocks of carrion birds. So it seems to me the fix is a hybrid network of servers that are promoted up from a pool of high-bandwidth connections, organized like resistance cells. Client machines would only talk to an upper-level system, transferring a list of songs on the system to its cell leader. This cell leader would be part of a higher-level cell, and would send data about what was in its cell to a higher-level server. Eventually, you hit the top level, where you would have a ring of systems on very high-bandwidth connections.
Search requests would hop to the top-level servers, which would talk to each other and fire back the answer. Then the two (client) machines would start swapping data. These top-level machines would be updated from below with fresh data, updating their search pool dynamically.
As clients come online, they would find a server, report what they have in their swap folder, and start sharing data. Requests for searches would only go to the highest-bandwidth systems, and then only to those that are willing to serve in this capacity. If you come online with a nice fast machine with a fat network pipe, you can become part of the search network.
Obviously, there would need to be some method of pointing clients to servers, especially if the servers were to drop on and off the network dynamically. I envision that once the software determines that you qualify to be a server, and you confirm that you do want to participate, it would set you up as a backup server for a functioning system. When that system drops from the network, your machine would find another comparable system and set it up as a backup.
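A minimal sketch of the cell idea, with hypothetical names throughout: every machine pushes its file list (and anything it has aggregated from below) up its chain of cell leaders, so the top-level servers hold a complete index. The ring of top-level servers is modelled simply as a list that each search consults in full; the result is just the names of clients to connect to directly.

```python
class Cell:
    """A machine in the hypothetical cell hierarchy: an ordinary client, a
    promoted cell leader, or a top-level server (a node with no parent)."""
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)   # what this machine shares itself
        self.parent = None
        self.index = {}           # filename -> names of clients below this node

    def join(self, leader):
        """Attach under a cell leader and push everything known here up the chain."""
        self.parent = leader
        node = leader
        while node is not None:
            for f in self.files:
                node.index.setdefault(f, set()).add(self.name)
            for f, holders in self.index.items():
                node.index.setdefault(f, set()).update(holders)
            node = node.parent

def search(ring, filename):
    """Top-level servers pool their answers; the clients then swap data directly."""
    holders = set()
    for top in ring:
        holders |= top.index.get(filename, set())
    return sorted(holders)
```

Only the top-level servers see search traffic; ordinary clients talk to their cell leader once when they join.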
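The backup arrangement could be sketched like this, assuming a hypothetical qualification rule (volunteered, online, and above an arbitrary upstream bandwidth threshold) and a periodic heartbeat: each server position keeps a standby, promotes it when the primary vanishes, and then recruits a new standby from the qualifying pool.

```python
from dataclasses import dataclass

MIN_UPSTREAM_KBPS = 256   # hypothetical bar for "a fat network pipe"

@dataclass
class Host:
    name: str
    upstream_kbps: int
    volunteered: bool      # the user agreed to serve
    online: bool = True

def qualifies(host):
    return host.online and host.volunteered and host.upstream_kbps >= MIN_UPSTREAM_KBPS

class ServerSlot:
    """One search-server position with a standby backup."""
    def __init__(self, primary, pool):
        self.pool = pool
        self.primary = primary
        self.backup = self._recruit()

    def _recruit(self):
        """Best qualifying host that is not already the primary (None if nobody qualifies)."""
        candidates = [h for h in self.pool if qualifies(h) and h is not self.primary]
        return max(candidates, key=lambda h: h.upstream_kbps, default=None)

    def heartbeat(self):
        """Called periodically: fail over if the primary dropped, refill the backup."""
        if self.primary is None or not self.primary.online:
            if self.backup is not None and self.backup.online:
                self.primary = self.backup       # the standby takes over
            else:
                self.primary = self._recruit()   # no usable standby: pick afresh
            self.backup = self._recruit()
        elif self.backup is None or not self.backup.online:
            self.backup = self._recruit()
        return self.primary
```

If nobody else qualifies, the slot simply runs without a backup until a suitable host comes online.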
Any thoughts on this? Is it already being done? Should I stop smoking the crack? I know that this would be a nontrivial problem to set up, but it seems that it would remain rather uncentralized and chaotic, but not be as prone to choking as gnutella is.
I provide other people's content, but most of it is not illegally provided. It's free software, shareware, Project Gutenberg files (my contribution was to rename them from their cryptic 8-letters-and-digits names to something one can find, like Defoe - Robinson Crusoe.txt) and tons of scientific papers (again, renaming them is crucial). Best thing is, I've seen 'my' files being shared by the folks who downloaded them. So, at least there, the system works.
OTOH, sharing large files currently is not a good idea. Modem users will try to leech them, and I'm not online that long. That might be solved by clients that require a minimum bandwidth for certain files. Also, I want to have "upload slots" available for the small files I share. Typically, all of the upload slots I provide are filled with uploads of larger files, so nobody can get through to the 10 KB files.
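Both wishes fit in a small admission policy. This is a hypothetical sketch, not any existing client's settings: the size cut-off, slot counts and minimum speed are illustrative assumptions. Large files need a minimum peer speed and may never occupy the slots reserved for small files.

```python
SMALL_FILE_LIMIT = 100 * 1024   # hypothetical: files up to 100 KB count as "small"

class UploadSlots:
    def __init__(self, total=4, reserved_small=1, min_kbps_large=64):
        self.total = total                    # all upload slots
        self.reserved_small = reserved_small  # slots large files may never take
        self.min_kbps_large = min_kbps_large  # minimum peer speed for large files
        self.active_small = 0
        self.active_large = 0

    def _busy(self):
        return self.active_small + self.active_large

    def try_start(self, file_size, peer_kbps):
        """Admit (True) or refuse (False) a new upload."""
        if file_size <= SMALL_FILE_LIMIT:
            if self._busy() < self.total:
                self.active_small += 1
                return True
            return False
        if peer_kbps < self.min_kbps_large:
            return False                      # too slow to leech big files
        if self.active_large < self.total - self.reserved_small and self._busy() < self.total:
            self.active_large += 1
            return True
        return False

    def finish(self, file_size):
        if file_size <= SMALL_FILE_LIMIT:
            self.active_small -= 1
        else:
            self.active_large -= 1
```

Small files may still use any free slot; the reservation only stops large uploads from crowding them out entirely.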
There should also be an automatic ban of people who hammer me with requests.
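An automatic ban can be as simple as a per-host sliding window. The sketch below uses hypothetical limits (requests per window, window length and ban length are arbitrary defaults) and timestamps in seconds supplied by the caller.

```python
from collections import defaultdict, deque

class HammerGuard:
    """Ban a host that sends more than `max_requests` within `window` seconds."""
    def __init__(self, max_requests=5, window=60.0, ban_for=600.0):
        self.max_requests = max_requests
        self.window = window
        self.ban_for = ban_for
        self.recent = defaultdict(deque)   # host -> timestamps of recent requests
        self.banned_until = {}             # host -> time the ban expires

    def allow(self, host, now):
        if self.banned_until.get(host, float("-inf")) > now:
            return False                   # still banned
        q = self.recent[host]
        q.append(now)
        while q and q[0] <= now - self.window:
            q.popleft()                    # forget requests that left the window
        if len(q) > self.max_requests:
            self.banned_until[host] = now + self.ban_for
            q.clear()
            return False
        return True
```

A banned host starts with a clean slate when its ban expires, so a host that was merely impatient is not punished forever.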
The problem I see is not whether the protocol itself can scale; we are seeing numerous "tweaks" that will allow this (Clip2's Reflector and Bearshare.net's forthcoming 3.0.0 "Defender" release). What I see as the problem is the splintering and added features being incorporated by the different Gnutella clients: Gnotella has added "Improved bitrate scanning", BearShare and LimeWire have added firewall detection, and there are other "extraneous" features that add information to the Gnutella packets. How long will it be before these clients cause sufficient incompatibility that separate, client-specific networks arise? What we really need is an agreement between the different developers to pass on these extra packets, or to agree on a central "feature set". I am not advocating that we do away with the myriad Gnutella clients; I think their variety and different personalities are great. I just don't want to see the community splinter through incompatibility issues.
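"Passing on the extra packets" is cheap if a client treats the payload as opaque when forwarding. A minimal sketch, assuming the Gnutella 0.4 descriptor header (16-byte descriptor ID, 1-byte payload descriptor, 1-byte TTL, 1-byte hops, 4-byte little-endian payload length); the `relay` function is hypothetical and not taken from any particular client.

```python
import struct

# Gnutella 0.4 descriptor header: ID, payload type, TTL, hops, payload length.
HEADER = struct.Struct("<16sBBBI")   # 23 bytes, no padding

def relay(message: bytes):
    """Prepare a received descriptor for forwarding, or return None to drop it.

    Only TTL and hops change; the payload, including any vendor-specific
    extension bytes this client does not understand, is passed on verbatim."""
    if len(message) < HEADER.size:
        raise ValueError("truncated header")
    desc_id, ptype, ttl, hops, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if ttl <= 1:
        return None                      # TTL would reach zero: do not forward
    return HEADER.pack(desc_id, ptype, ttl - 1, min(hops + 1, 255), length) + payload
```

A client that forwards this way never breaks another vendor's extension merely by relaying it; only clients that choose to read the extra bytes need to agree on their meaning.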
-OctaneZ
(What I would really like to see is native applications, similar to Clip2's Reflector, for both Win32 and Linux that serve only as a "network server", using little CPU and handling large numbers of connections, for people who believe in the Gnutella idea and are graced with high-speed connections.)
I was initially really intrigued by the start of the article, which points out that the web in its infancy was essentially a p2p system, even if the http protocol wasn't meant for it... almost everyone ran both servers and clients and shared content. But then I thought... the real reason why that kind of environment didn't continue isn't so much that the masses started connecting via transient means. It's that not that many people really have compelling content of their own to share. Just look at what most people use current p2p apps for: to redistribute other people's content. With only a few real content providers, there's no inherent reason why one-to-many is worse than many-to-many; in fact, there are many reasons why one-to-many is better (assured quality of content, for one... anyone else tired of downloading songs on napster or gnutella, only to find out later that they're incomplete?) The only reason why everyone is turning to p2p is because it's currently the easiest & best way to steal apps/music/miscellaneous content produced by others. If the music companies had any clue, they'd run their own servers serving digital copies of every song ever produced for a reasonable fee, and then we'd see the days of many-to-many return to the grave.
The programmers of the system.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
"If Gnutella peers are Web servers, wouldn't that make Gnutella users who share files equivalent to Web site publishers, with the same responsibilities?"
To what responsibilities are you referring? I don't see how the protocol in use would change anything.
"[...] they're next [...]"
Who is the "they"? Gnutella (-compatible-software) users? Judging their actions, the RIAA and MPAA don't seem willing to go after end users.
"dies" - how do you mean? What software were you using? Did you try more than one reflector? (Did you even try a reflector?)
Everything is now illegal.
The current Slashdot moderation system is made by gay communists!
Yeah, that seems about how long I've been waiting for this file to download.
:)
Stupid Cheap Guitars
If there's anyone the RIAA would go after, it's the people who make the clients. They (the RIAA) could claim that the individual programmers involved in making Gnutella clients are acting as a vehicle for piracy. The MPAA could jump on board too, because Gnutella also allows you to trade movies.
Individual users may be protected with the webserver loophole, but as Gnutella gains popularity along with ease of use (anyone ever use LimeWire?), the lawsuits will pop up.
--
#nohup cat
>Many ISPs write into their TOS that you aren't allowed to
>run servers because they are afraid of the content providers,
>don't want to provide the bandwidth anyway, and want to charge
>much higher fees to supposedly commercial servers.
And, I dare say, many ISPs don't give a flying fuck about this particular TOS entry unless you're running a server that's a) taking up inordinate amounts of bandwidth or b) serving illegal material. Even at that, they still won't care about b) unless someone reports it.
A cursory glance at my BearShare hosts at any given moment shows mostly cable/DSL users. Most of those providers forbid running servers, but most of them have no real way to tell, unless you're congesting the network. I was a bit surprised that BearShare's latest version sets the default max number of simultaneous uploads to 10 (I keep mine set at 2) but for the most part, unless you're a total dumbass, running Gnutella isn't going to pop up any bandwidth-sucking red flags at your provider's NOC.
One of Gnutella's strong points - unlike a lot of standard protocols - is that you can dynamically change your listen port to whatever you want, and the changes are effective immediately to the rest of the network. If your ISP blocks/monitors 6346, you can change it to something else. If your ISP blocks that, you can change it again; and for the really paranoid, you could write a dirty VB bot or something to change your listen port every hour. Of course, you *could* do the same for FTP/HTTP/etc servers but it would make it more difficult for your visitors to find you.
Server-ban or no, most if not all ISPs have no reliable way to detect or block Gnutella traffic. I think that's quite an advantage.
As for bandwidth being on the rise, you have to consider that file sizes are increasing as well. 5 years ago, the end user surely couldn't download at 200+K/sec, but 5 years ago, the end user wasn't sharing 250MB pornos with the rest of the world, either. The pipes *are* getting fatter, but so are the files being sent across them.
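The arithmetic behind that, assuming decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes) and a fully used link with no protocol overhead: at 200 KB/s a 250 MB file takes about 21 minutes, while over a 56 kbit/s modem (7,000 bytes/s) it takes nearly 10 hours.

```python
def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and contention."""
    return size_bytes / rate_bytes_per_s / 3600

FILE = 250e6         # a 250 MB file, decimal units
BROADBAND = 200e3    # 200 KB/s
MODEM = 56e3 / 8     # 56 kbit/s = 7,000 bytes/s
```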
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
Wow, one year? In net-fad years, that's like, what, middle-aged?
I remember the old alpha days, when nothing would work and hardly anything would download. Yup, glad those days are gone.
--Joey