Mathematical Analysis of Gnutella
jrp2 sent in a paper written by one of Napster's founding engineers. It is
a mathematical evaluation of Gnutella discussing
why the network won't be able to scale up to any reasonable size. I
have been impressed with Gnutella in the past, and have wondered along
these same lines in the past.
haven't we been here before?
I mean, I know that none of us - including our fine moderators - are perfect, but are they at least paying attention?
OK,
- B
http://www.bradheintz.com/
- updated
What I find most interesting are the kinds of projects that have sprung up in Gnutella's wake. Many of these started out as attempts to improve Gnutella, and have since moved on (the Gnutella Next Generation working group never really materialized into anything)
We had napster and one extreme, gnutella at the other, and in the middle a re a number of partially centralized systems with super peers like Fast Track, such as:
Open FT
JXTA Search
GNet
NEShare
and many others...
Then there are the alternative projects that use an entirely different mechanism. For example, social discovery as implemented in:
NeuroGrid
ALPINE
Or distributed keyword hash indexes like:
Chord
Circle
GISP
JXTA Distributed Indexing
And many others as well.
The coming year(s) will see a lot of maturity in these areas, and searching large peer networks will become ever more efficient over time. Gnutella showed us the possibilities of a fully decentralized model, and refinements of its underlying architecture can produce vastly better solutions.
2002 will be an interesting year for peer networking applications...
Maybe you should be trying the Google cache!
314-15-9265
The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished...
This is exactly the point. This is the only way to properly distribute querys, as anyone who has set up a multi-homed ISP knows. It works on the same principle as BGP routing, i.e. there are routers (super-nodes, or whatever) that have a specific number (an ASN - or in P2P, the supernode address) but there are thousands of computers (casual modem users - p2p) on the internet that these routers have information about. If BGP routing worked this way, nothing would go anywhere. However, by having several nodes giving out information on who has what and how to get it, while the majority of users just download and give out their own info, not pass along info of others, things work much smoother. And with a correct implementation, everyone could have a route to everyone's file list at a minimal bandwidth useage.
sig?
Speaking of the merits of Fasttrack, it also provides robust parallel download and auto-resume. Using Gnutella is painful by comparison.
If you read through this research paper it'll start with N=4 and T=5. As you continue to read through the paper he quotes bandwidth figures from his table using various other N and T values.
For example, in the very last table (Bandwidth rates for 10qps) he says the bandwidth generated will by 8GB/s, which align with N=8, T=7. Where you to use the N and T values from the beginning, this would be 2.4MB/s, which is off by 3143 and one third times.
Going back to Joe User's Greatful Dead query, it only generates ~250KB, not 800MB.
Remember, very very few people are going to modify their TTL or open connections. This ``white paper'' grossly misstates the amount of bandwidth Gnutella generates and seems to be an anti-Gnutalla paper designed to mislead rather than an honest and fair judgment
marotti.com
There were several responces to this article pointing out that the current Gnutella network is much more scalable than the one discussed in the article. Try looking here and here for articles discussing the changes since early 2000.
Come on Slashdot, its 2002 not 2000. It looks pretty bad accepting this article right after the Napster one. Does Slashdot or VA own a stake in Napster or something?
LimeWire currently implements a variation of this -- what we call "UltraPeers." UltraPeers establish a significantly greater horizon on the network, and there are other distributed protocols that do this in other creative ways, such as Chord out of MIT, which can be found at: http://www.pdos.lcs.mit.edu/chord/ That aside, there is significant evidence to show that a distributed network can scale far better than any centralized network. Remember that Napster had serious scaling problems as well -- you could only see the files from the hosts on whichever server you happened to be logged in to. The only solution to that problem is the brain-dead purchase of a yet faster multi-million dollar server. I would not call that scaling. As everyone else has pointed out, this discussion began and ended in the Gnutella community about a year ago.
Adam Fisk
That's not logarithmic. If every client node connects to a "super node," and every other "super node," then what you have is a two-level tree. Growth at each level is O(sqrt{n}), not logarithmic.
Chord, a p2p research project from MIT, is truly logarithmic. Go read their SIGCOMM'01 paper for an explanation of how their system works.
--Patrick
Plus the author ignores (mostly due to the fact that they didn't exist back when it was written, this IS an old article) the innovations made with Gnutella (and other, newer competing technologies). Specifically, there are now 'search proxies' that exist on Gnutella that cache and return common queries, thus not saturating the network with redundant queries. For a modem user, this makes the network usable if they limit their connections to proxy servers, because the number of searches hitting their client directly shrinks as common queries are sifted through.
Not to mention there's still room for improvement to the protocol itself-- there's no reason a proxy couldn't cache a list of all files shared by a connected client, then answer queries directly, NEVER sending a query directly to a client. (Ultimately, as people run proxies like this more and more, you'd end up having proxies talking directly to eachother.) The ultimate Gnutella proxy would cache commonly requested files and make them available over a bigger pipe.
No money in it, but for the Gnutella enthusiast, I could see them running this kind of thing from work off of a QA box, for example, or from their support desk at an ISP. =)
All I know about Bush is I had a good job when Clinton was president.
Obviously you haven't used GNUtella for the past year. Xolox is a GNUtella client that allows for parallel downloading, resuming, and Xolox will even look for other sources of the file that you are currently downloading, if the current sources are too slow or down. Basically, with Xolox, you search for a file that you want, and you get results with numbers by them depicting how many sources have the file. That way you don't have to decide which source you want to download from. You decide which file you want to download... and Xolox figures out the rest.
My average download speeds on Xolox are around 160Mbs. Of course, I am use the ever so crappy AT&T cable modem service... so other people on faster DSL lines will most likely experience faster downloads.
Next thing you are going to tell me is that Windows is better than Linux because Linux doesn't have any good GUIs or desktop environments for it. Yeah, lets just ignore everything thats out there right now.
Not only that, but Limewire also supports multisource, segmented, or swarmed downloading. Though Limewire has only recently gotten such functionality, while Xolox has had it for the past year.
Oh, and GNUtella is free as in beer and as in speech.
I was like this for about a week before I realized why I wasn't getting any uploads. I had to open up port 6346 on my home network (linksys router). Also, Napshare lets me "force local ip" to my firewall/ external ip (assigned by RoadRunner). The linksys router does port forwarding on outside requests, so only one computer on my home network can share on that port.
This thread reminded me that RoadRunner had expired my old ip address and assigned me another and I had forgotten to update my gnutella client to reflect to new ip. So for the past few weeks or so, I had been one of the "non-sharing" people by simple oversight.
I doubt most limewire/bearshare users know any of this stuff. When running a gnutella client from work, people couldn't do this even if knew about it and wanted to.
Software Wars
It's worth noting that giFT/OpenFT just entered its first stage of network testing--and with that in mind, they need as many people as possible to download and run the client so they can test the network. Complete instructions for so doing are given on the website.
Editor Emeritus and Senior Writer, TeleRead.org
This sig under construction. Please check back later.
Thats exactly how eDonkey2000 works!
While your downloading a file, it's immediately made available for upload from you. It uses resume download to download parts of the file you want from multiple sources, some of which don't have all of the file yet too.
I live life on the edge
The next step is to add more sophisticated routing protocols between ultrapeers. Many of the algorithms mentioned elsewhere in this post (Chord, CAN, etc.) are contenders for that, as is LimeWire's home-grown query-routing proposal.
Christopher Rohrs
LimeWire