Mathematical Analysis of Gnutella
jrp2 sent in a paper written by one of Napster's founding engineers. It is
a mathematical evaluation of Gnutella discussing
why the network won't be able to scale up to any reasonable size. I
have been impressed with Gnutella in the past, and have wondered along
these same lines in the past.
The problem is not that difficult, if you want Gnutella to scale, then you need to avoid the exponential explosition of the number of messages exchanged between the clients as their number grows. The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished; this makes the number of messages exchanged grows in a logarithmic way (this is an outrageous simplification of course, but gives an idea). There are many such expriments with Gnutella two with those ideas, this is what BearShare is trying to do.
Look at ICQ. It was fairly decent as an instant messaging client until the numbers hit one million or so and then it needed to control everything under the sun and companies could spam through it. File sharing happens through it all the time too.
I don't care if Gnutella cannot scale to the levels that Napster saw. Smaller is better in my opinion!
Click here or here.
On the topic of this program, a more current story running on msnbc.com right now is telling how it is becoming a severe security risk for users of the program. Here is the article.
Anyone who understands how Gnutella works (unfortunately, too few people) knows that Gnutella is horribly broken, will never work, and is basically unfixable.
The more relevent question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".
However, the REAL question is whether you can have a peer-to-peer network with decentralized servers, i.e., with clients that automatically establish a heirarchy among all the clients, and certain clients become more "server like". They only way to make a Gnutella work is by making it heirarchical, but the heirarchy needs to be automatic for it have the same general "virtual network" aspect of Gnutella.
Is it possible? I don't know. You would probably have to have automatic bandwidth measurements, depth probes, all kinds of things to make it work. I simply don't know if it would be possible to automate something like that.
I'm not very familiar with the deep technical details of Gnutella, but isn't there a limit on how far the "horizon" is (i.e.:how many users near by you can see)? If this is correct, all the mathematics here presented apply only in theory and not in practice, as what will happen is that (1) most queries will not be relayed past a "reasonable horizon", and (2) there exists a good (or high?) probability that as long as you're searching for "popular" files, that you will eventually find them.
Because of this basic and simple observation, I do not foresee gnutella to die anytime soon because of scalability reasons alone (however copy-protection issues are another story).
Again let me stress that my observation here is based on the strong assumption that the "search horizon" is "reasonable sized" so as not to have to search the whole gnutella network.
its important to know that the author of this paper is Jordan Ritter, who is the co-founder of Napster.
-- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
How would a P2P with the scaling the likes of which IRC networks use?
Since I believe IRC scales pretty good why not build the Gnutella network like that?
Here's a really wacked-out thought I had that I've been working on.
Gnutella clients can sometimes have more "potential" connections out to the network than MAX_CONNECT (because they open, say five, expecting two and get four). If so, why not do a traceroute to each of the hosts and crop out the one that is the most hops away? Iterate cropping until there are MAX_CONNECT active connections.
This would tend to favor a network that closely reflected the underlying structure of the network - thus reducing any earth-shattering impact on the inet backbone?
To further force a short-inet-distance perhaps clients should (optionally) not accept connections from far-flung hosts?
Additionally, clients should count already-seen packets (which they are supposed to drop) against the goodness of a given link - thus reducing routing loops in the network and forcing it to flatten out instead of clump together.
These might allow clients to have a higher TTL without increasing net net (har har) bandwidth - less duplicated, circularly-routed, lengthy-path, etc, data.
I suspect (have not checked) that some clients do the latter (routing loop prevention), but I know of none doing the formers.
I will get around to coding this soon, unless somebody can tell me it's a stupid idea (for a good reason).
--Nathan
Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
and I hate to say this, but take an idea from the windows networking world... each machine has an election to see who is going to be the master browser (based on average connected and up times.. the clients that are up and serving the longest and with the shortest down times historically) then we have the next few building the same master browser database but sitting dormant (just listening and cacheing) until the master browser disappears, then the next highest pipes up and says "ohhh lookie me!" thus keeping a master server up (and that master server could load balance with the sub servers by just sending a "busy use 127.0.0.2 or 127.0.0.3" back to the client.
it could be fixed, and made powerful, self scaling.
Do not look at laser with remaining good eye.
A good analogy might be a detective trying to find a suspect for a crime. The Gnutella approach is akin to going on TV and asking everyone in the area to let you know if they know who did it. It may work once, but the more you do it, the less effective it is. Freenet works as detectives do normally, they gradually home in on their suspect by gathering information, and using that information to refine their search.
Some say that Freenet only achieves this scalability because it doesn't do the type of "fuzzy" search Gnutella does. You need to know exactly what you are looking for in Freenet to find it. This isn't true, the Freenet searching algorithm can be generalised to allow fuzzy searching. While this has not yet been demonstrated in practice, it is definitely possible in theory.
It always amazes me that people continue to lament flaws in many current P2P architectures when Freenet has incorporated solutions to those problems almost from its inception.
Disclaimer: I am Freenet's architect and project coordinator, so you could be forgiven for thinking I am biased, but you are free to review our papers and research to decide for yourself.
Has anyone considered that a transparent proxy might be the solution, or at least a partial solution?
The internet is more of a tree than a net, at least for the smaller ISP's. So a site can run a transparent proxy that aggregates all it's gnutella clients, and only maintain a few outbound connections for the entire site, as opposed to a few per client. In addition, incoming gnutella connections are intercepted and directed at the proxy (which is essentially another gnutella node).
This allows ISP's to limit the number of gnutella connections to the rest of the world. In fact, it would be best for them to connect only to other ISP's using a proxy as well.
This would tend to greatly improve query response time for nodes that are close by, but on the other hand would make it harder to create connections to remote nodes, because that control has been moved from the client to the proxy.
But an office or an net cafe or school could run the proxy and have a single link between it and the ISP's proxy, instantly connecting the site with all the ISP's users and cutting bandwidth considerably.
Proxy's can do other things to accelerate searches. If a request for "Grateful Dead" has been forwarded, then there is no need to forward the same query string in the immediate future (say 1 minute). And of course the is the option of caching the file transfers themselves...
Not only are they less likely to modify TTL, but the need to do so for popular content is not that necesary, the more popular the content the more people will download and share the download with others. Sort of a user based way of propogating media across the internet (while Freenet does it automaticly).
disclaimer : My views do not represent those of every one else in slashdot.
Not if you couldn't predict which machines on the net would act as those supernodes. If, like another poster mentioned, machines that met a certain criteria (bandwidth, storage, time on line, whatever) simply won an election to act as a node, there's no single point to shut down. Shut down one supernode, the others are informed that a replacement is needed and another election is called for.
:)
Following an election, the supernodes update the clients as to the lookup machines. I suppose you could even have it where if all the supernodes were shut down that an entirely new election process takes place creating a new set of supernodes. Kind of like having a DNS server setup where any machine can act as one of the root servers based on a criteria based election by those machines doing a lookup.
Way too much for my wee brain to work out all the details on. Sounds good in theory anyway
The line must be drawn here. This far. No further.