Slashdot Mirror


Mathematical Analysis of Gnutella

jrp2 sent in a paper written by one of Napster's founding engineers. It is a mathematical evaluation of Gnutella discussing why the network won't be able to scale up to any reasonable size. I have been impressed with Gnutella in the past, and have wondered along these same lines in the past.

14 of 332 comments (clear)

  1. The Logarithmic value of the messages exchanged ! by Khalid · · Score: 5, Interesting

    The problem is not that difficult, if you want Gnutella to scale, then you need to avoid the exponential explosition of the number of messages exchanged between the clients as their number grows. The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished; this makes the number of messages exchanged grows in a logarithmic way (this is an outrageous simplification of course, but gives an idea). There are many such expriments with Gnutella two with those ideas, this is what BearShare is trying to do.

  2. Growth of network relates to negative attention by totallygeek · · Score: 3, Interesting
    Everyone knows that Napster was basically a glorified DCC engine rip-off from IRC days of file trading. It made IRC file sharing easy for the average computer user. With the death of Napster as everyone knew it, you still see #mp3 and #mp3tunes and the like on IRC trading files person-to-person like Metallica never existed. I think that when something explodes in popularity you get too many bad people joining in ruining things for the users that are not abusers. When so many people jump on a bandwagon, you get media attention for wrong-doing and that is where the death nail is driven.


    Look at ICQ. It was fairly decent as an instant messaging client until the numbers hit one million or so and then it needed to control everything under the sun and companies could spam through it. File sharing happens through it all the time too.


    I don't care if Gnutella cannot scale to the levels that Napster saw. Smaller is better in my opinion!

    1. Re:Growth of network relates to negative attention by vawlk · · Score: 2, Interesting

      Each request generates a GUID and clients check each request vs a recent history of GUIDs. It prevents any sort of looping.

      Having developed the first host caching application for gnutella, I can say that the author never fully understood how the network worked.

      His equations may be accurate based on how he thought requests and replied propogated through the network, but he assumed every request had a reply.

      It is true that the bandwidth overhead was large, but I rarely used more than 15KB/s during the times when there were 4000+ clients connected. He says that it might not be possible to reach all 4000 people, but in order for me to know how many users were out there, they all replied to my ping, thus searchable.

      Finally, the very nature of the network doesn't lend itself to protocol updates at all. The protocol was extreamly limited, but once it caught on, not much could be done about updating it short of starting an entirly new protocol. You couldn't just shut it down, and thats the major problem.

      Many proposals were written on how to implement a system without the gnutella limitations, and you are seeing them in many different implementations.

  3. gnutella by flynt · · Score: 3, Interesting

    On the topic of this program, a more current story running on msnbc.com right now is telling how it is becoming a severe security risk for users of the program. Here is the article.

  4. Well, duh by Anonymous Coward · · Score: 1, Interesting

    Anyone who understands how Gnutella works (unfortunately, too few people) knows that Gnutella is horribly broken, will never work, and is basically unfixable.

    The more relevent question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".

    However, the REAL question is whether you can have a peer-to-peer network with decentralized servers, i.e., with clients that automatically establish a heirarchy among all the clients, and certain clients become more "server like". They only way to make a Gnutella work is by making it heirarchical, but the heirarchy needs to be automatic for it have the same general "virtual network" aspect of Gnutella.

    Is it possible? I don't know. You would probably have to have automatic bandwidth measurements, depth probes, all kinds of things to make it work. I simply don't know if it would be possible to automate something like that.

  5. Is there a limit to the gnutella horizon? by eyefish · · Score: 3, Interesting

    I'm not very familiar with the deep technical details of Gnutella, but isn't there a limit on how far the "horizon" is (i.e.:how many users near by you can see)? If this is correct, all the mathematics here presented apply only in theory and not in practice, as what will happen is that (1) most queries will not be relayed past a "reasonable horizon", and (2) there exists a good (or high?) probability that as long as you're searching for "popular" files, that you will eventually find them.

    Because of this basic and simple observation, I do not foresee gnutella to die anytime soon because of scalability reasons alone (however copy-protection issues are another story).

    Again let me stress that my observation here is based on the strong assumption that the "search horizon" is "reasonable sized" so as not to have to search the whole gnutella network.

  6. who is the author of this paper? by mozkill · · Score: 3, Interesting

    its important to know that the author of this paper is Jordan Ritter, who is the co-founder of Napster.

    --

    -- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
  7. A possible solution to the scaling problem by Bake · · Score: 2, Interesting

    How would a P2P with the scaling the likes of which IRC networks use?

    Since I believe IRC scales pretty good why not build the Gnutella network like that?

  8. New algorithm needed at the connect phase? by _Knots · · Score: 2, Interesting

    Here's a really wacked-out thought I had that I've been working on.

    Gnutella clients can sometimes have more "potential" connections out to the network than MAX_CONNECT (because they open, say five, expecting two and get four). If so, why not do a traceroute to each of the hosts and crop out the one that is the most hops away? Iterate cropping until there are MAX_CONNECT active connections.

    This would tend to favor a network that closely reflected the underlying structure of the network - thus reducing any earth-shattering impact on the inet backbone?

    To further force a short-inet-distance perhaps clients should (optionally) not accept connections from far-flung hosts?

    Additionally, clients should count already-seen packets (which they are supposed to drop) against the goodness of a given link - thus reducing routing loops in the network and forcing it to flatten out instead of clump together.

    These might allow clients to have a higher TTL without increasing net net (har har) bandwidth - less duplicated, circularly-routed, lengthy-path, etc, data.

    I suspect (have not checked) that some clients do the latter (routing loop prevention), but I know of none doing the formers.

    I will get around to coding this soon, unless somebody can tell me it's a stupid idea (for a good reason).

    --Nathan

    --
    Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
  9. Re:The Logarithmic value of the messages exchanged by Lumpy · · Score: 3, Interesting

    and I hate to say this, but take an idea from the windows networking world... each machine has an election to see who is going to be the master browser (based on average connected and up times.. the clients that are up and serving the longest and with the shortest down times historically) then we have the next few building the same master browser database but sitting dormant (just listening and cacheing) until the master browser disappears, then the next highest pipes up and says "ohhh lookie me!" thus keeping a master server up (and that master server could load balance with the sub servers by just sending a "busy use 127.0.0.2 or 127.0.0.3" back to the client.

    it could be fixed, and made powerful, self scaling.

    --
    Do not look at laser with remaining good eye.
  10. Freenet has addressed this issue from day one by Sanity · · Score: 3, Interesting
    The scalability issues with Gnutella are clear to anyone who understands how it works. From day one, Freenet was designed with scalability as a core goal. In Freenet, the number of nodes involved, and the time required to retrieve a piece of information, scales logarithmically as the size of the network increases.

    A good analogy might be a detective trying to find a suspect for a crime. The Gnutella approach is akin to going on TV and asking everyone in the area to let you know if they know who did it. It may work once, but the more you do it, the less effective it is. Freenet works as detectives do normally, they gradually home in on their suspect by gathering information, and using that information to refine their search.

    Some say that Freenet only achieves this scalability because it doesn't do the type of "fuzzy" search Gnutella does. You need to know exactly what you are looking for in Freenet to find it. This isn't true, the Freenet searching algorithm can be generalised to allow fuzzy searching. While this has not yet been demonstrated in practice, it is definitely possible in theory.

    It always amazes me that people continue to lament flaws in many current P2P architectures when Freenet has incorporated solutions to those problems almost from its inception.

    Disclaimer: I am Freenet's architect and project coordinator, so you could be forgiven for thinking I am biased, but you are free to review our papers and research to decide for yourself.

  11. Transparent Proxy by Uncle+Dazza · · Score: 2, Interesting

    Has anyone considered that a transparent proxy might be the solution, or at least a partial solution?

    The internet is more of a tree than a net, at least for the smaller ISP's. So a site can run a transparent proxy that aggregates all it's gnutella clients, and only maintain a few outbound connections for the entire site, as opposed to a few per client. In addition, incoming gnutella connections are intercepted and directed at the proxy (which is essentially another gnutella node).

    This allows ISP's to limit the number of gnutella connections to the rest of the world. In fact, it would be best for them to connect only to other ISP's using a proxy as well.

    This would tend to greatly improve query response time for nodes that are close by, but on the other hand would make it harder to create connections to remote nodes, because that control has been moved from the client to the proxy.

    But an office or an net cafe or school could run the proxy and have a single link between it and the ISP's proxy, instantly connecting the site with all the ISP's users and cutting bandwidth considerably.

    Proxy's can do other things to accelerate searches. If a request for "Grateful Dead" has been forwarded, then there is no need to forward the same query string in the immediate future (say 1 minute). And of course the is the option of caching the file transfers themselves...

  12. Re:Even worse! Incosistent math! by AbsoluteRelativity · · Score: 2, Interesting

    Not only are they less likely to modify TTL, but the need to do so for popular content is not that necesary, the more popular the content the more people will download and share the download with others. Sort of a user based way of propogating media across the internet (while Freenet does it automaticly).

    --
    disclaimer : My views do not represent those of every one else in slashdot.
  13. Re:The Logarithmic value of the messages exchanged by Metrol · · Score: 3, Interesting

    Not if you couldn't predict which machines on the net would act as those supernodes. If, like another poster mentioned, machines that met a certain criteria (bandwidth, storage, time on line, whatever) simply won an election to act as a node, there's no single point to shut down. Shut down one supernode, the others are informed that a replacement is needed and another election is called for.

    Following an election, the supernodes update the clients as to the lookup machines. I suppose you could even have it where if all the supernodes were shut down that an entirely new election process takes place creating a new set of supernodes. Kind of like having a DNS server setup where any machine can act as one of the root servers based on a criteria based election by those machines doing a lookup.

    Way too much for my wee brain to work out all the details on. Sounds good in theory anyway :)

    --
    The line must be drawn here. This far. No further.