Slashdot Mirror


Mathematical Analysis of Gnutella

jrp2 sent in a paper written by one of Napster's founding engineers. It is a mathematical evaluation of Gnutella discussing why the network won't be able to scale up to any reasonable size. I have been impressed with Gnutella in the past, and have wondered along these same lines in the past.

19 of 332 comments (clear)

  1. old news by Silver+A · · Score: 4, Informative
  2. Ancient news. by RareHeintz · · Score: 5, Informative
    This story is over 11 months old!

    I mean, I know that none of us - including our fine moderators - are perfect, but are they at least paying attention?

    OK,
    - B

  3. Gnutella's spawn by PureFiction · · Score: 5, Informative

    What I find most interesting are the kinds of projects that have sprung up in Gnutella's wake. Many of these started out as attempts to improve Gnutella, and have since moved on (the Gnutella Next Generation working group never really materialized into anything)

    We had napster and one extreme, gnutella at the other, and in the middle a re a number of partially centralized systems with super peers like Fast Track, such as:
    Open FT
    JXTA Search
    GNet
    NEShare

    and many others...

    Then there are the alternative projects that use an entirely different mechanism. For example, social discovery as implemented in:
    NeuroGrid
    ALPINE

    Or distributed keyword hash indexes like:
    Chord
    Circle
    GISP
    JXTA Distributed Indexing

    And many others as well.

    The coming year(s) will see a lot of maturity in these areas, and searching large peer networks will become ever more efficient over time. Gnutella showed us the possibilities of a fully decentralized model, and refinements of its underlying architecture can produce vastly better solutions.

    2002 will be an interesting year for peer networking applications...

  4. Slashdotted? by Eric+Seppanen · · Score: 2, Informative
    Are you feeling Slashdotted? Server got that not-so-fast feeling?

    Maybe you should be trying the Google cache!

    --
    314-15-9265
  5. Re:The Logarithmic value of the messages exchanged by zerocool^ · · Score: 5, Informative


    The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished...

    This is exactly the point. This is the only way to properly distribute querys, as anyone who has set up a multi-homed ISP knows. It works on the same principle as BGP routing, i.e. there are routers (super-nodes, or whatever) that have a specific number (an ASN - or in P2P, the supernode address) but there are thousands of computers (casual modem users - p2p) on the internet that these routers have information about. If BGP routing worked this way, nothing would go anywhere. However, by having several nodes giving out information on who has what and how to get it, while the majority of users just download and give out their own info, not pass along info of others, things work much smoother. And with a correct implementation, everyone could have a route to everyone's file list at a minimal bandwidth useage.

    --
    sig?
  6. Re:This is old news. by RovingSlug · · Score: 4, Informative

    Speaking of the merits of Fasttrack, it also provides robust parallel download and auto-resume. Using Gnutella is painful by comparison.

  7. Even worse! Incosistent math! by hodeleri · · Score: 5, Informative

    If you read through this research paper it'll start with N=4 and T=5. As you continue to read through the paper he quotes bandwidth figures from his table using various other N and T values.

    For example, in the very last table (Bandwidth rates for 10qps) he says the bandwidth generated will by 8GB/s, which align with N=8, T=7. Where you to use the N and T values from the beginning, this would be 2.4MB/s, which is off by 3143 and one third times.

    Going back to Joe User's Greatful Dead query, it only generates ~250KB, not 800MB.

    Remember, very very few people are going to modify their TTL or open connections. This ``white paper'' grossly misstates the amount of bandwidth Gnutella generates and seems to be an anti-Gnutalla paper designed to mislead rather than an honest and fair judgment

    1. Re:Even worse! Incosistent math! by Alfred · · Score: 2, Informative

      You have missed the point of the paper. The N=4 and T=5 values equate to a maximum reach of 484 clients for a query. To reach "napster" sizes of 1million you need T=7 and N=8.
      That explains the different numbers.

      However, I do agree that a couple numbers seem to be plucked from mid-air, but the argument and maths seems fine :)

    2. Re:Even worse! Incosistent math! by ajknott · · Score: 2, Informative

      I thought this was an error originally as well but it is not. When you have 2 connetsions (N=2 globally) and a node sends a request to the first two nodes, each of those nodes has 2 connections, one to the original requesting node and one to a new node. Thus the second level nodes can only send the request to 1 new node each.

  8. Not only is this old, it is outdated by codemachine · · Score: 3, Informative

    There were several responces to this article pointing out that the current Gnutella network is much more scalable than the one discussed in the article. Try looking here and here for articles discussing the changes since early 2000.

    Come on Slashdot, its 2002 not 2000. It looks pretty bad accepting this article right after the Napster one. Does Slashdot or VA own a stake in Napster or something?

  9. Re:The Logarithmic value of the messages exchanged by Adam+Fisk · · Score: 2, Informative

    LimeWire currently implements a variation of this -- what we call "UltraPeers." UltraPeers establish a significantly greater horizon on the network, and there are other distributed protocols that do this in other creative ways, such as Chord out of MIT, which can be found at: http://www.pdos.lcs.mit.edu/chord/ That aside, there is significant evidence to show that a distributed network can scale far better than any centralized network. Remember that Napster had serious scaling problems as well -- you could only see the files from the hosts on whichever server you happened to be logged in to. The only solution to that problem is the brain-dead purchase of a yet faster multi-million dollar server. I would not call that scaling. As everyone else has pointed out, this discussion began and ended in the Gnutella community about a year ago.

    --

    Adam Fisk

  10. Re:The Logarithmic value of the messages exchanged by Patrick · · Score: 5, Informative
    The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished; this makes the number of messages exchanged grows in a logarithmic way

    That's not logarithmic. If every client node connects to a "super node," and every other "super node," then what you have is a two-level tree. Growth at each level is O(sqrt{n}), not logarithmic.

    Chord, a p2p research project from MIT, is truly logarithmic. Go read their SIGCOMM'01 paper for an explanation of how their system works.

    --Patrick

  11. And there's room for improvement, no less... by DarkEdgeX · · Score: 5, Informative

    Plus the author ignores (mostly due to the fact that they didn't exist back when it was written, this IS an old article) the innovations made with Gnutella (and other, newer competing technologies). Specifically, there are now 'search proxies' that exist on Gnutella that cache and return common queries, thus not saturating the network with redundant queries. For a modem user, this makes the network usable if they limit their connections to proxy servers, because the number of searches hitting their client directly shrinks as common queries are sifted through.

    Not to mention there's still room for improvement to the protocol itself-- there's no reason a proxy couldn't cache a list of all files shared by a connected client, then answer queries directly, NEVER sending a query directly to a client. (Ultimately, as people run proxies like this more and more, you'd end up having proxies talking directly to eachother.) The ultimate Gnutella proxy would cache commonly requested files and make them available over a bigger pipe.

    No money in it, but for the Gnutella enthusiast, I could see them running this kind of thing from work off of a QA box, for example, or from their support desk at an ISP. =)

    --
    All I know about Bush is I had a good job when Clinton was president.
  12. Totally NOT true!!! by Jagasian · · Score: 5, Informative

    Obviously you haven't used GNUtella for the past year. Xolox is a GNUtella client that allows for parallel downloading, resuming, and Xolox will even look for other sources of the file that you are currently downloading, if the current sources are too slow or down. Basically, with Xolox, you search for a file that you want, and you get results with numbers by them depicting how many sources have the file. That way you don't have to decide which source you want to download from. You decide which file you want to download... and Xolox figures out the rest.

    My average download speeds on Xolox are around 160Mbs. Of course, I am use the ever so crappy AT&T cable modem service... so other people on faster DSL lines will most likely experience faster downloads.

    Next thing you are going to tell me is that Windows is better than Linux because Linux doesn't have any good GUIs or desktop environments for it. Yeah, lets just ignore everything thats out there right now.

    Not only that, but Limewire also supports multisource, segmented, or swarmed downloading. Though Limewire has only recently gotten such functionality, while Xolox has had it for the past year.

    Oh, and GNUtella is free as in beer and as in speech.

  13. Re:Choice by mshiltonj · · Score: 2, Informative
    When searching for content on the gnutella network, a lot of the results come back showing the host as 192.168.* 10.0.*, which means they are behind a firewall or otherwise not directly routable. In such a case, the user may be unable to correct it, or unaware of the issue entirely

    I was like this for about a week before I realized why I wasn't getting any uploads. I had to open up port 6346 on my home network (linksys router). Also, Napshare lets me "force local ip" to my firewall/ external ip (assigned by RoadRunner). The linksys router does port forwarding on outside requests, so only one computer on my home network can share on that port.

    This thread reminded me that RoadRunner had expired my old ip address and assigned me another and I had forgotten to update my gnutella client to reflect to new ip. So for the past few weeks or so, I had been one of the "non-sharing" people by simple oversight.

    I doubt most limewire/bearshare users know any of this stuff. When running a gnutella client from work, people couldn't do this even if knew about it and wanted to.

    There is a major flaw in all P2P software, and it has nothing to do with the coding. More people tend to want to take than recieve. I remember seeing a line graph on LimeWire's page (I think?) that showed a monthly progression of the number of people sharing files compared to thenumber of people downloading files. The 'downloaders' were outweighing the 'uploaders' by a HUGE ammount.

    If everyone was willing to share their files, then there would be no such problem with P2P programs.
  14. giFT enters network testing by Robotech_Master · · Score: 3, Informative

    It's worth noting that giFT/OpenFT just entered its first stage of network testing--and with that in mind, they need as many people as possible to download and run the client so they can test the network. Complete instructions for so doing are given on the website.

    --
    Editor Emeritus and Senior Writer, TeleRead.org
  15. Too late... by vrt3 · · Score: 3, Informative
    From http://www.xolox.nl:
    Dear XoloX-user, Taking into account the latest law suits against p2p clients based on Fasttrack-technology (such as Kazaa), we have decided to discontinue XoloX. As of the 1st of december, XoloX will be shut down and removed from distribution sites. We hope everybody has enjoyed XoloX as long as it has been around and we want to use this opportunity to thank everybody who made a contribution to its development. These last few days will give you some time to finish your downloads and we advise you not to start new transfers. Thanks again and goodbye! --Team XoloX--
    --
    This sig under construction. Please check back later.
  16. Re:Choice by Johnny00 · · Score: 2, Informative

    Thats exactly how eDonkey2000 works!

    While your downloading a file, it's immediately made available for upload from you. It uses resume download to download parts of the file you want from multiple sources, some of which don't have all of the file yet too.

    --
    I live life on the edge ... of my desk.
  17. Re:The Logarithmic value of the messages exchanged by chrohrs · · Score: 2, Informative
    LimeWire has attacked this problem by introducing "ultrapeers", which offload most of the bandwidth to a small subset of hosts. It works really well. Unlike FastTrack, this is an open-protocol with an open-source implementation available.

    The next step is to add more sophisticated routing protocols between ultrapeers. Many of the algorithms mentioned elsewhere in this post (Chord, CAN, etc.) are contenders for that, as is LimeWire's home-grown query-routing proposal.

    Christopher Rohrs
    LimeWire