Slashdot Mirror


Mathematical Analysis of Gnutella

jrp2 sent in a paper written by one of Napster's founding engineers. It is a mathematical evaluation of Gnutella discussing why the network won't be able to scale up to any reasonable size. I have been impressed with Gnutella in the past, and have wondered along these same lines in the past.

11 of 332 comments (clear)

  1. Hitchhiker's Guide, Part II. by Tackhead · · Score: 5, Funny
    Earth: Mostly harmless.

    Napster: Sucks ass.

    Gnutella: Doesn't scale.

    (Mod my ass as Flamebait for this, but didn't everyone know about Gnutella's scaling problems, and for-pay Napster sucking ass, based on Slashdot stories months and weeks before today?)

  2. Ancient news. by RareHeintz · · Score: 5, Informative
    This story is over 11 months old!

    I mean, I know that none of us - including our fine moderators - are perfect, but are they at least paying attention?

    OK,
    - B

  3. What the hell? by avalys · · Score: 5, Funny

    Heh...somehow I read the title as "Mathematical Analysis of Guatemala", since this article has been posted before and Slashdot never posts anything twice.

    --
    This space intentionally left blank.
  4. The Logarithmic value of the messages exchanged ! by Khalid · · Score: 5, Interesting

    The problem is not that difficult, if you want Gnutella to scale, then you need to avoid the exponential explosition of the number of messages exchanged between the clients as their number grows. The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished; this makes the number of messages exchanged grows in a logarithmic way (this is an outrageous simplification of course, but gives an idea). There are many such expriments with Gnutella two with those ideas, this is what BearShare is trying to do.

  5. Gnutella's spawn by PureFiction · · Score: 5, Informative

    What I find most interesting are the kinds of projects that have sprung up in Gnutella's wake. Many of these started out as attempts to improve Gnutella, and have since moved on (the Gnutella Next Generation working group never really materialized into anything)

    We had napster and one extreme, gnutella at the other, and in the middle a re a number of partially centralized systems with super peers like Fast Track, such as:
    Open FT
    JXTA Search
    GNet
    NEShare

    and many others...

    Then there are the alternative projects that use an entirely different mechanism. For example, social discovery as implemented in:
    NeuroGrid
    ALPINE

    Or distributed keyword hash indexes like:
    Chord
    Circle
    GISP
    JXTA Distributed Indexing

    And many others as well.

    The coming year(s) will see a lot of maturity in these areas, and searching large peer networks will become ever more efficient over time. Gnutella showed us the possibilities of a fully decentralized model, and refinements of its underlying architecture can produce vastly better solutions.

    2002 will be an interesting year for peer networking applications...

  6. Re:The Logarithmic value of the messages exchanged by zerocool^ · · Score: 5, Informative


    The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished...

    This is exactly the point. This is the only way to properly distribute querys, as anyone who has set up a multi-homed ISP knows. It works on the same principle as BGP routing, i.e. there are routers (super-nodes, or whatever) that have a specific number (an ASN - or in P2P, the supernode address) but there are thousands of computers (casual modem users - p2p) on the internet that these routers have information about. If BGP routing worked this way, nothing would go anywhere. However, by having several nodes giving out information on who has what and how to get it, while the majority of users just download and give out their own info, not pass along info of others, things work much smoother. And with a correct implementation, everyone could have a route to everyone's file list at a minimal bandwidth useage.

    --
    sig?
  7. Even worse! Incosistent math! by hodeleri · · Score: 5, Informative

    If you read through this research paper it'll start with N=4 and T=5. As you continue to read through the paper he quotes bandwidth figures from his table using various other N and T values.

    For example, in the very last table (Bandwidth rates for 10qps) he says the bandwidth generated will by 8GB/s, which align with N=8, T=7. Where you to use the N and T values from the beginning, this would be 2.4MB/s, which is off by 3143 and one third times.

    Going back to Joe User's Greatful Dead query, it only generates ~250KB, not 800MB.

    Remember, very very few people are going to modify their TTL or open connections. This ``white paper'' grossly misstates the amount of bandwidth Gnutella generates and seems to be an anti-Gnutalla paper designed to mislead rather than an honest and fair judgment

  8. This is well and truely FUD. by BeBoxer · · Score: 5, Insightful

    As I pointed out last time this was posted, this article is basically 100% FUD. Yes, the amount of traffic goes up. And no, gnutella doesn't scale very well. But the author goes out of his way to make the problem look worse than it actually is. You see, the article only computes the total amount of traffic in the entire network. A number which is both huge and meaningless. You see, by this math, if I send a packet somewhere and it takes 10 hops, well, thats like sending 10 packets!

    At the end of the paper, the author coughs up the big scary number of 63GBps of traffic in the Gnutella network when the nodes each have 8 connections and are using a TTL of 8. Wow! That's a lot of traffic. That certainly isn't scaling! Well, what the author never points out is that, by his own math, the network has 7,686,400 users at this point! When we divide up the total traffic among all of those network links, we get a different view. If you do the math you discover that this is a whopping 72Kbps! Oh no! It's the end of the world! Well, no, it's not. True, it's more than a modem can handle. But it's well within the reach of most cable modem connections. Given that your computer is being expected to handle the search requests of over 7 million other people, it's not that much traffic.

    Don't get me wrong, I agree that Gnutella doesn't scale all that well. But this paper is just plain FUD. The only number that really matters to users is the total bandwidth load on their pipe. By carefully avoiding that number, which isn't very big and scary at all, the auther is clearly lying by ommision. Given all of the real problems networks like Gnutella encounter, it isn't interesting to read this sort of drivel. Why don't we drag out Mathmatical and model how much bandwidth Napster wastes by transmitting the names of all the files being shared even though most of them will never get searched for. Hmmm. lets assume 7,000,000 users. Let's assume that they each share 1000 files with an average filename length of 32 characters. Why, that's 224 Gigabytes of data, and we haven't even done any searches yet! Cleary, Napster doesn't scale. Ugh. This guy might know how to use Mathematica, but I still suspect he worked in the Marketing department. With the same guys who will tell you about their 200Mbps fast ethernet.

  9. Re:The Logarithmic value of the messages exchanged by Patrick · · Score: 5, Informative
    The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the later is what KaZaa and Morphus have accomplished; this makes the number of messages exchanged grows in a logarithmic way

    That's not logarithmic. If every client node connects to a "super node," and every other "super node," then what you have is a two-level tree. Growth at each level is O(sqrt{n}), not logarithmic.

    Chord, a p2p research project from MIT, is truly logarithmic. Go read their SIGCOMM'01 paper for an explanation of how their system works.

    --Patrick

  10. And there's room for improvement, no less... by DarkEdgeX · · Score: 5, Informative

    Plus the author ignores (mostly due to the fact that they didn't exist back when it was written, this IS an old article) the innovations made with Gnutella (and other, newer competing technologies). Specifically, there are now 'search proxies' that exist on Gnutella that cache and return common queries, thus not saturating the network with redundant queries. For a modem user, this makes the network usable if they limit their connections to proxy servers, because the number of searches hitting their client directly shrinks as common queries are sifted through.

    Not to mention there's still room for improvement to the protocol itself-- there's no reason a proxy couldn't cache a list of all files shared by a connected client, then answer queries directly, NEVER sending a query directly to a client. (Ultimately, as people run proxies like this more and more, you'd end up having proxies talking directly to eachother.) The ultimate Gnutella proxy would cache commonly requested files and make them available over a bigger pipe.

    No money in it, but for the Gnutella enthusiast, I could see them running this kind of thing from work off of a QA box, for example, or from their support desk at an ISP. =)

    --
    All I know about Bush is I had a good job when Clinton was president.
  11. Totally NOT true!!! by Jagasian · · Score: 5, Informative

    Obviously you haven't used GNUtella for the past year. Xolox is a GNUtella client that allows for parallel downloading, resuming, and Xolox will even look for other sources of the file that you are currently downloading, if the current sources are too slow or down. Basically, with Xolox, you search for a file that you want, and you get results with numbers by them depicting how many sources have the file. That way you don't have to decide which source you want to download from. You decide which file you want to download... and Xolox figures out the rest.

    My average download speeds on Xolox are around 160Mbs. Of course, I am use the ever so crappy AT&T cable modem service... so other people on faster DSL lines will most likely experience faster downloads.

    Next thing you are going to tell me is that Windows is better than Linux because Linux doesn't have any good GUIs or desktop environments for it. Yeah, lets just ignore everything thats out there right now.

    Not only that, but Limewire also supports multisource, segmented, or swarmed downloading. Though Limewire has only recently gotten such functionality, while Xolox has had it for the past year.

    Oh, and GNUtella is free as in beer and as in speech.