Slashdot Mirror


Gnutella's Challenge

Gnutella News sent in an excerpt from a clip2 DSS report about Gnutella's evolution and condition. "the network has neither smoothly scaled nor catastrophically collapsed since average traffic grew to regularly exceed dial-up modem bandwidth in August 2000. Instead, the network persists in a fragmented state comprised of numerous continuously evolving responsive segments, the largest of which typically contains hundreds of hosts. We estimate at present that unique Gnutella users per day number no less than 10,000 and may range as high as 30,000. We suggest that further technical innovation and wide adoption of this innovation are necessary for the Gnutella network to scale beyond its present state." Read this if you're interested in p2p.

19 of 96 comments

  1. Re:Idea -- SOLUTION by Anonymous Coward · · Score: 3
    I don't think that any of these solves the issue that the gnutella protocol does not scale in its current iteration. Scalability is the real issue. Here are some ideas that would improve the scalability.

    Bandwidth Limited Connections. When two gnutella clients connect they should send in the reply the allocated bandwidth for that connection. The forwarding protocol should not allow more queries to be sent than the bandwidth allows. This would require a form of russian roulette on the packets--a method of killing queries.
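
The "russian roulette" idea above can be sketched roughly as follows. This is only an illustration, not part of any Gnutella client: the function name `forward_queries`, the per-time-slice byte budget, and treating each query as a raw byte string are all assumptions made for the example.

```python
import random

def forward_queries(queries, budget_bytes, rng=None):
    """Forward queries within a per-connection byte budget for one time slice.

    When the pending queries exceed the budget, victims are picked at random
    (the "russian roulette" step) until the remainder fits.
    """
    rng = rng or random.Random()
    survivors = list(queries)
    while survivors and sum(len(q) for q in survivors) > budget_bytes:
        survivors.pop(rng.randrange(len(survivors)))
    return survivors

pending = [b"metallica mp3", b"linux iso", b"freenet paper", b"x" * 40]
kept = forward_queries(pending, budget_bytes=30, rng=random.Random(1))
print(len(kept), "of", len(pending), "queries forwarded")
```

Swapping the random choice for a learned or content-based rule would give the other filter styles listed in this comment.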

    It is feasible that the client should be able to forward post and response packets. Query packets are the most likely target for such filters. The filters could be implemented in several manners:

    • The choosing is totally random.
    • An artificially intelligent option that learns which requests have been handled and which have not would allow filtering of requests where the files are more readily available.
    • A filtering option that may be unpopular with script kiddies (in fact, probably reviled as censorship) but popular with older or more mature individuals would perhaps place killfilters on obscene queries that enter your computer. After all, there is no reason that someone who doesn't agree with some actions must facilitate them with his/her own computer.
    I would like to see the analysis of the different queries sent over the network, and some kind of user connectivity.

    Packet filtering would help to solve the current protocol limitation. Since the network is totally connected it would change the dynamics of the gnutella network and make it a more connected place even though there is a higher probability that the queries are never answered.

  2. True P2P Can't Scale? Take a look at Freenet by Sanity · · Score: 3
    Take a look at these simulations of Freenet's reliability and performance as the network size increases. You will notice that once the network stabilizes the network's size has little bearing on the time required to retrieve a piece of data. Other experiments (not yet published) have demonstrated that Freenet appears to scale logarithmically (similar to a binary search-tree), which, if accurate, means that the system could probably deal with a network of millions of nodes without any significant performance drop.
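
To put a number on the binary-search-tree comparison: if lookup cost really grew like log2 of the network size, a thousandfold increase in nodes would only add about ten steps. The snippet below is plain arithmetic under that assumption (base 2, constant factor of 1); it says nothing about Freenet's measured behavior.

```python
import math

def hops_if_logarithmic(nodes):
    """Lookup steps if cost were exactly ceil(log2(nodes)); an idealization."""
    return math.ceil(math.log2(nodes))

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} nodes -> about {hops_if_logarithmic(n)} steps")
```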

    --

  3. Swarm delivery is the way around bandwidth limits by Jim+McCoy · · Score: 4
    One of the best features of Mojo Nation is that it breaks files up into smaller pieces so that when you want to download a huge file you are not blocked on the limited upstream capacity of the peer at the other end; each agent sends a small chunk of the file, allowing the peer retrieving a file to request multiple pieces in parallel and moving the download speed restriction back to the downstream capacity of the local connection. This sort of distribution system turns a pool of peers into a swarm of ants carrying small pieces of the content. RAID-like error correction protects against peers disappearing and allows for flexible choices about where to go for the pieces to the file. The new 0.920 release of the client starts to demonstrate the advantages this has over conventional peer delivery systems.
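
A minimal sketch of the chunk-plus-redundancy idea, for illustration only. The comment does not say which coding scheme Mojo Nation uses, so this substitutes the simplest RAID-like option, a single XOR parity piece that can repair one lost chunk. The names `split_with_parity` and `reassemble` are made up for the example.

```python
def split_with_parity(data, k):
    """Split data into k equal-size chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks + [parity]

def reassemble(pieces, length):
    """Rebuild the data from k+1 pieces, at most one of which is None."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost piece")
    pieces = list(pieces)
    if missing:
        size = len(next(p for p in pieces if p is not None))
        rebuilt = bytes(size)
        for p in pieces:
            if p is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, p))
        pieces[missing[0]] = rebuilt
    # Drop the parity piece and the zero padding.
    return b"".join(pieces[:-1])[:length]

data = b"some large file fetched from many peers"
pieces = split_with_parity(data, 4)
pieces[2] = None  # the peer holding this piece disappeared
print(reassemble(pieces, len(data)) == data)
```

Each piece could come from a different peer in parallel, so the download is limited by the local downstream link rather than by any single uploader.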

    One downside to swarm delivery systems is that data is "published", simple sharing of a common filebase (a la Napster and Gnutella) is not possible. Someone has to upload the pieces to the system in the first place for them to be available because the system does not do the "let me take a look through your hard disk for things to give to others" kind of file sharing found in other P2P systems. jim

  4. True P2P applications have this limitation by jht · · Score: 5

    Gnutella, being a real P2P application, will suffer from scalability problems that a server-based system like Napster can work around. If Napster gets too popular, they can always add fatter pipes and bigger servers. But Gnutella is bandwidth-constrained since there is no central server farm tracking all the users.

    The exchanges in Napster themselves may in fact be peer-to-peer, but we need to remember that they have big honking servers arbitrating the connections.

    Gnutella's design is terrific (and a great hack), but unless they can re-jigger things to knock the slow connections down in priority (or some comparable solution), they're doomed to be a victim of their own success. I guess the other possibility would be for a minimum bandwidth requirement for the software to enforce. Perhaps some enterprising person will write a Gnutella that only allows, say, 144 Kbps and up connections on the network.

    It would be interesting, though cruel, to relegate all the dialup people to second-class citizen status, but it would allow Gnutella to scale a lot past the existing limits.

    - -Josh Turiel

    --
    -- Josh Turiel
    "2. Do not eat iPod Shuffle."
  5. If the network is fragmenting by PD · · Score: 3

    then why is it that I'm always on the same network fragment as that idiot spammer who returns your search request with an html extension but containing a stupid advertisement?

  6. No thank you. by FallLine · · Score: 3

    GNUtella may be an interesting idea, but it's nothing more than a hack. Splitting into subnetworks is both infeasible and undesirable.

    First, you really can't compare it to IRC. IRC is highly centralized, whereas everything about GNUtella is distributed. IRC can, and does, scale for many thousands of users effectively; GNUtella does not (it responds like crap with any significant number of users).

    Secondly, you're thinking of the term "network" too rigidly. There is no network admin, no physical location, no centralization. In short, it's a ragtag and volatile collection of different IP addresses. There isn't a way to rigidly enforce the number of users in GNUtella, so how does one keep the networks divided into neat little units? This also means that it's hard to return to a specific network amongst a number of others. Where might your hotlist users be? Where do you find those with like interests when everything is constantly tossing and turning?

    Finally, and most importantly, you underestimate the importance of size. When the network can only effectively scale to ~5k users (probably a stretch), and when only one in 10 of those users has broadband that can support a decent number of speedy transfers (especially important when users tend to sign off and on while you're downloading), and when only one in 10 of those users has a sizable collection being shared (seems like most users have the same top pop garbage that everyone else has), you're ultimately reduced to, say, 50 users that you'd actually want to search from. I don't know about you, but 50 users isn't nearly enough. Now you might argue that I'm pulling these numbers out of my ass (and you'd be mostly right), but if you look at the empirical results, it's not far off the mark.

    In my opinion, the only thing that something as trivial as GNUtella is good for, ironically, is the IRC types, who could form pseudo-private, loose-knit "networks" from which they can share warez/mp3/porn with their "friends" without the need for a dependable server (i.e., just join the channel, find an IP, and connect to it).

  7. Leverage "location" to make network size irrelevan by cnicolai · · Score: 3
    If you have a room full of all different kinds of people, they'll interact more meaningfully if they can wander the room, moving next to like minded people, than if they're stuck in their randomly assigned chair. We should let the gnutella network self-organize like that. Here are some details:

    Have clients keep some keywords about the user. It could be a user-written paragraph, the names of shared files, recent search requests, etc. Clients would also have a "horizon" H: clients within H hops are considered "local". Clients can query other local clients for their keywords, and determine how similar those keywords are to their own (maybe a percent).

    Define a "crawl" to be dropping one (low-keyword-match) direct connection and forming a new direct connection to a local node. You might decrease search response times by crawling repeatedly toward higher keyword-matching nodes.

    Imagine a "speed" setting, measured in crawls per minute. There could also be a "randomness" setting, to misrepresent percent-keyword-match by a random amount for each local node. These settings could decrease over time, so you gradually lock in to a suitable local community without getting caught at the nearest local maximum. This idea is borrowed from simulated annealing, which someone else here probably understands better than I.
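
Here is a rough sketch of one "crawl" step as described above. Everything concrete in it is an assumption for illustration: keyword match is taken to be Jaccard similarity of keyword sets, the "randomness" setting is modeled as uniform noise added to each score, and the function names are invented.

```python
import random

def keyword_match(mine, theirs):
    """Jaccard similarity of two keyword sets, from 0.0 to 1.0."""
    mine, theirs = set(mine), set(theirs)
    union = mine | theirs
    return len(mine & theirs) / len(union) if union else 0.0

def crawl(my_keywords, neighbors, local_nodes, randomness, rng):
    """One crawl: swap the worst direct neighbor for the best other local node.

    neighbors and local_nodes map node id -> keyword set. Each score is
    misreported by up to +/- randomness, so early crawls can leave a poor
    local maximum. Returns the new neighbor mapping.
    """
    def score(keywords):
        return keyword_match(my_keywords, keywords) + rng.uniform(-randomness, randomness)

    candidates = {n: kw for n, kw in local_nodes.items() if n not in neighbors}
    if not neighbors or not candidates:
        return dict(neighbors)
    neighbor_scores = {n: score(kw) for n, kw in neighbors.items()}
    candidate_scores = {n: score(kw) for n, kw in candidates.items()}
    worst = min(neighbor_scores, key=neighbor_scores.get)
    best = max(candidate_scores, key=candidate_scores.get)
    if candidate_scores[best] <= neighbor_scores[worst]:
        return dict(neighbors)  # no apparent improvement, stay put
    updated = {n: kw for n, kw in neighbors.items() if n != worst}
    updated[best] = candidates[best]
    return updated

me = {"jazz", "bebop", "coltrane"}
direct = {"a": {"warez"}, "b": {"jazz", "bebop"}}
local = dict(direct, c={"jazz", "bebop", "coltrane"}, d={"pron"})
print(sorted(crawl(me, direct, local, randomness=0.0, rng=random.Random(0))))
```

Calling `crawl` repeatedly while shrinking `randomness` toward zero would give the gradual "lock in" the comment borrows from simulated annealing.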

    Is it possible to integrate such clients into the existing network, through search and search-response packets with a ttl of H?

    Your horizon defines a neighborhood of local nodes. Their shared files will likely be of interest to you, so your client software might list them. In addition, their _ideas_ might be of interest, so your client software could show you their keywords, and allow instant messaging. There could even be a local neighborhood chat, ignoring chat packets with (hops > H), and sending packets with (ttl = H).

    Usage scenario: I heard a band on the radio; sounded kinda like some other bands A B and C; and the lyrics had something to do with X, although I don't think they used that word. I make sure to put A B and C in my keywords, push up the speed and randomness sliders, and wait for them to settle down. Then I start asking in the chat if anyone knows about .... Maybe someone helps me out, and puts up a sample mp3. I might even ask if there are other bands like that.

    Current Napster/Gnutella/whatever software lets you find songs you've heard of by bands you've heard of. Gnutella neighborhoods could let you find music you've never heard of.

    So; here's the rub: What's the best way to get people to buy into this? With snow just setting in here in Buffalo I have a lot of coding time; what's the best codebase to start from? Who should I convince? (and of course, what am I missing and how could this idea be made better?)

    Thanks for reading this whole long thang.

    Chris

  8. More evidence of P2P's weakness by Salamander · · Score: 5

    This is a bandwagon that just won't roll very far, and the reason - as usual - is obvious to people who've studied the field for a while. Naively implemented, a P2P protocol tends to generate O(n^2) messages for a given workload, where n is the number of nodes. This can often be brought down to O(n) but only with absolutely top-notch developers and a lot of effort. Better than O(n) is usually impossible.

    By contrast, hierarchical systems tend to hover between O(n) and O(log(n)) depending on the particular problem. This does not necessarily apply only to single-rooted hierarchies, either. A multi-rooted hierarchy tends to exhibit the same scaling behavior, though of course the more roots you have the more you start to look like P2P and share its scaling characteristics.

    The long and the short of it is that P2P just doesn't scale well. Even the best-implemented P2P protocol can merely approach the message efficiency of a naively implemented hierarchical protocol. For large numbers of nodes this results in the P2P implementation simply getting swamped. The only question is how large and how swamped it has to be before it becomes unusable.
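
For a feel of how far apart these growth rates are, the following tabulates n^2, n, and log2(n) for a few network sizes. The constant factors are all set to 1, which is an assumption purely for illustration; real protocols differ, and only the growth rates come from the argument above.

```python
import math

def message_counts(n):
    """Idealized message counts for n nodes, with all constants set to 1."""
    return {
        "naive P2P, O(n^2)": n * n,
        "tuned P2P, O(n)": n,
        "hierarchy, O(log n)": math.ceil(math.log2(n)),
    }

for n in (16, 1024, 32768):
    print(n, message_counts(n))
```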

    --
    Slashdot - News for Herds. Stuff that Splatters.
    1. Re:More evidence of P2P's weakness by LHOOQtius_ov_Borg · · Score: 3

      A compromise between P2P and overlapping hierarchies is possible using automatic assignment of nodes to (multiple) regions based on tasks (as in our Webworld system)... Thus P2P is used for certain things (node discovery across the Net without resorting to a Net-wide broadcast, anonymous filesharing a la Freenet, etc.) and hierarchies for other things (processes which need to have an average messaging time node-to-node to do resource allocation between processing and messaging).

      Efficiency is not the only issue in P2P... anonymity is one, and another is to inject a type of fault-tolerance into a hierarchical system by allowing for more dynamic assignment of hierarchical roles where appropriate...

      Also, since many P2P schemes are built on top of TCP/IP, building a dynamic, hybrid system is much easier, since a hierarchical system lies beneath at the addressing and transport level... You can leverage the messaging efficiency of the hierarchy once you've done discovery through pure P2P, and can also overlay anonymous P2P over the hierarchy for things like Freenet-style file sharing...

      P2P and hierarchy both have their strengths and weaknesses, and particularly clever developers can pool strengths without amplifying weaknesses and get some pretty neat systems...

      --
      o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
  9. gnutella on darwin by proclus · · Score: 3

    I've ported gtk-gnutella to darwin. Here is the link.

    Darwin Gnutella

    Regards,

    proclus

  10. just have an option that says your bandwidth type by CiXeL · · Score: 4

    All you do is have an option under preferences a la Napster that says your bandwidth type, i.e. 56k/cable/dsl/t1/t3/etc. Then whenever you connect to another T1, have it make a strong connection between the two of you. It's network matchmaking. To prevent people from specifying a lower bandwidth than they have, have the program limit their download bandwidth to that specific speed, which should keep people honest while helping to better organize the network.
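
The matchmaking rule could look something like the sketch below. The speed table is an assumption: the T1 and T3 figures are the standard line rates, while the DSL and cable numbers are just plausible placeholders, and the function names are invented for the example.

```python
# Declared link classes in kbit/s. The dsl and cable values are assumed
# placeholders; real-world speeds vary by provider.
LINK_KBPS = {"56k": 56, "dsl": 768, "cable": 1500, "t1": 1544, "t3": 44736}

def pick_peers(my_class, candidates, wanted):
    """Prefer peers whose declared class is closest in speed to our own.

    candidates is a list of (host, declared_class) pairs.
    """
    mine = LINK_KBPS[my_class]
    ranked = sorted(candidates, key=lambda c: abs(LINK_KBPS[c[1]] - mine))
    return [host for host, _ in ranked[:wanted]]

def download_cap_kbps(declared_class):
    """Cap downloads at the declared speed, so under-reporting costs the user."""
    return LINK_KBPS[declared_class]

seen = [("a", "56k"), ("b", "t1"), ("c", "cable"), ("d", "t3")]
print(pick_peers("t1", seen, wanted=2), download_cap_kbps("56k"))
```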

  11. Slur on Christmas Island by StrawberryFrog · · Score: 4

    > sleazy (gnet2.ath.cx has the exact same TLD as another website whose URL contains the word "goat";

    cx is a country TLD. Why should you call the whole of Christmas Island sleazy because of one goat who lives there?

    --

    My Karma: ran over your Dogma
    StrawberryFrog

  12. Re:Idea: (scalable and distributed) by Axemaster · · Score: 4

    Actually, MojoNation does something very similar to what you propose. It's still a beta product, and it's still growing, but it looks good so far:

    * Automatic mirroring nodes
    Mojonation block-servers remember what blocks seem to be popular (most requested), and if they don't have them, they may go grab a copy to mirror locally.

    Nodes would automatically mirror data from local (fast) mirrors, so that it's more accessible.

    See above. Data that is popular is automatically mirrored. When data is published to the network, dual-redundancy is used to avoid losing the data if some blocks turn up missing. Think RAID. Well, no, not exactly, but it is somewhat redundant.

    ... 56k clients could connect and ask the "net" of super nodes for the queries on content..

    It's called a content tracker, and anyone can run one on Mojonation. There are two central "master publication trackers" (MPT's) that keep lists of all publication servers, and the clients retrieve this list initially from them. There are possible plans to distribute the MPT's as well.

    Content Security

    All of the content posted to the network would have meta-moderation on it; anyone can classify data, and mark it as such.

    There is currently no 'rating system' in mojonation, but it is something being looked at, barring the technical hurdles in doing so.

    Privacy

    If possible, I'd like to see users IP addresses hidden; only have a unique login name/password setup for security; but this may make hackers/spammers hard to track and ban, but hopefully the meta-moderation would filter out most of it.

    I'm not sure if Mojonation is going to go this route eventually, but if ya use TCP/IP, you can be traced eventually anyway. UDP is unreliable. As for data privacy, Mojonation actually chops a file up into small blocks, then encrypts those blocks, and distributes them randomly. Then it sends the description and block locations to the master server. In essence, nobody knows what's in each individual block on their server (if they run a storage server); everything is encrypted. I am breezing past all the details here, feel free to read more about it if you wish.

    Volunteers
    Anybody?


    http://sourceforge.net/projects/mojonation/

    --
    (Shameless plug): ProcessTree - Put your idletime to use.
  13. Reputation Tracking by pjrc · · Score: 5
    I've been toying with an idea for P2P filesharing, which involves a truly decentralized reputation tracking system. The idea is similar to PGP's "web of trust", in which you "know" others based on their public key, and people you know give certifications of others by signing their public keys.

    What good is all that... well, a host could make decisions about which queries to route and which to discard based on any information about the reputation of the originator. Hosts would allow faster sends to downloaders with good reputation. Abusive hosts (Spammers, DoS attacks, etc) would ruin their reputation quickly (or keep recreating new keys all with no reputation).

    Reputation in such a system would be very valuable. Somewhat like slashdot karma, it would appeal to many individuals, who would likely go out of their way to gain reputation signatures, perhaps by providing or mirroring lots of high quality files, attaching good meta-data descriptions to files, etc. The client software would need to have ways for everyone to do moderation on files and users... but unlike slashdot, there would be no universal score, only lots of keys/reputation scores, signed by other users. The software could also automatically detect certain behaviors (files available for download, on-line for long times) of other hosts, and issue reputation points. The idea is that a reputation score is to have a way to allocate the available resources (mainly bandwidth), to establish an incentive for users to share files and act in ways that benefit the network, and of course to make the network resilient to abuse.

    Now, for a system like this to scale, each host will need a LOT of disk space, to store a giant database of keys and signatures on them, and it would ultimately act like a giant cache. Each host would obviously collect the most positive signatures... the initial communication would be similar to boasting, the requester would send several of the best moderation signatures, hoping that the remote host already knows those people who signed and will therefore offer faster transfers, propagate a query farther, etc.
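
A toy version of the "boasting" exchange might score a requester like this. It is only a sketch: signature checking is left out entirely (the endorsements are assumed to have been verified already), and the weighting scheme, the TTL bounds, and the function names are all invented for illustration.

```python
def reputation(endorsements, my_trust):
    """Score a requester from the endorsements it presents.

    endorsements: list of (signer_id, points) the requester boasts about,
                  assumed to have passed signature verification already.
    my_trust:     signer_id -> weight in [0, 1] for signers this host knows.
    Endorsements from signers this host has never heard of count for nothing.
    """
    return sum(points * my_trust.get(signer, 0.0) for signer, points in endorsements)

def query_ttl(score, base_ttl=2, max_ttl=7):
    """Propagate queries farther for requesters with a better local score."""
    return max(base_ttl, min(max_ttl, base_ttl + int(score)))

known = {"alice": 1.0, "bob": 0.5}
boast = [("alice", 3), ("mallory", 10), ("bob", 2)]
score = reputation(boast, known)
print(score, query_ttl(score))
```

Because the score depends on which signers each host already knows, two hosts can rate the same requester differently, which matches the "no universal score" point above.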

    Maybe this ultimately works out to be the same as digital cash in MojoNation. I believe it is a different idea, in that it's based on an assumption of abundance.... everybody can win. You can get a great reputation without someone else giving up anything. In a cash system, when you get cash (mojo), someone else gives it up, and the overall philosophy is of scarcity.

    If you have any ideas or thoughts to add to this, please post. Am I totally out in left field here, or does this seem like a reasonable idea?

  14. oh, *please* by vsync64 · · Score: 4
    I'm sick of this. "Gnutella's going to collapse! We need new innovation! It doesn't scale!"

    It doesn't have to cover the entire Internet. The fact that you can simply specify a server to contact makes the solution so obvious that I can't believe people are still whining.

    Let Gnutella split into multiple networks. It worked for IRC, it will work here, and it will work for similar problems in the future. Any problem that doesn't lend itself well to subdivision is probably badly specified. Don't forget that the Internet is a network of networks, and it works well for a reason.

    --
    TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
  15. yet another suggestion & question by SubtleNuance · · Score: 3

    Here is my 'proposed solution' - as everyone else has one, I thought I'd toss this idea out. Why not extend the Gnutella protocol to include a method to subdivide the existing network? Meaning - instead of randomly collecting other nodes of any type - why not only connect nodes of a certain type, say "Warez" or "MP3Z"? Now if I have 1.2.3.4-MP3 and I choose to connect to the "MP3" subnet of gnutella, I will.

    Clients can query the larger 'unsegmented' net to determine the 'subnetted' network extensions:

    5.6.7.8:warez;
    9.0.1.2:pron;
    3.4.5.6:warez;
    etc.

    This could probably be implemented without breaking the existing clients and network where only Gnutella 'v2' clients would be able to choose a subnet to join. When the "MP3Z" network grows to the breakpoint - someone starts a MP3ZZ network.
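
A client-side parser for a listing in the `host:label;` shape shown above could be as small as this. The wire format is only what the example lines suggest, so treat the parsing rules (semicolon-separated entries, label after the last colon, case-insensitive labels) and the function name as assumptions.

```python
def parse_subnet_list(text):
    """Parse 'host:label;' entries into a dict of label -> list of hosts."""
    subnets = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if ":" not in entry:
            continue  # skip blanks and trailing filler such as "etc."
        host, label = entry.rsplit(":", 1)
        subnets.setdefault(label.strip().lower(), []).append(host.strip())
    return subnets

listing = """5.6.7.8:warez;
9.0.1.2:pron;
3.4.5.6:warez;"""
print(parse_subnet_list(listing))
```

A 'v2' client would then connect only to hosts under the label it wants, while older clients could ignore the labels.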

    As a side note: Has an organization or project formed on any collective level to address these problems? Is there a 'recognized' authority that is guiding the 'official' Gnutella protocol and a reference implementation? Gnutella is a very necessary model to pursue and develop because of the threat to Napster (though OpenNap provides a mechanism to thwart the $RIAA$MPAA$ whores - there is still the problem of having 'servers' to identify and attack (not to mention the problem that Napigator will have when Napster is finally shut down...))

  16. Idea: by kwj8fty1 · · Score: 5

    I've been reading tidbits around the net, and I'd like to ask what people think about this:

    Automatic mirroring nodes

    Nodes would automatically mirror data from local (fast) mirrors, so that it's more accessible. It would need to "learn" what files are requested, and then mirror them. What would stop the script kiddies from "rating" the content they want up, so it would be mirrored more often?
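
One way a node might "learn" what to mirror is simply to count the requests it sees, as in this sketch. The class name, the capacity measured in whole files, and the method names are assumptions for the example, not part of any existing client.

```python
from collections import Counter

class MirrorNode:
    """Tracks observed requests and suggests which files to mirror next."""

    def __init__(self, capacity):
        self.capacity = capacity   # how many files this node is willing to hold
        self.requests = Counter()  # file name -> times requested
        self.held = set()          # files already mirrored locally

    def record_request(self, name):
        self.requests[name] += 1

    def to_mirror(self):
        """Most-requested files not yet held, up to the remaining capacity."""
        room = max(self.capacity - len(self.held), 0)
        wanted = [n for n, _ in self.requests.most_common() if n not in self.held]
        return wanted[:room]

node = MirrorNode(capacity=2)
for name in ["a.mp3", "a.mp3", "a.mp3", "b.mp3", "c.mp3", "c.mp3"]:
    node.record_request(name)
node.held.add("a.mp3")
print(node.to_mirror())
```

Counting requests actually seen, rather than ratings users submit, at least raises the cost of gaming the mirror list, though a script that floods requests could still skew it.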

    Structure

    If all of the clients are required to keep a copy of the "whole database", it is not feasible without everyone on the network having a T3+, or later OC3+ connection. But as with the data, the nodes keep track of other nodes, but only if the bandwidth permits. 56k clients could connect and ask the "net" of super nodes for the queries on content. No one node should be in control; but many based on the same rule set. You would have to have a setting on the client for "perm super-node", or just "56k browser". Even the 56k browser could contribute to the network however; two 56k modems that are on the same segment of 'net can transmit with very low latency; they can buffer queries from the super nodes, and allow for faster access.

    Content Security

    All of the content posted to the network would have meta-moderation on it; anyone can classify data, and mark it as such. People can also rate classifications; so to prevent some spam. If a file with the same name shows up on the 'net, it could end up with the same rating. (my_garage_band_called_nirvana_that_nobody_has_heard_of.mp3)

    I'm sure that folks have complex yet effective methods of rating (flame wars may ensue), but I'd be really interested in hearing ideas.


    Privacy

    If possible, I'd like to see users IP addresses hidden; only have a unique login name/password setup for security; but this may make hackers/spammers hard to track and ban, but hopefully the meta-moderation would filter out most of it.

    Volunteers

    Anybody?

    -Eric Johanson - ericj.spambad@cubesearch.com

    This sig for rent

  17. technical advances... by tewwetruggur · · Score: 3
    Ok... as I see it, we've just been able to hook up a monkey to a robotic arm, over the internet... now, certainly, there must be some useful p2p implementation of this technology, particularly if you blend it with the Infinite Monkey Protocol Suite... perhaps this could be the dawning of a "Speedy Monkey Brain Protocol":

    Hundreds of monkeys, eating bananas, swingin' away on ropes, with their brains hooked together with wireless broadband technology, all for the purpose of file sharing!

    And before anyone gets a chance to say it...

    ...Imagine a Beowulf cluster of those monkeys...

    There... it's been said.

    Thank-you, and good day!

    --
    Hi! This is the Sig, blatantly attached to the end of this comment.
  18. needs: Better GUI and User Friendliness by nirvana_am_i · · Score: 4

    One of the reasons that Gnutella flounders is the interface. The creators had a novel idea of creating a node-based network for file sharing, but their interface needs some work.

    Napster, and then Scour, both simplified their application so any nitwit (even some mac users using macster) could gain access to the resources.

    A better interface as well as some way to have the top hosts from gnutellahosts.com automatically be used every time the application is loaded up is definitely a must.

    What the developers need to do is try a rapid prototyping model. As much as I hated it while doing that damn internship, it really does work. People need to be surveyed on how the application should work. The only way to come up with a good product which catches the broadest audience is to get feedback from that audience.

    That's enough bs from me.

    --
    If you pointed the gun at someone and found out it was your clone pointing a gun at you, think of what you would think.