Gnutella Not Scaling?
cbull writes "ZDNet Music has an article that makes an argument that "Gnutella is Going Down in Flames". Basically, the argument is that Gnutella isn't as scalable as Napster."
--
The problem is quite obvious and has been around as long as peer-to-peer and server-based networks have both existed. Peer-to-peer networks work wonderfully when they're small. Server-based networks are much more efficient and are therefore nearly always used for large networks. Can Gnutella still work? Yes, but it will have to be divided into smaller networks. For example, you could have separate networks for: Pop MP3s, Rock MP3s, Country MP3s, Rap MP3s, Jazz MP3s, Movies, and Warez..err..Shareware. Of course, each network should have a critical mass and then divide in half when it reaches that point. Wow, maybe I should get programming...
There is a way to start resolving this problem, and it is currently in development.
The gPulp project is currently working on all of these issues. Check proposals and ideas at: http://gnutellang.wego.com/go/wego.pages.page?groupId=133015&view=page&folderId=136401&pageId=177268&JServSessionId=3fe61b505308701b.415222.969643886549
There is also a server-oriented gnutella application which aims to start resolving some of these issues in the near term. Planned features include:
1) Provide a server for broadband / dedicated network users to provide content with a true server oriented gnutella node. This will be similar to a modified apache for singular installations, or a federated distributed server architecture for routing and caching fun.
2) Remove broadcast push requests (in all future clients)
3) Proxy and cache support for slow users. This will allow beefy servers to take over some of the load which dialup / slower clients experience. This will be somewhat a la Freenet, as popular data will propagate through caches in various nodes. Also, this can provide a level of anonymity which is not present today.
4) Adaptive servers which configure their network connections for optimal efficiency: not too busy, not too slow, and with the widest topological distance from their peers (if linked), plus fuzzy / reactive propagation algorithms so that TTLs and routes can be dynamically modified as load increases or other factors require.
There is nothing fundamentally flawed with the gnutella architecture, and it is far from a 'dead horse'. However, there are significant inefficiencies and complications which are causing problems right now. Rest assured these will be fixed.
The basic problem is that small sites either take a lot of search hits to which they will answer "no find", or their index has to be mirrored elsewhere, which introduces centralization. There's an economy of scale to searching.
So automatic, distributed, redundant, partial centralization is necessary. This is hard. It also has to be reasonably secure against hacking; look at the problems IRC has. It probably needs a reputation service, so people who spam the indexing system lose.
On the other hand, music interest, being a popularity thing, follows a power law; the music most likely to be searched for will be found easily. A simple hack on Gnutella so that it queries servers slowly, in order, starting at the one with the best response time, stopping with the first find, will keep the thing from collapsing until somebody cracks the hard problems. It's not necessary to crack the general distributed search-engine problem to fix this.
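A rough Python sketch of that "fastest first, stop at the first find" hack. Everything here is hypothetical illustration: the peer names, response times and file lists are made up, and a real client would also pace its queries over the network rather than loop in memory.

```python
def query_in_order(peers, term):
    """Ask peers one at a time, fastest first, and stop at the first hit.

    `peers` is a list of (response_time_ms, name, files) tuples (made-up data).
    Returns (name_of_peer_with_file, peers_asked), or (None, peers_asked).
    """
    asked = 0
    # Sort by measured response time so the quickest peer is tried first.
    for _rtt, name, files in sorted(peers, key=lambda p: p[0]):
        asked += 1
        if term in files:
            return name, asked  # first find: stop, nobody else is bothered
    return None, asked


# Hypothetical peers: popular files live on many nodes, rare ones on few.
PEERS = [
    (120, "dialup", {"rare_song.mp3"}),
    (15, "cable", {"hit_single.mp3"}),
    (40, "dsl", {"hit_single.mp3", "b_side.mp3"}),
]

popular = query_in_order(PEERS, "hit_single.mp3")
rare = query_in_order(PEERS, "rare_song.mp3")
print(popular, rare)
```

Popular files are found after one or two queries, so most searches never touch most of the network. Only the rare stuff costs a full sweep.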
I can't understand why this is news to anyone. Those of us who spend time thinking about these things said it right away when Gnutella was released, and we had discussed and rejected the broadcast model for routing several times before that (see the Freenet development list archives if you don't believe me).
The Math behind it is simple:
- Every user adds Cu amount of capacity to the network (on average).
- Every user also adds Tu amount of traffic (also on average). However, because of the broadcast nature that traffic is sent to all users, so with N users, each user generates Tu*N amount of traffic.
This means that the total capacity of the network is:
C = Cu*N
(Capacity per user times the number of users). The total traffic on the other hand is:
T = Tu * N * N = Tu * N^2.
For the network to work, C needs to be greater than T, that is Cu*N > Tu*N^2, which only holds while N < Cu/Tu. Once N grows past Cu/Tu, T > C. You simply cannot win using a broadcast model.
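To put numbers on this, here is a small Python sketch. The formulas C = Cu*N and T = Tu*N^2 are the ones above; setting C = T gives a break-even point of N = Cu/Tu. The values chosen for Cu and Tu are invented purely for illustration.

```python
def total_capacity(cu, n):
    """C = Cu * N: every user contributes cu units of capacity."""
    return cu * n


def total_traffic(tu, n):
    """T = Tu * N^2: each user's tu units of traffic are sent to all n users."""
    return tu * n * n


def break_even_users(cu, tu):
    """N at which C == T; beyond this, traffic exceeds capacity."""
    return cu / tu


# Hypothetical figures: 1000 units of capacity and 2 units of traffic per user.
CU, TU = 1000, 2

for n in (100, 500, 1000):
    c, t = total_capacity(CU, n), total_traffic(TU, n)
    print(n, c, t, "ok" if c > t else "not ok")

print("break-even N:", break_even_users(CU, TU))
```

Doubling every user's capacity only doubles the break-even N, while traffic keeps growing with the square of N.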
On the Freenet-dev list we have a standing rule that two words are indecent and offensive: "centralize" and "broadcast". We think we can pull it off without them, but it makes everything 1000% more difficult, which is the simple answer to why Freenet is developing more slowly than the one hundred million Napster and Gnutella variants out there. That, and the fact that you are not helping us...
The underlying Freenet architecture should actually be quite a good fuzzy-searching system, it is just that we have not got around to enabling that functionality yet as we have been concentrating on getting the underlying architecture right.
--
So I don't think Gnutella is going down in flames. Since it is open source, we may take that as a lesson learnt and perhaps rip out the offending non-scalable part and build a better file sharing device that actually works this time.
I've always thought of gnutella as more of a demonstration than a finished product. While it may not be the best implementation, it shows that distributed file sharing can work well with no central server...it's an important step...this version of gnutella may have reached its limit...but there will be more...just some thoughts
My Home: Apartment6
In the article they point out that the load could be cut in half by fixing some bad code.
They further mention that proposals for redesigned version have already been made.
link from article
Not only that, it says support and resources for this project are being sought out - it's active, it's open source, what more do we want?
Given the interest in Gnutella, I don't see any problem finding people to fix known bugs.
Rather than seeing this as the death of Gnutella, I saw it more as a positive article pointing out known bugs that are being fixed, and announcing the planning of a new and even more powerful version.
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"'
This is the problem with ALL distributed architectures. It's an N^2 problem.
Only if you insist on reaching all the nodes all the time. If you can afford to reach only a subset of the nodes for any given request, then the problem becomes one of proper clustering.
Note that Napster also implements kind of clustering: you see the files of people in your "cluster", not of all Napster users on Earth.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
It uses centralized content tracking servers, but anyone can run one by just clicking a switch in their client. The content trackers store XML metadata describing the file, so you can search on different fields in different file type categories (easily definable).
The files themselves are broken into small redundant pieces and spread over the network. You only need half of the available pieces to reconstruct the original file. This way the system is resistant to servers disappearing. It also means you distribute your load over many hosts, and clients with slower connections can still provide block services.
The coolest thing is that Mojo Nation has a built-in digital cash called "Mojo" and a microcredit system that effectively turns it into a barter system for disk space, bandwidth, and CPU. Whenever you upload, download, search, or otherwise consume another system's resources, you must compensate them with Mojo. The Mojo represents the disk space, CPU, and bandwidth you are using. You can get Mojo by contributing your resources to the network through the client software (it's automagic). This way nobody can consume more resources than they are contributing to the system. Each person that uses it helps to make it stronger. Of course, being a real digital cash system, nothing stops people from sending Mojo to each other in e-mail and settling the transaction with something like PayPal.
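The barter idea can be sketched in a few lines of Python. This is NOT Mojo Nation's actual protocol or API: the `MojoLedger` class, its method names and the amounts are all hypothetical, and the real system uses digital cash rather than a single shared ledger. It only illustrates the accounting rule that you cannot spend more than you have earned.

```python
class MojoLedger:
    """Toy in-memory ledger (hypothetical, not Mojo Nation's real design)."""

    def __init__(self):
        self.balance = {}

    def credit(self, node, amount):
        """Give `node` some Mojo, e.g. for resources it contributed."""
        self.balance[node] = self.balance.get(node, 0) + amount

    def pay(self, consumer, provider, amount):
        """Consumer pays provider for a download, search, etc.

        Returns False and changes nothing if the consumer cannot afford it.
        """
        if self.balance.get(consumer, 0) < amount:
            return False
        self.balance[consumer] -= amount
        self.credit(provider, amount)
        return True


ledger = MojoLedger()
ledger.credit("alice", 10)                  # alice has contributed resources
first = ledger.pay("alice", "bob", 4)       # alice downloads from bob
second = ledger.pay("bob", "alice", 5)      # bob has only earned 4 so far
print(first, second, ledger.balance)
```

A node that only consumes runs out of Mojo, while a node that serves others keeps earning it.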
It's really cool, check it out.
Burris
Some of these problems could be easily solved.
I think there needs to be a way to tell what the network load on an individual node is, and attempt to negotiate connections with machines of similar connection speeds or ping times up to a maximum load cut-off.
Of course, there will still be people with hacked clients that report a bandwidth of 0 and a load of 10, but suspiciously have low pings. Those leeches should be killed, or at least swamped with connections...
Also, it would be nice if the network could re-organize over time, as in, promote people in your segment who give you back successful searches, and cut off branches that don't yield search results. Then everyone who wants free books would eventually find each other, and be separate from everyone who wants free porn (the other 99%, it seems)
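One way the "promote good peers, cut off dead branches" idea might look, as a Python sketch. The `PeerTable` class, the `drop_after` threshold and the peer names are invented for illustration, not taken from any Gnutella client.

```python
class PeerTable:
    """Tracks which neighbours return useful search results (hypothetical)."""

    def __init__(self, drop_after=3):
        self.hits = {}      # peer -> total successful searches
        self.misses = {}    # peer -> consecutive failed searches
        self.drop_after = drop_after

    def add(self, peer):
        self.hits.setdefault(peer, 0)
        self.misses.setdefault(peer, 0)

    def record(self, peer, found):
        """Update a peer after a search; drop it after too many misses in a row."""
        if peer not in self.hits:
            return
        if found:
            self.hits[peer] += 1
            self.misses[peer] = 0
        else:
            self.misses[peer] += 1
            if self.misses[peer] >= self.drop_after:
                del self.hits[peer]
                del self.misses[peer]

    def ranked(self):
        """Remaining peers, most useful first."""
        return sorted(self.hits, key=lambda p: -self.hits[p])


table = PeerTable(drop_after=3)
for peer in ("books", "porn", "music"):
    table.add(peer)
table.record("books", True)
table.record("books", True)
table.record("music", True)
for _ in range(3):
    table.record("porn", False)   # never has what we search for
print(table.ranked())
```

Run on every node, something like this would slowly pull people with similar interests into the same segment.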
---
pb Reply or e-mail; don't vaguely moderate.