Gnutella Not Scaling?
cbull writes "ZDNet Music has an article that makes an argument that "Gnutella is Going Down in Flames". Basically, the argument is that Gnutella isn't as scalable as Napster."
Well, this is just shooting from the hip, but someone should look into writing an improved client for broadband-connected users. This client would cache results to and from its immediate connections, and perhaps out to two or three nodes distant.
If you've got a big pipe, and you're going to be connected to gnutella for a while, this would improve the performance of your client and those closest to you.
Of course, if you really want improvement, you'd have to build this capability into the protocol. Allow clients to register as either low or high bandwidth. Then low bandwidth clients could do anything, but traffic could only go through them for a level or two. Ideally, you'd want every client to be able to reach a high-bandwidth node within 3-5 hops. A connected client would then note and rely upon these distribution nodes to do the work. Perhaps even reconnect to distributors directly...
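The low/high bandwidth split described above could look roughly like the sketch below. It is only an illustration: the `Query` fields, the `should_relay` helper and the two-hop budget are assumptions made up for the example, not part of the Gnutella protocol or any real client.

```python
from dataclasses import dataclass

# Hypothetical budget: the comment only says slow nodes should carry
# traffic "for a level or two".
LOW_BW_MAX_RELAY_HOPS = 2


@dataclass
class Query:
    text: str
    ttl: int        # hops the query may still travel
    hops: int = 0   # hops travelled so far


def should_relay(node_is_high_bandwidth: bool, query: Query) -> bool:
    """Decide whether a node forwards a query it has received."""
    if query.ttl <= 0:
        return False
    if node_is_high_bandwidth:
        return True
    # Low-bandwidth nodes only relay queries that are still near their origin.
    return query.hops < LOW_BW_MAX_RELAY_HOPS


def relay(query: Query) -> Query:
    """Return the copy of the query that gets passed to the next hop."""
    return Query(query.text, query.ttl - 1, query.hops + 1)
```

With this rule a modem user can still search and answer, but once a query is a couple of hops out, only the registered high-bandwidth nodes keep passing it along.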
Just a thought. Isn't this the kind of thing that Freenet already does?
Xentax
You shouldn't verb words.
erm. this seems like a problem that is solvable in any number of ways. Replication seems to be easiest. Cache popular content onto fast pipes (provisions for bandwidth limiting are assumed). Encode a forwarding requirement into the protocol -- every file you download, you have to allow someone to grab that file from you. Use multicast and PPV style scheduling (requesters register for a file, letting the server determine when (within a short time period) to multicast it).
I'm surprised by this being an issue at all. I haven't looked at the gnutella infrastructure, but these are issues that I would have thought were tackled during the initial design.
--
The problem is quite obvious and has been around as long as peer-to-peer and server-based networks have both existed. Peer-to-peer networks work wonderfully when they're small. Server-based networks are much more efficient and are therefore nearly always used for large networks. Can Gnutella still work? Yes, but it will have to be divided into smaller networks... For example, you have separate networks for:
- Pop MP3s
- Rock MP3s
- Country MP3s
- Rap MP3s
- Jazz MP3s
- Movies
- Warez..err..Shareware
Of course, each network should have a critical mass and then divide in half when it reaches that point. Wow, maybe I should get programming...
Freenet is of course an approach to peer to peer file sharing that tries to address these scalability issues. Shame the article doesn't mention it.
The Freenet people haven't figured out how to do distributed searches efficiently yet, although they realize that's a problem. They may well crack that problem, but probably not quickly.
Has everyone else noticed that they get a strange sensation of deja-vu whenever reading slashdot? It is rare to find something which is actually news (and new). Perhaps the geeks of the world need to create some more news, to keep slashdot fed and healthy......
*nod* And the search capabilities seem to be remarkably moronic. On a friend's computer, I watched him wade through all sorts of files that weren't even germane to the parameters he'd searched for. In the end, it all comes down to how people describe the files they are sharing over Gnutella.
On the plus side, he eventually did manage to find every single .mp3 he was looking for... it took him a while, but the thing of it is, some of these files he couldn't find at all on Napster.
Is there any reasonable way to determine usage stats for Gnutella?
Kierthos
Mr. Hu is not a ninja.
Therefore it is as scalable as you want it to be. It is stuff like this that reminds me of the good-ol-days when one had to bitch and whine about missing features, and wait around until the people developing said features would come out of the woodwork.
There are still people like that in the world today. What a shame! It seems that ZDnet likes to cater to this crowd. So now they are bitching to an entire community, of which they were - by default - invited to participate.
This is an endemic situation with ALL friggin web content.
If you use search engines which don't check the accuracy of the data they scrounge or run your own with Archie/Veronica types of searches or worse, become your own search engine, snooping on everybody's hard drives, you're going to take longer and longer to retrieve indexes to content that is of more and more dubious quality.
The world NEEDS MP3.com types of businesses that rate & index as well as store content.
The world NEEDS engines that can demand micro-payment from the recipient before sending a file.
The world NEEDS micro payment services like X3.com to catch the pennies and send the content producers their due.
And SCREW the RIAA, MPAA and other Luddites and SCREW the culture vultures who rip off the content creators (artists and writers etc.) and rip off the consumers by overcharging simply because they put themselves in everybody's faces.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
There is a way to start resolving this problem, and it is currently in development.
The gPulp project is currently working on all of these issues. Check proposals and ideas at: http://gnutellang.wego.com/go/wego.pages.page?groupId=133015&view=page&folderId=136401&pageId=177268&JServSessionId=3fe61b505308701b.415222.969643886549
There is also a server oriented gnutella application which aims to start resolving some of these issues in the near term. Features such as:
1) Provide a server for broadband / dedicated network users to provide content with a true server oriented gnutella node. This will be similar to a modified apache for singular installations, or a federated distributed server architecture for routing and caching fun.
2) Remove broadcast push requests (in all future clients)
3) Proxy and cache support for slow users. This will allow beefy servers to take over some of the load which dialup / slower clients experience. This will be somewhat a la freenet, as popular data will propagate through caches in various nodes. Also, this can provide a level of anonymity which is not present today.
4) Adaptive servers which configure their network connections for optimal efficiency. Not too busy, not too slow, and with the widest distance topologically from their peers (if linked) and fuzzy / reactive propagation algorithms so that TTLs and routes can be dynamically modified as load increases or other factors require.
There is nothing fundamentally flawed with the gnutella architecture, and it is far from a 'dead horse'. However, there are significant inefficiencies and complications which are causing problems right now. Rest assured these will be fixed.
The basic problem is that small sites either take a lot of search hits to which they will answer "no find", or their index has to be mirrored elsewhere, which introduces centralization. There's an economy of scale to searching.
So automatic, distributed, redundant, partial centralization is necessary. This is hard. It also has to be reasonably secure against hacking; look at the problems IRC has. It probably needs a reputation service, so people who spam the indexing system lose.
On the other hand, music interest, being a popularity thing, follows a power law; the music most likely to be searched for will be found easily. A simple hack on Gnutella so that it queries servers slowly, in order, starting at the one with the best response time, stopping with the first find, will keep the thing from collapsing until somebody cracks the hard problems. It's not necessary to crack the general distributed search-engine problem to fix this.
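The "simple hack" above (ask servers one at a time, fastest first, stop at the first find) can be sketched as follows. The `lookup` callback and the `(peer_id, response_time)` pairs are hypothetical stand-ins for whatever a real client would use to contact a peer and measure it; the `pause` parameter is an assumed way to get the "slowly" part.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def ordered_search(
    peers: Iterable[Tuple[str, float]],
    lookup: Callable[[str, str], Optional[str]],
    query: str,
    pause: float = 0.0,
) -> Optional[str]:
    """Query peers sequentially, best response time first.

    peers  -- (peer_id, measured response time in seconds) pairs
    lookup -- hypothetical function asking one peer; returns a hit or None
    pause  -- seconds to wait between peers, to keep the query rate low
    """
    for peer_id, _rtt in sorted(peers, key=lambda p: p[1]):
        hit = lookup(peer_id, query)
        if hit is not None:
            return hit      # first find wins; nobody else is bothered
        time.sleep(pause)
    return None
```

Because popular files sit on many hosts, most searches would end after a handful of peers instead of touching the whole network; rare files still cost a long walk.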
Well, actually it's the problem with all server-less architectures. If you have to have searches, you've got to have a server. If you want to keep it classic P2P, make the server invisible. One way is to create a distributed server. More on this here.
I can't understand why this is news to anyone. Those of us who spend time thinking about these things said it right away when Gnutella was released, and we had discussed and rejected the broadcast model for routing several times before that (see the Freenet development list archives if you don't believe me).
The Math behind it is simple:
- Every user adds Cu amount of capacity to the network (on average).
- Every user also adds Tu amount of traffic (also on average). However, because of the broadcast nature that traffic is sent to all users, so with N users, each user generates Tu*N amount of traffic.
This means that the total capacity of the network is:
C = Cu*N
(Capacity per user times the number of users). The total traffic on the other hand is:
T = Tu * N * N = Tu * N^2.
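For the network to work, C needs to be greater than T. That means Cu*N > Tu*N^2, i.e. N < Cu/Tu. Because T grows as N^2 while C only grows as N, once N gets large enough you always end up with T > C. You simply cannot win using a broadcast model.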
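To put numbers on the formulas above, here is a small sketch. The function names and the example figures (Cu = 1000, Tu = 2, in whatever unit you like) are made up for illustration; only C = Cu*N and T = Tu*N^2 come from the post.

```python
def total_capacity(cu: float, n: int) -> float:
    """C = Cu * N"""
    return cu * n


def total_traffic(tu: float, n: int) -> float:
    """T = Tu * N^2, since every user's traffic reaches all N users."""
    return tu * n * n


def break_even_users(cu: float, tu: float) -> float:
    """The N at which C == T; the network only works below this size."""
    return cu / tu


if __name__ == "__main__":
    cu, tu = 1000.0, 2.0  # assumed per-user capacity and traffic
    for n in (100, 400, 500, 600):
        c, t = total_capacity(cu, n), total_traffic(tu, n)
        print(n, c, t, "ok" if c > t else "overloaded")
```

Doubling everyone's bandwidth only doubles the break-even size, which is why faster pipes alone don't rescue a pure broadcast design.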
On the Freenet-dev list we have a standing rule that two words are indecent and offensive: "centralize" and "broadcast". We think we can pull it off without them, but it makes everything 1000% more difficult, which is the simple answer to why Freenet is developing more slowly than the one hundred million Napster and Gnutella variants out there. That, and the fact that you are not helping us...
So I don't think Gnutella is going down in flames. Since it is open source, we may take that as a lesson learnt and perhaps rip out the offending non-scalable part and build a better file sharing device that actually works this time.
Gnutella was a good idea; it was just taken the wrong way by the moronic serverops who can't avoid sticking a ruler between their legs. Personally, I'd prefer having separate servers for content (mp3 specific network, DivX specific network, binary specific network, etc.).
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
I've always thought of gnutella as more of a demonstration than a finished product. While it may not be the best implementation, it shows that distributed file sharing can work well with no central server...it's an important step...this version of gnutella may have reached its limit...but there will be more...just some thoughts
My Home: Apartment6
In the article they point out that the load could be cut in half by fixing some bad code.
They further mention that proposals for redesigned version have already been made.
link from article
Not only that, it says support and resources for this project are being sought out - it's active, it's open source, what more do we want?
Given the interest in Gnutella, I don't see any problem finding people to fix known bugs.
Rather than seeing this as the death of Gnutella, I saw it more as a positive article pointing out known bugs that are being fixed, and announcing the planning of a new and even more powerful version.
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"'
This is the problem with ALL distributed architectures. It's an N^2 problem.
Only if you insist on reaching all the nodes all the time. If you can afford to reach only a subset of the nodes for any given request, then the problem becomes one of proper clustering.
Note that Napster also implements kind of clustering: you see the files of people in your "cluster", not of all Napster users on Earth.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
Another thing came to mind: Metcalfe's law. The power of the Internet is equal to 2 to the power of the number of nodes who are actually on the 'net. If you look at the graph of that, it's exponential. I figure in Gnutella's case, its power would be inversely proportional to the graph. Any comments?
Every time I've tried gnutella I've managed to find nothing in comparison to napster (even wrapster). I've actually tried just randomly downloading things on gnutella, i.e. 60k (goatsex) files, and just get timed out. I've heard it was much more usable in the summer, however. The only upside of the current version of gnutella is that it's highly entertaining watching the stream of searches coming in :)
It's been mentioned before, but some ways of fixing the situation may include making the searches bandwidth-related to filter out the modems. Perhaps a better idea would be to have an auto-peer mode where high-bandwidth connections become servers for a cluster of machines near them (gaining mojo points, to take the Mojo example). Then clients can just search the (relatively) finite set of high-bandwidth, high-speed servers, much as with napster, but with the client/server split being a bit more fluid.
It uses centralized content tracking servers, but anyone can run one by just clicking a switch in their client. The content trackers store XML metadata describing the file, so you can search on different fields in different file type categories (easily definable).
The files themselves are broken into small redundant pieces and spread over the network. You only need half of the available pieces to reconstruct the original file. This way the system is resistant to servers disappearing. It also means you distribute your load over many hosts, and clients with slower connections can still provide block services.
The coolest thing is that Mojo Nation has a built-in digital cash called "Mojo" and a microcredit system that effectively turns it into a barter system for disk space, bandwidth, and CPU. Whenever you upload, download, search, or otherwise consume another system's resources, you must compensate them with Mojo. The Mojo represents the disk space, CPU, and bandwidth you are using. You can get Mojo by contributing your resources to the network through the client software (it's automagic). This way nobody can consume more resources than they are contributing to the system. Each person that uses it helps to make it stronger. Of course, being a real digital cash system, nothing stops people from sending Mojo to each other in e-mail and settling the transaction with something like PayPal.
It's really cool, check it out.
Burris
Some of these problems could be easily solved.
I think there needs to be a way to tell what the network load on an individual node is, and attempt to negotiate connections with machines of similar connection speeds or ping times up to a maximum load cut-off.
Of course, there will still be people with hacked clients that report a bandwidth of 0 and a load of 10, but suspiciously have low pings. Those leeches should be killed, or at least swamped with connections...
Also, it would be nice if the network could re-organize over time, as in, promote people in your segment who give you back successful searches, and cut off branches that don't yield search results. Then everyone who wants free books would eventually find each other, and be separate from everyone who wants free porn (the other 99%, it seems)
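The re-organizing idea above (promote peers that return hits, cut branches that don't) could be prototyped along these lines. `PeerScores` and its methods are invented for this sketch, and ranking by raw hit rate is just one assumed policy among many.

```python
from collections import defaultdict
from typing import List, Tuple


class PeerScores:
    """Track, per neighbour, how many forwarded searches came back with a hit."""

    def __init__(self) -> None:
        self.sent = defaultdict(int)   # searches forwarded to each peer
        self.hits = defaultdict(int)   # searches that returned a result

    def record(self, peer: str, got_result: bool) -> None:
        self.sent[peer] += 1
        if got_result:
            self.hits[peer] += 1

    def hit_rate(self, peer: str) -> float:
        sent = self.sent.get(peer, 0)
        return self.hits.get(peer, 0) / sent if sent else 0.0

    def rebalance(self, keep: int) -> Tuple[List[str], List[str]]:
        """Return (peers to keep, peers to drop), best hit rate first."""
        ranked = sorted(self.sent, key=self.hit_rate, reverse=True)
        return ranked[:keep], ranked[keep:]
```

Run periodically, something like this would let the book crowd drift toward each other over time; it does nothing about clients that lie about their load, which is a separate problem.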
---
pb Reply or e-mail; don't vaguely moderate.