Mathematical Analysis of Gnutella
jrp2 sent in a paper written by one of Napster's founding engineers. It is
a mathematical evaluation of Gnutella discussing
why the network won't be able to scale up to any reasonable size. I
have been impressed with Gnutella in the past, and have wondered along
these same lines in the past.
i believe this was posted on slashdot a loooong time ago
If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
I seem to remember it was posted ages ago...
In any case, due to the use of proxies, the problem is not as bad.
It's just a BloJJ
Sheesh, this paper is years old. wtf?
Napster: Sucks ass.
Gnutella: Doesn't scale.
(Mod my ass as Flamebait for this, but didn't everyone know about Gnutella's scaling problems, and for-pay Napster sucking ass, based on Slashdot stories months and weeks before today?)
haven't we been here before?
It is a mathamatical evaluation of Gnutella
:)
Someone has not passed his grammatical evaluations at school
-
Roses are #FF0000, Violets are #0000FF, find / -name '*base*' |xargs chown -R us && mv zig greatjustice
Birdhouse in your soul is the best... Particle Man is OK, but just gets too much credit.
I mean, I know that none of us - including our fine moderators - are perfect, but are they at least paying attention?
OK,
- B
http://www.bradheintz.com/
- updated
Will this guy just quit with the numbers and tell me if I'll have the worlds collection of porn downloaded by lunchtime tomorrow? We need to know these things goddammit!.
Invoicing, Time Tracking, Reporting
I think that this could be a *little* off topic.
and on the same day someone from Napster says not Pay Gnutella won't scale
.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
I just started to download their .ogg files yesterday
This is way old... I remember reading this "study" like a year ago.
Heh...somehow I read the title as "Mathematical Analysis of Guatemala", since this article has been posted before and Slashdot never posts anything twice.
This space intentionally left blank.
The problem is not that difficult: if you want Gnutella to scale, then you need to avoid the exponential explosion in the number of messages exchanged between the clients as their number grows. The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want; the latter is what KaZaA and Morpheus have accomplished. This makes the number of messages exchanged grow in a logarithmic way (this is an outrageous simplification of course, but gives an idea). There are many such experiments with Gnutella too along those lines; this is what BearShare is trying to do.
Look at ICQ. It was fairly decent as an instant messaging client until the numbers hit one million or so and then it needed to control everything under the sun and companies could spam through it. File sharing happens through it all the time too.
I don't care if Gnutella cannot scale to the levels that Napster saw. Smaller is better in my opinion!
Click here or here.
I don't see what the big deal is with Birdhouse in your Soul myself
1) Stale, as many have already pointed out.
2) Irrelevant. Who gives a shit if I can see 'everyone in the world', as long as I can see _enough_ of the world to get done what I want to get done? The way things currently stand, a big chunk of the Gnutella network is 'beyond the TTL horizon' for any given user -- does this actually impede anyone from getting the files they want? It doesn't impede me, that's for sure...
outdated
Yes, but..
It's sort of like calculating the maximum hull speed for steam ships crossing the Atlantic Ocean and saying there is a theoretical maximum speed to intercontinental travel. Then someone comes along and invents airplanes.
Gnutella will mutate and evolve, and will at some point in the future be replaced by something better when it starts to fall over.
The demand for Ms. Spears and the Backstreet Boys is just too damn strong for things to stand still.
I enjoyed that this post was next to the announcement that the new-and-not-so-improved preview of Napster was out.
~.~
I'm a peripheral visionary.
Obviously all these mp3s I have on my harddrive and listen to every day must have got onto my harddrive using some other file sharing program. Maybe I actually purchased all this music but suppressed the memories for fear that I was supporting the music industry. This could explain why I'm so broke! The fact that I type in the name of any song that happens to cross my mind during the long fits of programming (usually accompanied by everything from rap music to Beethoven) and it inevitably gives me a list of results in the hundreds is proof enough that the network "scales". When I look at my network stats, I see that the small number of files I am sharing (about 150) each have hits in the hundreds of thousands, even though I restart my server at least twice a week, which shows that I'm definitely contributing to the network. Surely this article is just a case of sour grapes.
How we know is more important than what we know.
On the topic of this program, a more current story running on msnbc.com right now is telling how it is becoming a severe security risk for users of the program. Here is the article.
What I find most interesting are the kinds of projects that have sprung up in Gnutella's wake. Many of these started out as attempts to improve Gnutella, and have since moved on (the Gnutella Next Generation working group never really materialized into anything)
We had Napster at one extreme, Gnutella at the other, and in the middle are a number of partially centralized systems with super peers like Fast Track, such as:
Open FT
JXTA Search
GNet
NEShare
and many others...
Then there are the alternative projects that use an entirely different mechanism. For example, social discovery as implemented in:
NeuroGrid
ALPINE
Or distributed keyword hash indexes like:
Chord
Circle
GISP
JXTA Distributed Indexing
And many others as well.
The coming year(s) will see a lot of maturity in these areas, and searching large peer networks will become ever more efficient over time. Gnutella showed us the possibilities of a fully decentralized model, and refinements of its underlying architecture can produce vastly better solutions.
2002 will be an interesting year for peer networking applications...
yea this is a repeat, but just wait till it's on "Ask Slashdot" next week.
"Given a concurrent demographic comparable to Napster (assuming equally balanced), searching for a simple 18 byte string "grateful dead live" unleashes 90 megabytes worth of data to be transmitted." 90MB!
come ON, any song which references jason and the argonauts and doesn't sound utterly stupid is by definition a great song, and in my opinion tmbg's best song.
-rp
this comment - it was ancient news then!
sulli
RTFJ.
Maybe you should be trying the Google cache!
314-15-9265
"From the charts above, it becomes mind-numbingly clear that the Gnutella distributed architecture is fundamentally flawed and can have a horrific impact on any network. " Hmm... I figured that out after watching gnutella for a few min. with a bandwidth monitor.
Actually there is an ask slashdot article between these. Research before you open your mouth. ;)
Since numerous people above have pointed out this is a repeat, everyone should browse the older article and repost all the comments that were modded up to +5, and reap the benefits when that karma comes rollin' in! ;)
Last night I shot an elephant in my pajamas. How he got in my pajamas I'll never know.
Anyone who understands how Gnutella works (unfortunately, too few people) knows that Gnutella is horribly broken, will never work, and is basically unfixable.
The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".
However, the REAL question is whether you can have a peer-to-peer network with decentralized servers, i.e., with clients that automatically establish a hierarchy among all the clients, where certain clients become more "server like". The only way to make a Gnutella work is by making it hierarchical, but the hierarchy needs to be automatic for it to have the same general "virtual network" aspect of Gnutella.
Is it possible? I don't know. You would probably have to have automatic bandwidth measurements, depth probes, all kinds of things to make it work. I simply don't know if it would be possible to automate something like that.
Wow this troll is about as common as the BSD is dead one
HINT - if you want people to believe you, you should at least change the story to something more realistic
The idiot that mod'ed this Offtopic should get a good thump in the nuts. This news is so old and crusty it could pass for Cowboy Neal's underwear.
int main(void) {
int i = 1;
while(i > 0) {
went(pete,repeat)->store;
out << pete;
i++;
}
return 0;
}
Wow, that's the biggest load of steaming bullshit I've come across in quite some time.
Tonight on Dateline: Dangerous computer hacker terrorists steal your files by... um, you sharing them on the Gnutella network like a fucking moron...
Did you use Linux as the server? There are known limitations to Linux, the main ones being:
- it sucks
- it sucks
- it sucks
Hope that this clears things up. Maybe next time you should look for material that's a bit more popular among servers and draws a slightly more highbrow collector. In other words, lay off the barnyard sex movies. :)
The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want, the latter is what KaZaA and Morpheus have accomplished...
This is exactly the point. This is the only way to properly distribute queries, as anyone who has set up a multi-homed ISP knows. It works on the same principle as BGP routing, i.e. there are routers (super-nodes, or whatever) that have a specific number (an ASN, or in P2P, the supernode address) but there are thousands of computers (casual modem users, in P2P terms) on the internet that these routers have information about. If BGP routing worked the way Gnutella does, nothing would go anywhere. However, by having several nodes giving out information on who has what and how to get it, while the majority of users just download and give out their own info, not pass along info of others, things work much smoother. And with a correct implementation, everyone could have a route to everyone's file list at a minimal bandwidth usage.
sig?
This is one of the biggest problems with P2P file sharing programs. Nearly everyone wants great content for free, but very few are willing to give back and supply any of it.
puzzle me this...
how is the FIRST post to point out that the article is an ancient repeat mod'ed Offtopic and Redundant while the rest posted much later are (Score 5: Informative)???
Yep it sure was, quite a while ago, and at the time it was first published and acknowledged by the Gnutella crowd, work began in earnest to resolve the issue.
That work resulted in research like this, and to major changes in Gnutella implementations.
I survived the Dick Cheney Presidency 7 to 9 AM 7-21-07
There is a major flaw in all P2P software, and it has nothing to do with the coding. More people tend to want to take than give. I remember seeing a line graph on LimeWire's page (I think?) that showed a monthly progression of the number of people sharing files compared to the number of people downloading files. The 'downloaders' were outweighing the 'uploaders' by a HUGE amount.
If everyone was willing to share their files, then there would be no such problem with P2P programs.
Duh. OpenFT does though. giFT needs help testing their network.
have been impressed with Gnutella in the past, and have wondered along these same lines in the past.
I think we could add:
"... but since I was too busy doodling and writing dirty, hackish perl when I was in school, I'm glad someone else did the actual math."
Intercarve Networks, LLC
Am I missing something or is gnutella a big risk to internet stability/security?
In theory, a true Peer-to-Peer file transfer network would exist in a decentralized fashion where you would never have to query a central host for routing or file availability. Napster requires you to route through one of the Napster servers for information. Even introducing Napigator still doesn't alter the Napster model because all it does is allow you to route through a different central host. It seems that all Napster did was integrate a search engine and nameserving into one element (coming from only one provider).
This isn't to knock the accomplishments of Napster, it was certainly an original idea to incorporate these areas and provide a GUI access client to boot. But it is apparent that Napster developers weren't all that revolutionary in their thinking either.
The suggestion of true P2P is revolutionary, and the perfect implementation (should it ever arrive) will also be revolutionary. But the Napster model is no different than everyone providing their MP3 list to a website who maintains a list of links on where to download MP3s. Napster simply automated this process. Napster is no more P2P than any TCP/IP connection not operated through a proxy.
Is http P2P? I'm talking directly to another system, and there is no moderator/mediator. Normally, I have to find out about that system from a 3rd party (e.g. a search engine) -- just like someone obtains a list of links from Napster.
True, I'm being no better than the author of the original article; because I too am offering no solutions. I'm just holding out hope for true P2P in the future.
And what would you suggest as an alternative? There's always a Microsoft product, if you don't mind rebooting every ten minutes and having to perform countless daily updates to prevent crackers from taking down your system...at least Linux is stable and relatively hack-proof.
In order to be immortal you must be organize
If it's such an old news, how come no one on
I didn't get a chance to read the whole paper, as it stalled half way thru (usual
There is more than one way to do broadcasts.
Certain forms of randomized routing offer a
guarantee of a broadcast to complete after
O(n) time for n clients in O(1) time with high
probability. And there are trade-offs in
between.
I suggest you all study algorithms for
broadcast before assuming that the most
bandwidth intensive algorithm will be the
only one used.
I'm not very familiar with the deep technical details of Gnutella, but isn't there a limit on how far the "horizon" is (i.e.:how many users near by you can see)? If this is correct, all the mathematics here presented apply only in theory and not in practice, as what will happen is that (1) most queries will not be relayed past a "reasonable horizon", and (2) there exists a good (or high?) probability that as long as you're searching for "popular" files, that you will eventually find them.
Because of this basic and simple observation, I do not foresee Gnutella dying anytime soon for scalability reasons alone (however, copy-protection issues are another story).
Again let me stress that my observation here is based on the strong assumption that the "search horizon" is "reasonable sized" so as not to have to search the whole gnutella network.
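To make the "horizon" idea concrete, here is a rough sketch of a TTL-limited flood in Python. It is not taken from any real Gnutella client: the graph layout (a dict mapping each node to its neighbours) and the function name `flood` are assumptions made up for illustration. It counts how many peers a query reaches and how many messages get sent, with duplicates received but not forwarded again.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Simulate a TTL-limited query flood.

    graph  : dict mapping node -> list of neighbour nodes (hypothetical layout)
    origin : node that issues the query
    ttl    : number of hops the query may travel

    Returns (peers_reached, messages_sent).
    """
    seen = {origin}
    messages = 0
    # Each queue entry: (node holding the query, hops left, node it came from)
    queue = deque([(origin, ttl, None)])
    while queue:
        node, remaining, came_from = queue.popleft()
        if remaining == 0:
            continue  # query has hit the horizon, stop forwarding
        for peer in graph[node]:
            if peer == came_from:
                continue  # never echo a query back to its sender
            messages += 1
            if peer not in seen:
                # First time this peer sees the query: it will forward it.
                seen.add(peer)
                queue.append((peer, remaining - 1, node))
            # Otherwise the peer drops the duplicate without forwarding.
    return len(seen) - 1, messages
```

With a small TTL only the nearby part of the graph is touched, which is the point being made above: the cost of a query is bounded by the horizon, not by the size of the whole network.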
Gnutella has always been usable for me. Even after the Napster collapse. But it's kind of like stamp collecting, you have to put some work into finding the item you want. There is no guarantee that the song you want will be there today, but it might be there tomorrow, or an hour from now- who knows?
If you read through this research paper, it starts with N=4 and T=5. As you continue reading, he quotes bandwidth figures from his table using various other N and T values.
For example, in the very last table (Bandwidth rates for 10qps) he says the bandwidth generated will be 8GB/s, which aligns with N=8, T=7. Were you to use the N and T values from the beginning, this would be 2.4MB/s, which is off by a factor of 3143 and one third.
Going back to Joe User's Grateful Dead query, it only generates ~250KB, not 800MB.
Remember, very very few people are going to modify their TTL or open connections. This ``white paper'' grossly misstates the amount of bandwidth Gnutella generates and seems to be an anti-Gnutella paper designed to mislead rather than an honest and fair judgment.
marotti.com
Maybe Slashdot should diversify, since
it can't find enough new things to talk about.
As I pointed out last time this was posted, this article is basically 100% FUD. Yes, the amount of traffic goes up. And no, gnutella doesn't scale very well. But the author goes out of his way to make the problem look worse than it actually is. You see, the article only computes the total amount of traffic in the entire network. A number which is both huge and meaningless. You see, by this math, if I send a packet somewhere and it takes 10 hops, well, that's like sending 10 packets!
At the end of the paper, the author coughs up the big scary number of 63GBps of traffic in the Gnutella network when the nodes each have 8 connections and are using a TTL of 8. Wow! That's a lot of traffic. That certainly isn't scaling! Well, what the author never points out is that, by his own math, the network has 7,686,400 users at this point! When we divide up the total traffic among all of those network links, we get a different view. If you do the math you discover that this is a whopping 72Kbps! Oh no! It's the end of the world! Well, no, it's not. True, it's more than a modem can handle. But it's well within the reach of most cable modem connections. Given that your computer is being expected to handle the search requests of over 7 million other people, it's not that much traffic.
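The arithmetic above can be checked with a few lines of Python. This is only a sketch under assumptions: the reach formula (each node has N connections, so hop t adds N*(N-1)^(t-1) users) is assumed here because it reproduces the 7,686,400 figure for N=8, T=8, and the total is divided evenly among users rather than among links, which is a simplification. The function names are made up for illustration.

```python
def reachable_users(n, ttl):
    """Users reached if every node has n connections and queries live ttl hops.

    Assumed model: hop t contributes n * (n - 1) ** (t - 1) new users.
    """
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, ttl + 1))


def per_user_bits_per_second(total_bytes_per_second, users):
    """Spread a network-wide byte rate evenly over all users, in bits/s."""
    return total_bytes_per_second * 8 / users


users = reachable_users(8, 8)
# 63 GB/s is the network-wide figure quoted above; GB is taken as 10**9 bytes.
share = per_user_bits_per_second(63e9, users)
print(users, "users,", round(share / 1000, 1), "kbit/s each")
```

Under these assumptions the per-user share lands in the tens of kilobits per second, the same order as the 72Kbps quoted; the exact figure shifts depending on whether GB means 10^9 or 2^30 bytes and on whether you divide by users or by links.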
Don't get me wrong, I agree that Gnutella doesn't scale all that well. But this paper is just plain FUD. The only number that really matters to users is the total bandwidth load on their pipe. By carefully avoiding that number, which isn't very big and scary at all, the author is clearly lying by omission. Given all of the real problems networks like Gnutella encounter, it isn't interesting to read this sort of drivel. Why don't we drag out Mathematica and model how much bandwidth Napster wastes by transmitting the names of all the files being shared even though most of them will never get searched for? Hmmm. Let's assume 7,000,000 users. Let's assume that they each share 1000 files with an average filename length of 32 characters. Why, that's 224 Gigabytes of data, and we haven't even done any searches yet! Clearly, Napster doesn't scale. Ugh. This guy might know how to use Mathematica, but I still suspect he worked in the Marketing department. With the same guys who will tell you about their 200Mbps fast ethernet.
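The same back-of-the-envelope Napster number, as a tiny Python sketch. The inputs are exactly the assumptions stated above (7,000,000 users, 1000 files each, 32-character names, one byte per character); the function name is made up for illustration.

```python
def index_upload_bytes(users, files_per_user, avg_name_length):
    """Bytes needed to send every shared filename to a central index once."""
    return users * files_per_user * avg_name_length


total = index_upload_bytes(7_000_000, 1000, 32)
print(total / 1e9, "GB of filenames before a single search is run")
```

It is the same trick as in the paper: a network-wide total looks enormous, while the per-user share (here 32,000 bytes each) is small.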
It's important to know that the author of this paper is Jordan Ritter, who is a co-founder of Napster.
-- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
hey dickhead, there's no ask slashdot bullshit on MY page. and i've got it set to show everything. so how about you kiss my ass? make sure you lick it clean, because i usually forget to wipe.
Quality of files is the problem.
There are a number of problems using Gnutella.
Getting a complete and/or undamaged file is difficult. (Especially anything long.)
Just because you find a file does not mean you can get it. Huge numbers of the files "available" on Gnutella are either on non-routable addresses or on servers that refuse connection or timeout.
Many of the files on Gnutella are misnamed or misattributed. Do a search on "Weird Al", for example. You will get all sorts of responses, few of them actually by Weird Al.
It is useless for getting files that contain multiple parts, unless tarred or the like. (For example, getting a complete album is next to impossible. The unreliability of the service ensures that.)
Gnutella seems to have nothing to ensure any sort of "quality of service" or file integrity.
Pretty much a waste of time.
"Trademarks are the heraldry of the new feudalism."
personal attacks hurt, especially when deserved
Sorry, but when Gnutella first came out, and I looked at the protocol, I thought to myself, "Gee, this is nice, but when that graph of connections starts getting highly connected and you have all those people spitting out queries and forwarding others there is going to be a humongous sucking sound as the bandwidth is taken." No, I didn't read a paper or do the math, but anyone with a basic grounding in graph theory and computer science would see the shortcomings immediately. Yeah, it will evolve and should since I like this kinda stuff... but it wasn't exactly rocket science. :-/
Humorless sig goes here.
I recommend everyone to check out KaZaA. It's scalable, has hundreds of thousands of users online on average, with about half a terabyte of files available in total. It has not only mp3, but also divx and mpeg movies (film and pr0n), software and all kinds of documents (books). It's a kind of Napster++. The client works very well, there is also a Linux version in beta available.
:)
Other cool features:
It automatically resumes interrupted downloads. You could even shut down your computer for days and after starting up, KaZaA will just resume the downloads (when still available, of course).
Multi source downloading. Every file is checksummed and the program will look for other sources. A download will automatically be spread over several sources to speed up the download process.
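For anyone curious how multi-source downloading with a checksum can work in principle, here is a minimal Python sketch. It is not KaZaA's actual scheme, which is not described here: the round-robin split, the SHA-1 choice, and the `fetch(source, start, end)` callback are all assumptions made up for illustration.

```python
import hashlib


def plan_ranges(file_size, sources, chunk_size):
    """Assign consecutive byte ranges to sources in round-robin order."""
    plan = []
    for index, start in enumerate(range(0, file_size, chunk_size)):
        end = min(start + chunk_size, file_size)
        plan.append((sources[index % len(sources)], start, end))
    return plan


def download(plan, fetch, expected_sha1):
    """Fetch every planned range, reassemble, and verify the checksum.

    fetch(source, start, end) is a hypothetical callback returning the
    bytes [start, end) of the file as served by that source.
    """
    data = b"".join(fetch(source, start, end) for source, start, end in plan)
    if hashlib.sha1(data).hexdigest() != expected_sha1:
        raise ValueError("checksum mismatch: at least one source sent bad data")
    return data
```

The checksum is what makes it safe to mix sources: if any peer serves a different or corrupted copy, the reassembled file fails verification instead of silently ending up broken.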
All these things make KaZaA imho a Napster killer (as far as Napster can still be killed, but still
cryptic
personal attacks hurt, especially when deserved
First, if I understand what he's driving at correctly, the bandwidth numbers he gives are for the Gnotella network as a whole, not for each and every client connected to it. This is equivalent to saying "average 'HTTP' usage generates n amount of bandwidth over the Internet", or "DNS traffic will consume x number of bytes on a given network". So what? Would anyone be really shocked if 7,000,000 web browsers generated HTTP and DNS traffic in the gigabyte range? Doesn't bother me. That might be an interesting number to your ISP but as a user of Gnotella I could care less about how much total bandwidth my query for 'The Grateful Dead' takes up. It sure sounds like a lot of traffic, but it's distributed over the entire Gnotella network. As long as the traffic isn't high enough to overwhelm individual clients I don't see the problem. These numbers just don't seem to be that important, or am I missing something here?
The other item the author fails to consider (and I'm going to guess that, as one of the engineers behind Napster, he probably knows better) is client-side optimizations like search caching and differentiation of the clients. The caching argument goes like this:
If client A sends out a query to client C looking for 'Grateful Dead' and client B sends out a very similar request to client C , say, 'The Grateful Dead', even basic caching would prevent client C from sending this request back out to the same hosts that responded to the first request made by client A. Again, am I missing something important here? I'm not sure that caching would reduce the traffic dramatically but I'd be willing to bet that it would improve matters significantly, especially for clients that remained 'up' for long periods of time (which is in itself another important factor that seems to be missing here). This just seems so obvious.
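A bare-bones version of that caching idea, sketched in Python. Nothing here comes from a real client: the class name `QueryCache`, the five-minute expiry, and the normalization rule (lowercase, drop "the"/"a"/"an", ignore word order) are all assumptions chosen so that 'Grateful Dead' and 'The Grateful Dead' hit the same entry.

```python
import time


class QueryCache:
    """Remember recent query results so similar queries need not be re-flooded."""

    STOP_WORDS = {"the", "a", "an"}

    def __init__(self, max_age_seconds=300.0, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # normalized query -> (timestamp, results)

    @classmethod
    def normalize(cls, query):
        """Lowercase, drop stop words, and sort words so near-duplicates match."""
        words = [w for w in query.lower().split() if w not in cls.STOP_WORDS]
        return " ".join(sorted(words))

    def lookup(self, query):
        """Return cached results, or None if absent or older than max_age."""
        key = self.normalize(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stamp, results = entry
        if self.clock() - stamp > self.max_age:
            del self._entries[key]  # stale: peers may have left since
            return None
        return results

    def store(self, query, results):
        """Record the results seen for this query."""
        self._entries[self.normalize(query)] = (self.clock(), list(results))
```

Client C would call `lookup` before forwarding a query and only flood on a miss. The expiry matters because hosts come and go; a cache that never forgets starts handing out answers that point at peers who are no longer there.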
There are bunches of optimizations like this that can be done with the Gnotella application to reduce the overall bandwidth. And this leads to the other half of my point, i.e. the author assumes that each and every client will be functionally the same. They aren't. The Gnotella FAQ tells you to reduce your N if you're on a slow connection. This means that not all Gnotella clients are exactly the same now anyway; some have higher N's than others. The FastTrack guys (i.e. KaZaA, Morpheus, et al.) have already shown that it makes sense from an efficiency standpoint to have some clients do more than others via 'supernodes' and the like. This seems like a fairly obvious development on the client side and I can't for the life of me understand why this isn't addressed. I mean, really, isn't the 'client-client' vs. 'client-server' approach really the underlying assumption behind why Napster will scale and Gnotella won't?
I hate to say it but it looks to me like the author is showing just a little bias here. Hey, I suppose that if I worked on a competing standard I'd trash-talk the competition too but I think his time would be better spent making the Napster approach work better. No matter how you slice it or dice it Napster is pretty much dead while the Gnotella network is still alive and kicking. Maybe it will never scale to 'billions and billions' of hosts but at least it's still around and going strong.
Not because of any difference in the clients (they're virtually identical and are both on the Fasttrack network) but because it doesn't contain the spyware that Kazaa does.
---
I didn't want to leave this space blank.
Gnutella is a network HOG! Of course I was an idiot who decided to install LimeWire supernode on my PC and ran it for about 3 days before getting tired of it. The next week I am informed that my Internet is being discontinued until February 4 (I live on campus). When I get around the BS from network services I find out that I was using 80% of my subnet and 10% of the outbound traffic from Gnutella. Of course, they see all the porn that is trafficked in Gnutella and assume it is a porn service. After explaining to them what Gnutella was they basically said well you weren't a porn server but still you are in trouble.
So I am at the library typing on /. while my PC in my room collects dust.
There is nothing wrong with being gay. It's getting caught where the trouble lies.
Caching data is a good solution for Gnutella but note that it is only good if you use a client that does caching and note that Internet users generally don't like sharing their own resources (I mean their bandwidth) with the neighbours.
How would a P2P network do with the kind of scaling that IRC networks use?
Since I believe IRC scales pretty well, why not build the Gnutella network like that?
Earth: Mostly harmless.
Napster: Sucks ass.
Gnutella: Doesn't scale.
cpeterso
It's quite simple. Someone might be offering a file, and 50 people are all downloading that file. If you download BearShare (a Gnutella client), it even has a little check box that says something like 'Share Files'. The fact is that more people are downloading than uploading.
There were several responses to this article pointing out that the current Gnutella network is much more scalable than the one discussed in the article. Try looking here and here for articles discussing the changes since early 2000.
Come on Slashdot, it's 2002 not 2000. It looks pretty bad accepting this article right after the Napster one. Does Slashdot or VA own a stake in Napster or something?
There is a new Gnutella standard extension called HUGE that will fix a number of these file integrity and reliability problems. I think Bearshare is very close to releasing an implementation.
There is also a sister specification to HUGE entitled the "Content-Addressable Web" which is for performing distributed downloads of content from normal web sites, and is thus not Gnutella specific. The CAW specification is available at http://onionnetworks.com/caw/
--
Justin Chapweske, Onion Networks
http://onionnetworks.com/
Here's a really wacked-out thought I had that I've been working on.
Gnutella clients can sometimes have more "potential" connections out to the network than MAX_CONNECT (because they open, say five, expecting two and get four). If so, why not do a traceroute to each of the hosts and crop out the one that is the most hops away? Iterate cropping until there are MAX_CONNECT active connections.
This would tend to favor a network that closely reflected the underlying structure of the network - thus reducing any earth-shattering impact on the inet backbone?
To further force a short-inet-distance perhaps clients should (optionally) not accept connections from far-flung hosts?
Additionally, clients should count already-seen packets (which they are supposed to drop) against the goodness of a given link - thus reducing routing loops in the network and forcing it to flatten out instead of clump together.
These might allow clients to have a higher TTL without increasing net net (har har) bandwidth - less duplicated, circularly-routed, lengthy-path, etc, data.
I suspect (have not checked) that some clients do the latter (routing loop prevention), but I know of none doing the former.
I will get around to coding this soon, unless somebody can tell me it's a stupid idea (for a good reason).
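The cropping step could look something like this in Python. It is only a sketch of the idea as described: `hop_counts` is a hypothetical dict of host to measured hop distance (how the hops get measured, e.g. via traceroute, is left out), and `max_connect` stands in for MAX_CONNECT.

```python
def crop_connections(hop_counts, max_connect):
    """Drop the farthest hosts until at most max_connect remain.

    hop_counts  : dict mapping host -> network hops to that host (assumed
                  to have been measured elsewhere)
    max_connect : connection limit, standing in for MAX_CONNECT

    Returns (kept_hosts_nearest_first, dropped_hosts_in_drop_order).
    """
    kept = dict(hop_counts)
    dropped = []
    while len(kept) > max_connect:
        # Crop the host that is the most hops away, then iterate.
        farthest = max(kept, key=kept.get)
        dropped.append(farthest)
        del kept[farthest]
    return sorted(kept, key=kept.get), dropped
```

For example, `crop_connections({"a": 3, "b": 12, "c": 7, "d": 20}, 2)` keeps `a` and `c` and drops `d` then `b`. The duplicate-packet penalty could be folded in by ranking on a combined score instead of raw hops, but that weighting would be a guess, so it is left out here.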
--Nathan
Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
From a financial perspective, analysts believe unlimited bandwidth is coming because it typically costs the phone company about 40 bucks to service an account. For a typical low end cellular account ($20/month), the company loses money. The same rules apply to cable, dsl and normal phone line. If the progress of broadband comes to a halt and reverses, all of that math means absolutely nothing.
Who cares if Napster scales better when no one has broadband access? Really people, P2P technology is heavily dependent on broadband access to a large percentage of the population. If broadband gets priced above the average Joe's budget, do you think it will really matter? If only the top 5% of the population have broadband, the numbers cited in the article are meaningless.
Time to look beyond the statistical FUD and look at the real issue at hand. The architecture of gnutella will only be a factor if unlimited bandwidth comes to the masses.
and I hate to say this, but take an idea from the windows networking world... each machine has an election to see who is going to be the master browser (based on average connected and up times.. the clients that are up and serving the longest and with the shortest down times historically) then we have the next few building the same master browser database but sitting dormant (just listening and caching) until the master browser disappears, then the next highest pipes up and says "ohhh lookie me!" thus keeping a master server up (and that master server could load balance with the sub servers by just sending a "busy use 127.0.0.2 or 127.0.0.3" back to the client).
it could be fixed, and made powerful, self scaling.
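Roughly what that election could look like, as a Python sketch. This is not how Windows browser elections or any P2P client actually rank machines: the `Peer` fields, the availability score (uptime divided by uptime plus downtime), and the default of two standbys are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    address: str
    uptime_hours: float    # historical time connected and serving
    downtime_hours: float  # historical time offline


def availability(peer):
    """Fraction of observed time the peer was up (0.0 if never observed)."""
    total = peer.uptime_hours + peer.downtime_hours
    return peer.uptime_hours / total if total else 0.0


def elect(peers, backups=2):
    """Pick the most available peer as master and the next few as standbys."""
    if not peers:
        raise ValueError("cannot hold an election with no peers")
    ranked = sorted(
        peers,
        key=lambda p: (availability(p), p.uptime_hours),
        reverse=True,
    )
    return ranked[0], ranked[1:1 + backups]


def fail_over(standbys):
    """Promote the first standby when the master disappears."""
    if not standbys:
        raise ValueError("no standby peer to promote")
    return standbys[0], standbys[1:]
```

The standbys are the peers that would sit dormant, listening and caching the same index; `fail_over` is the "ohhh lookie me!" step. A real version would also need the peers to agree on the result over the network, which this sketch skips entirely.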
Do not look at laser with remaining good eye.
this message is composed of bits.
zen.
LimeWire currently implements a variation of this -- what we call "UltraPeers." UltraPeers establish a significantly greater horizon on the network, and there are other distributed protocols that do this in other creative ways, such as Chord out of MIT, which can be found at: http://www.pdos.lcs.mit.edu/chord/ That aside, there is significant evidence to show that a distributed network can scale far better than any centralized network. Remember that Napster had serious scaling problems as well -- you could only see the files from the hosts on whichever server you happened to be logged in to. The only solution to that problem is the brain-dead purchase of a yet faster multi-million dollar server. I would not call that scaling. As everyone else has pointed out, this discussion began and ended in the Gnutella community about a year ago.
Adam Fisk
If what he says is true, that you could generate 14 megs' worth of responses, what's to stop me from forging my IP address to be YOUR IP address, querying for the string mp3, and sitting back and watching the carnage? There would be almost no way to trace this, and it would certainly generate a significant amount of traffic, so what's to stop me? Maybe his statistics are a bit inaccurate, but all the same, you could cause a lot of data to be sent somewhere, while not causing yourself any significant lag at all.
Synergy is your friend
HEY! you forgot My program: Myster... :-)
It would probably fall under the category of "social discovery" but I'm working on a proxying system that would allow many nodes to pool their resources under one node, effectively implementing a kind of super node.
In a remarkable display of memory, CmdrTaco was able to remember not to double post the Gnutella scaling story for a full 8 months! Here's what his trainers had to say:
"We're really proud of our little tacito, this is quite an accomplishment. We still need to work with him on his timeliness, but we'll take what progress we can get!"
Critics are quick to point out that we don't know when in those 8 months CmdrTaco forgot about the original posting, since there's no telling if the Slashdot readers submitted the story during that time.
In other news...
The article is about scaling. Gnutella doesn't scale in the sense that your queries are only sent to 4-8 thousand users.
If Gnutella tried to scale it would not be able to because of the figures he shows in the paper. It simply can't scale.
Something you forgot to mention in defending Gnutella is that the article was written a long time ago before clients started caching responses.
Now you get more responses but over half of them are wrong.
7,000,000 users??? That estimate is about 200 times the actual value. More like 40,000 users, if that.
Seriously... Gnutella is not worth defending as a viable p2p. Something like Kazaa is much better because it allows you to search more computers with less bandwidth waste.
Gnutella was cool because it was the original and because it showed that distributed file sharing networks were possible. It got around the legal restrictions where a large Napster-like server could be shut down. It got around the technical problems where schools were blocking file sharing ports. It has historical value but it no longer has any value as a file sharing protocol.
Why don't we just leave all this in the past?
sometime in a future we'll all realize that Ms. Spears in fact was the major driving force behind the major Internet-related multimedia innovations. Including the recent IETF proposal on multimedia chat attachments. (not that I'm going to start listening to her music, of course)
win2000 has an uptime that makes a viagra-popping, 3-balled billygoat look bad.
I developed GNUTELLA in tandem. But I designed it mainly for a theoretical online RPG (imagine a "DIABLO 3") which wouldn't lag to hell like DIABLO 2 (central server), but at the same time wouldn't incur the data hacking of DIABLO 1 (client side).
:)
How's it work? Basically everyone stores OTHER people's characters on their computers. It's client side, but not on your side of the client.
If you log on, and everyone contests your new found power, then likely you didn't get it by honest means.
There are conspiracy theory and stuff that would give this problems... But people marked as potential problems would be more closely monitored... And if you continued to be abusive, you get kicked out of the play group.
I went on to note that if there was some way to get some IP seed with IP lists, then everyone could connect outside of a central server.
When I heard of Napster, I automatically jumped and thought my idea could be adapted to file sharing, but lo and behold it was known for at least a year.
Lots of main ideas come up in tandem.
But there is definitely a TON of GNUTELLA spin-off uses... And most of them involve lowering the overhead to compete with corporate giants.
You're using the client's power and bandwidth to lessen the dependence on a central server's.
God spoke to me
As the saying goes, 'All models are wrong. Some of them are useful.'
So assuming this model is useful, the question is: "useful for whom?"
That's not logarithmic. If every client node connects to a "super node," and every other "super node," then what you have is a two-level tree. Growth at each level is O(sqrt{n}), not logarithmic.
Chord, a p2p research project from MIT, is truly logarithmic. Go read their SIGCOMM'01 paper for an explanation of how their system works.
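To make the two-level-tree point concrete, here is a rough sketch under one simplifying assumption: n clients are split evenly across s supernodes, and every supernode also links to every other supernode. The function names are made up for illustration. The per-supernode connection count is then n/s + (s - 1), which is smallest near s = sqrt(n), so it grows like sqrt(n) rather than log(n).

```python
import math

def supernode_degree(n, s):
    """Connections one supernode holds: its share of clients plus every other supernode."""
    return n / s + (s - 1)

def best_supernode_count(n):
    """Brute-force the supernode count s that minimises per-supernode degree."""
    return min(range(1, n + 1), key=lambda s: supernode_degree(n, s))

# Compare the best achievable degree against log2(n) for growing networks.
for n in (100, 10_000, 1_000_000):
    s = best_supernode_count(n)
    print(n, s, round(supernode_degree(n, s)), round(math.log2(n)))
```

The last two printed columns show the gap: the degree column roughly doubles every time sqrt(n) doubles, while the log2 column barely moves.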
--Patrick
LimeWire now has Ultrapeers (and QRP) which offload the bandwidth work to powerful nodes. Leaf nodes use little bandwidth. It works very well by all reports.
Other Gnutella developers are working on adding this as well. Next, we will go beyond the Gnutella broadcast model and use more focused queries.
If you are going to criticize a paper, do so on the basis of what they are claiming (there is no shortage of support for the claims he is making), not with conspiracy theories about the author's motivation.
Plus the author ignores (mostly due to the fact that they didn't exist back when it was written, this IS an old article) the innovations made with Gnutella (and other, newer competing technologies). Specifically, there are now 'search proxies' that exist on Gnutella that cache and return common queries, thus not saturating the network with redundant queries. For a modem user, this makes the network usable if they limit their connections to proxy servers, because the number of searches hitting their client directly shrinks as common queries are sifted through.
Not to mention there's still room for improvement to the protocol itself: there's no reason a proxy couldn't cache a list of all files shared by a connected client, then answer queries directly, NEVER sending a query directly to a client. (Ultimately, as people run proxies like this more and more, you'd end up having proxies talking directly to each other.) The ultimate Gnutella proxy would cache commonly requested files and make them available over a bigger pipe.
No money in it, but for the Gnutella enthusiast, I could see them running this kind of thing from work off of a QA box, for example, or from their support desk at an ISP. =)
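The file-list proxy described above could look something like this. It is only a sketch: the class name, the keyword splitting, and the match-every-keyword rule are assumptions made for illustration, not how any existing Gnutella proxy or the Gnutella query format actually works.

```python
from collections import defaultdict

class IndexingProxy:
    """Hypothetical proxy that answers queries from cached client file lists."""

    def __init__(self):
        self._index = defaultdict(set)   # keyword -> {(client, filename)}
        self._files = defaultdict(set)   # client -> {filename}

    @staticmethod
    def _keywords(text):
        # Treat dots, underscores and hyphens as word separators.
        for sep in "._-":
            text = text.replace(sep, " ")
        return set(text.lower().split())

    def register(self, client, filenames):
        """Cache the full share list a client sends when it connects."""
        for name in filenames:
            self._files[client].add(name)
            for word in self._keywords(name):
                self._index[word].add((client, name))

    def unregister(self, client):
        """Drop a departed client's entries so stale hits are not returned."""
        for name in self._files.pop(client, set()):
            for word in self._keywords(name):
                self._index[word].discard((client, name))

    def query(self, text):
        """Return (client, filename) pairs matching every keyword, without contacting clients."""
        words = self._keywords(text)
        if not words:
            return set()
        return set.intersection(*(self._index.get(w, set()) for w in words))
```

The point of the design is that a leaf on a modem only pays once, when it uploads its share list; every later search is answered from the proxy's memory.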
All I know about Bush is I had a good job when Clinton was president.
A good analogy might be a detective trying to find a suspect for a crime. The Gnutella approach is akin to going on TV and asking everyone in the area to let you know if they know who did it. It may work once, but the more you do it, the less effective it is. Freenet works as detectives do normally, they gradually home in on their suspect by gathering information, and using that information to refine their search.
Some say that Freenet only achieves this scalability because it doesn't do the type of "fuzzy" search Gnutella does. You need to know exactly what you are looking for in Freenet to find it. This isn't true, the Freenet searching algorithm can be generalised to allow fuzzy searching. While this has not yet been demonstrated in practice, it is definitely possible in theory.
It always amazes me that people continue to lament flaws in many current P2P architectures when Freenet has incorporated solutions to those problems almost from its inception.
Disclaimer: I am Freenet's architect and project coordinator, so you could be forgiven for thinking I am biased, but you are free to review our papers and research to decide for yourself.
Obviously you haven't used GNUtella for the past year. Xolox is a GNUtella client that allows for parallel downloading, resuming, and Xolox will even look for other sources of the file that you are currently downloading, if the current sources are too slow or down. Basically, with Xolox, you search for a file that you want, and you get results with numbers by them depicting how many sources have the file. That way you don't have to decide which source you want to download from. You decide which file you want to download... and Xolox figures out the rest.
My average download speeds on Xolox are around 160Mbs. Of course, I use the ever so crappy AT&T cable modem service... so other people on faster DSL lines will most likely experience faster downloads.
Next thing you are going to tell me is that Windows is better than Linux because Linux doesn't have any good GUIs or desktop environments for it. Yeah, let's just ignore everything that's out there right now.
Not only that, but Limewire also supports multisource, segmented, or swarmed downloading. Though Limewire has only recently gotten such functionality, while Xolox has had it for the past year.
Oh, and GNUtella is free as in beer and as in speech.
Seriously. The latest version (2.1) seems to have solved quite a few of the problems outlined in the 'study'. Anyone who is doubting the scalability of the protocol should give it a try.
-- Give me ambiguity or give me something else!
well hey... did DOS ever crash on you? its got amazing uptime. don't diss all M$ products.
Is it just me or is everyone missing the big picture here?
When I do my search for whatever the heck it is I don't expect or want 10 million results. Searching every user is almost always pointless.
If you actually step back and think about it, a distributed network like this functions perfectly as a few thousand overlapping smaller networks of 2 or 3 thousand users. This way each person's own mini-network is centered around him.
An excellent side effect of this is that it makes the content self-filtering, as dud material won't propagate far. This is because the user will delete it once they realise it's a dud, hence stopping it moving on into neighbouring mini-networks. With a big centralised network like Napster, all it takes is a few extremely fast clients offering decoy crap to muck everyone around.
Perhaps if no result is found in the immediate mini-network then the query could be passed a bit further away, but you would never need to query every last person. It's highly unlikely you would be looking for something so rare that only one person in a million has it...
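The "pass it a bit further only if nothing turns up nearby" idea can be sketched as a flood whose hop limit grows step by step. Everything here is a toy: the network is an in-memory adjacency dict, `files` maps a host to the names it shares, and the default limit of 7 hops is just an assumed cap, not a value taken from any client.

```python
from collections import deque

def flood(graph, files, start, wanted, ttl):
    """Breadth-first flood from start, at most ttl hops; return (hits, peers contacted)."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        node, hops = frontier.popleft()
        if node != start and wanted in files.get(node, ()):
            hits.add(node)
        if hops < ttl:
            for peer in graph.get(node, ()):
                if peer not in seen:
                    seen.add(peer)
                    frontier.append((peer, hops + 1))
    return hits, len(seen) - 1

def expanding_search(graph, files, start, wanted, max_ttl=7):
    """Retry with a larger TTL only when the nearer neighbourhood had nothing."""
    contacted = 0
    for ttl in range(1, max_ttl + 1):
        hits, contacted = flood(graph, files, start, wanted, ttl)
        if hits:
            return hits, ttl, contacted
    return set(), max_ttl, contacted
```

Popular files are found at a small hop count and the search stops there, so only rare files ever pay for the wide flood.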
-trapper-
Well, I would find that funny. Of course, you don't really need to be an economist to know the new Napster will fail, just like you don't need to be a computer scientist to know the Gnutella network was fucked (at least in its original conception).
autopr0n is like, down and stuff.
Limewire is probably one of the most popular Gnutella clients out right now. It takes the best features from Morpheus (supernodes, downloading from multiple sources) and implements them in Gnutella. The original Gnutella protocol would never have scaled, but Gnutella has evolved and is very much different from what it used to be. Limewire can also be touted as one of the most successful Java applications made today. Definitely check it out, it's quite impressive. Oh, did I forget to mention that it's open source???
No. You can't download when all your upstream is being used.
If there was bandwidth capping as the default this would help. Resume also needs fixing. Basically, ship a decent client that has QoS built in by default and can resume files from multiple sites. I never had a problem uploading, but when I want a file on a modem and only 2 hosts have the file I want, most likely the user will log off halfway through the download. I have a directory of incompletes that never get resumed. Also, I have to connect to a large (again, LARGE) number of hosts to find the file I need. It's like finding a needle in a haystack. This is where a directory service like Napster kicked ass: finding the file.
But then if you want britney spears mp3s you will find thousands of hits...
anyone browsing this low, and paying attention, check out Flaming Lips and/or Bomfunk MC's...
Has anyone considered that a transparent proxy might be the solution, or at least a partial solution?
The internet is more of a tree than a net, at least for the smaller ISPs. So a site can run a transparent proxy that aggregates all its Gnutella clients, and only maintain a few outbound connections for the entire site, as opposed to a few per client. In addition, incoming Gnutella connections are intercepted and directed at the proxy (which is essentially another Gnutella node).
This allows ISPs to limit the number of Gnutella connections to the rest of the world. In fact, it would be best for them to connect only to other ISPs using a proxy as well.
This would tend to greatly improve query response time for nodes that are close by, but on the other hand would make it harder to create connections to remote nodes, because that control has been moved from the client to the proxy.
But an office or a net cafe or a school could run the proxy and have a single link between it and the ISP's proxy, instantly connecting the site with all the ISP's users and cutting bandwidth considerably.
Proxies can do other things to accelerate searches. If a request for "Grateful Dead" has been forwarded, then there is no need to forward the same query string in the immediate future (say 1 minute). And of course there is the option of caching the file transfers themselves...
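The "don't forward the same query string again for a minute" rule is easy to sketch. This is an illustration only: the class name, the case and whitespace normalisation, and the injectable clock are assumptions, and a real proxy would also have to expire old entries to bound memory.

```python
import time

class QueryDeduper:
    """Hypothetical filter a proxy could consult before forwarding a query."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock              # injectable so the window can be tested
        self._last_forwarded = {}        # normalised query -> time last forwarded

    def should_forward(self, query):
        """True if this query string has not been forwarded within the window."""
        key = " ".join(query.lower().split())
        now = self._clock()
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self._window:
            return False                 # a recent identical query is already out there
        self._last_forwarded[key] = now
        return True
```

A proxy would call `should_forward` on each incoming search and, when it returns False, answer from whatever results the earlier identical query brought back.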
Of course, building an indexing system that scales arbitrarily is difficult, and building an indexing system that recognizes local topologies is also critical. A typical problem universities had with Napster was that if N people at the school wanted a given tune, most of them would be likely to fetch it across the school's limited outside bandwidth instead of most people fetching it from other sites on the fast LAN after the first one or two had downloaded it across the limited part. Napster was able to reduce this problem, at least at some schools, because having a centralized indexing service means that they can enforce more locality by making it easiest for people to find nearby peers. A decentralized system *may* be able to accomplish this, but it's a lot harder.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Or if you're looking for something more complex, you'll get better results by checking more places. For instance, I once searched Napster for every recording of a given Irish folk song - the versions done by the Chieftains got lots of responses, but some of the other bands who'd recorded it only got one or two, and they were performed entirely differently. Or if you're looking for live Grateful Dead performances, used in the paper's example partly because sharing them is legal, you'll probably find most of the albums on one music-sharing net or another, and the few hundred or a thousand best (or best-taped) shows they did, but you may be looking for that random show you attended in 1971 to compare how they played Dark Star with how they played it a few years later and to see how much of your memories were affected by the mood you were in (ok, or the drugs you'd been taking :-)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
two reasons:
1- The paper was written by someone at Napster. That's like someone at Ford writing a paper on how Chevrolet passenger transports can't scale.
2- The math has been reviewed by people (at least here) and found to be flawed.
So, it's probably a nonexistent problem, and the fact that gnutella keeps working and the whole internet hasn't slowed to a crawl because of it, is proof that maybe there's no solution needed.
Ok, I'm convinced that I should stop using the Gnutella network. But what should I use with OS X if it's not LimeWire?
Actually, I was trying to be Insightful, not Funny.
That's a good idea. There would probably be legal problems for the ISPs in the contributory/vicarious aspects, but technically, that would be a good improvement.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
What actually will a distributed network do for business?
A business depends on reliable information and information sharing. If anyone can add info to the network to be searched through, then it won't "fly" in most businesses.
Why? Because any business that has any quality assurance needs a reliable information source, needs to know that source, and needs it to be controlled.
I ran across this problem when trying to convince some clients to supply a company I worked for with CAD files instead of paper, if they had them. I was told getting the files wasn't the problem; they were government agencies, and making sure the files were the current controlled drawings was. They all told me that if I could solve that they would do it immediately, as it would greatly reduce the number of hard copies they had to make.
In the engineering world, being able to go into a Gnutella-like client and search for a drawing of some item would have been great. It would save the time of redrawing it yourself, which could be considerable hours depending on what it was. But the question is where you got the drawing and how you can be sure it was current and accurate. There are no quality checks. Even if you knew the source, how would you know it was the most current? If all the engineers/drafters connected to the network had a client on their computers, what if the drawing they had was out of date?
Controlled networks with proper TTLs and maximum hops can work fine despite these numbers. But quality of info still needs to be determined and examined.
In the end I kept coming up with a single central repository, or a system of them, to check against, but that seemed to contradict the thinking behind a distributed network with no central database to check against.
From above, a whopping 1.2 gigabytes of aggregate data could potentially cross everyone's networks, just to relay an 18 byte search query. This is of course where Gnutella suffers greatly from being fully distributed.
Actually, I think the RIAA suffers more, since there's no one to sue.
While we're at it, let's give anti-kudos to jrp2 who submitted this fine repeat. Where's the friend/foe button so I can ding him too?
Have you tried Morpheus from http://www.musiccity.com/ ? It has auto-resume, search, multiple sources, preview-play, etc.
It searches pretty fast and will link up all the available downloads to give the guestimate on download time/etc.
I haven't found much wrong with it that I would stop using it when I want to find something. Just about everything can be found.
-Corey
It's worth noting that giFT/OpenFT just entered its first stage of network testing--and with that in mind, they need as many people as possible to download and run the client so they can test the network. Complete instructions for so doing are given on the website.
Editor Emeritus and Senior Writer, TeleRead.org
From a purely technical perspective, the supernodes idea makes a lot of sense. But wouldn't that just give the RIAA a spot to attack to shut down the network?
1) Windows networking only works on a large scale because the system is hacked to have centralized WINS (name server machines) win the elections. That brings us back to a "Napster" model instead of a true p2p model.
2) Windows network browsing is known to fall apart when machines don't handle the election process properly and end up declaring themselves master browsers and broadcasting false information (certain versions of WfW and 95 for example). That's 1 vendor with 2 implementations of their own protocol that was buggy for years.
3) Microsoft legacied all of this 80's OS/2 stuff in favor of a pure centralized system in the form of DDNS and ActiveDirectory. Maybe because it was designed for small networks and never really scaled without lots of handholding by admins with the proper toolsets.
The "bandwidth" is distributed too! What you have is the N Gb "bandwidth" is distributed among the links of the network. Some take a heavier burden, some don't.
Try to use this guy's logic for your LAN and see how much "bandwidth" you have or how many users you can reach. :-)
His article is similar to the following:
Cars suck because they don't scale, if we build them to carry more than 1000 people, they get too big and heavy for the materials we build them with to hold together.
If we drive them faster than 400km/h they are too dangerous for the roads, and use too much petrol.
But, if you use a car normally, they are fine. Just as gnutella is fine if you keep hosts and hops to a reasonable level, not the silly settings the author of the article assumed.
Not if you couldn't predict which machines on the net would act as those supernodes. If, like another poster mentioned, machines that met a certain criteria (bandwidth, storage, time on line, whatever) simply won an election to act as a node, there's no single point to shut down. Shut down one supernode, the others are informed that a replacement is needed and another election is called for.
:)
Following an election, the supernodes update the clients as to the lookup machines. I suppose you could even have it where if all the supernodes were shut down that an entirely new election process takes place creating a new set of supernodes. Kind of like having a DNS server setup where any machine can act as one of the root servers based on a criteria based election by those machines doing a lookup.
Way too much for my wee brain to work out all the details on. Sounds good in theory anyway
The line must be drawn here. This far. No further.
This sig under construction. Please check back later.
Does EDonkey2000 suffer the same scalability problems? I tried it recently and other than some difficulty (small) connecting to servers i thought it rocked.
I must have my filter set too high.
Why do I never see anyone say the obvious? Right or wrong, I feel less liable if I only dl, not ul. Yeah, I doubt anyone would go after me for sharing a few songs on gnutella. But I really doubt they'll go after me for grabbing a few songs from gnutella.
I'd be glad to share all day long. Sorry, just can't take the risk. I don't want to have to explain to my wife that we lost our broadband, or got fined, or are getting sued or something because I wanted to play nice on gnutella.
In KAOS, my OS blah blah blah, Samizdat is the protocol for fileswaps & communication.
You can download from the closest or fastest copy of the file as it takes note of your files and their metadata to a local server which keeps the user lists handy for faster searching.
This is being designed in Java2 and will be native in KAOS, etc.
May come to OS X, XP & Linux native later.
- Kaos games and encryption systems developer
Retard? right. This guy is less than a retard.
Makes you wonder what the dork is doing here on
Slashdork, and even more shockingly, how did he
get into college ?
Hey monkey brains. Why don't you donate that
computer of yours to an orphanage? I'm sure
those kids will put it to much better use.
a paper written by one of Napster's founding engineers
Just when they launch their pay service. No, I assure you his/her analysis is totally and utterly impartial. Excuse me while I ask Bill Gates about the scalability of the Linux kernel.
my god. this paper is literally YEARS old.(over 2, maybe 3) the idea behind slashdot is good, but the implementation sucks.
-teknopurge
The next step is to add more sophisticated routing protocols between ultrapeers. Many of the algorithms mentioned elsewhere in this post (Chord, CAN, etc.) are contenders for that, as is LimeWire's home-grown query-routing proposal.
Christopher Rohrs
LimeWire
If there is one thing this reminds me of, is that the limits of transmitting "bits" is a real thing. Why we are trying to broadcast bits and bytes is beyond me, I see no evidence of this ever becoming a high quality, practical medium. Isn't the Internet really a lot of hype? Other than transmitting basic text around it is not much good for anything heavier. IMHO.
Don't start in with 3D online games. My point is, after my 10 years of TCP/IP, Web, Streaming, Gaming, etc. the Internet and Computers in general still look to me like the most clumsy, expensive, wasteful, way to transmit information ever created. And it all looks like shit still. Hey, Maybe you like watching movies on a 14" screen with jumpy pixilated images, I don't. The real tragedy here is that you spent far more money on it, even if you got it from some warez spot online.
The world has been taken over by aliens who look just like little TVs with keyboards. Their mission, to waste time, present images so poorly eyesight worldwide will be strained, destroy as much information as possible and when that fails - change it, pollute by dumping masses of oil based plastic, bubble the economy to outrageous proportions - causing world monetary chaos, then control information by offering online shopping.
Mission successful!
A while back someone told me about Mojo Nation and it seemed pretty nifty. The idea is that users get "mojo" for contributing to the network as a whole, which they can then "spend" on services that other users provide. It's trying to solve the problem of "freeloaders" on networks such as Gnutella who use lots of bandwidth with searches but basically bog the network down. Capitalism meets P2P. It's kinda-sorta decentralized in that files that are shared are broken down into little (redundant) bits that are stored for retrieval on lots of hosts. Again, neat idea, but the problem I had with it was that it cost mojo to share files! It costs mojo to do anything that uses bandwidth, basically, so users are implicitly discouraged from sharing. Oy vey.
"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
Who cares about Gnutella's scalability or lack of it? There are other programs such as Morpheus and Kazaa that have scaled up to a decent size. Besides, why pay for Napster or Real One and a limited selection when you can get most of the songs you want on a free service?
you don't need the "i++"
"I'm just here to regulate funkiness."
Taco's not a patent examiner.
Population: 12,974,361 (July 2001 est.)
Age structure: 0-14 years: 42.11% (male 2,789,189; female 2,674,747)
15-64 years: 54.25% (male 3,518,209; female 3,519,851)
65 years and over: 3.64% (male 220,640; female 251,725) (2001 est.)
Population growth rate: 2.6% (2001 est.)
Birth rate: 34.61 births/1,000 population (2001 est.)
Death rate: 6.79 deaths/1,000 population (2001 est.)
Net migration rate: -1.84 migrant(s)/1,000 population (2001 est.)
Sex ratio: at birth: 1.05 male(s)/female
under 15 years: 1.04 male(s)/female
15-64 years: 1 male(s)/female
65 years and over: 0.88 male(s)/female
total population: 1.01 male(s)/female (2001 est.)
Infant mortality rate: 45.79 deaths/1,000 live births (2001 est.)
Life expectancy at birth: total population: 66.51 years
male: 63.85 years
female: 69.31 years (2001 est.)
Total fertility rate: 4.58 children born/woman (2001 est.)
HIV/AIDS - adult prevalence rate: 1.38% (1999 est.)
HIV/AIDS - people living with HIV/AIDS: 73,000 (1999 est.)
HIV/AIDS - deaths: 3,600 (1999 est.)
Nationality: noun: Guatemalan(s)
adjective: Guatemalan
Ethnic groups: Mestizo (mixed Amerindian-Spanish or assimilated Amerindian - in local Spanish called Ladino), approximately 55%, Amerindian or predominantly Amerindian, approximately 43%, whites and others 2%
Religions: Roman Catholic, Protestant, indigenous Mayan beliefs
Languages: Spanish 60%, Amerindian languages 40% (more than 20 Amerindian languages, including Quiche, Cakchiquel, Kekchi, Mam, Garifuna, and Xinca)
Literacy: definition: age 15 and over can read and write
total population: 63.6%
male: 68.7%
female: 58.5% (2000 est.)
Source: CIA
The only solution is to structure the network by using "super clients" or "servants" or "super nodes"[...]
But won't these "super singularities" become, in the long run, bottlenecks themselves, prone to abuse, DoS etc., plus the logical target for the "other side" that wants this kind of p2p to be buried and forgotten?
One of the strengths of the p2p model is that it is hard to pursue 1000's of (arguably) minor copyright infringements, as opposed to charging one entity (Napster?) with all of them...
-- No sig today
A solution has been proposed that is a hybrid of Napster and Gnutella; basically, it is a Gnutella-like network of volunteer-run Open-Nap-type servers. Most users would run as an end node, which would merely query, upload and download; a few, however, would run index servers, where all the searching would take place. End nodes would run as a client/server relationship to one (or more) of the indexing nodes, each of which would network with a few other indexing nodes. The result would be a file-sharing network nearly as efficient as Napster with the robustness of Gnutella.
For anyone else insane enough to care, here are simplifications and closed form solutions for some of the quantities in the article (among other things, this puts as many things as possible in terms of g(n,t)) :
f(n,x,y) = n * Sum[(n-1)^(t-1), t=x->y]
g(n,t) = f(n,1,t) = n * Sum[(n-1)^(T-1), T=1->t]
Closed form: g(n,t) = ((n-1)^t - 1)/(1 - 2/n)
h(n,t,s) = s * g(n,t)
i(n,t,s) = 2 * h(n,t,s) = 2 * s * g(n,t)
j(n,T,R) = f(n,T,T)*R
k(n,t,R) = Sum[T*j(n,T,R), T=1->t]
Closed form: k(n,t,R) = R*((t - 1/(n-2))*g(n,t) + n*t/(n-2))
Bottom line translation for high school dropouts and other wankers:
k(n,t,R) ~ R * t * (n - 1) ^ t
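For anyone who wants to check the closed forms rather than take them on faith, here is a small sketch that compares them against the direct sums using exact rational arithmetic. The function names are made up for this check, and n > 2 is assumed because both closed forms divide by n-2.

```python
from fractions import Fraction

def g_sum(n, t):
    """g(n,t) as the direct sum n * Sum[(n-1)^(T-1), T=1->t]."""
    return n * sum((n - 1) ** (T - 1) for T in range(1, t + 1))

def g_closed(n, t):
    """Closed form ((n-1)^t - 1) / (1 - 2/n); requires n > 2."""
    return Fraction((n - 1) ** t - 1) / (1 - Fraction(2, n))

def k_sum(n, t, R):
    """k(n,t,R) as the direct sum of T * j(n,T,R), with j = n*(n-1)^(T-1)*R."""
    return sum(T * n * (n - 1) ** (T - 1) * R for T in range(1, t + 1))

def k_closed(n, t, R):
    """Closed form R*((t - 1/(n-2))*g(n,t) + n*t/(n-2)); requires n > 2."""
    return R * ((t - Fraction(1, n - 2)) * g_closed(n, t) + Fraction(n * t, n - 2))

# Exhaustively compare over a small grid of connection counts and TTLs.
for n in range(3, 9):
    for t in range(1, 8):
        assert g_sum(n, t) == g_closed(n, t)
        assert k_sum(n, t, 1) == k_closed(n, t, 1)
```

The loop at the bottom runs silently when everything agrees, which it should for the formulas as written above.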
It seems the author has missed the uniqueness of the Gnutella protocol when comparing. The readings show nothing but the obvious, while the potential for Gnutella as such is huge once these disadvantages are removed. It is as simple as this: a person can choose the city he wants to live in. Once chosen, he can visit the grocery store in his locality. E.g. if he stays in Chicago, he doesn't need to go all the way to San Francisco to buy groceries. A similarly simple solution for Gnutella: organize the Gnutella network.