Running The Numbers: Why Gnutella Can't Scale
jordan (one of the founding developers of Napster) writes: "As the rumour mill churns over Napster's future, many folks see Gnutella as the next best hope for the music-loving, file-sharing community. Problem is, Gnutella can't scale. [Note: if that URL doesn't work, try this mirror.] Almost all research on Gnutella up till now has been based on observations of the system in the wild, but this paper discusses the technical merits of that statement through a detailed mathematical analysis of the Gnutella architecture." These are the kind of numbers you may not like to read if you figure networks expand to accommodate traffic at a never-ending pace. Update: 02/15 12:24 AM by T: Jordan also points to this mirror for your reading pleasure.
The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no". I did my Ph.D. research on it. It works. Gnutella is broken, but don't draw a conclusion that a server-less environment can't scale. Read before you post this crap.
then the RIAA could easily make the case that you were storing illegal content on your machine
No, reread the parent. You cannot know where the information is actually stored.
A freenet node is basically a caching router, and AFAIK even the RIAA hasn't yet been able to repeal the common carrier status, so you should be ok.
Cui peccare licet peccat minus. -- Ovid, Amores.
First off Napster is to be praised for its ability to find some rare or bootleg tunes. BIGTIME props to Napster for that.
Bottom line, though, is that you people seem to forget what it was like in the good ole days when we pioneered this CRAZE that swept the net. I feel like I should be talking to my grandkids here: "When I was your age we had to search the web for FTP servers and download the files the old-fashioned way."
"I recall having access to a T1 at work when only an elite few had that, and running an MP3 site boasting 1 gig of tunes on a SCSI HD in a STATE OF THE ART P150 Dell server (I now have close to 20 gigs of MP3s)."
Sure, Napster is/was great, and although Gnutella will continue to be trouble... we will all make it.
BTW, if anyone wants to contact me, I will happily work with you to upload my collection if you wanna open a site somewhere.
The college bandwidth argument, although many will hate me for saying it, is legit. I work for a company that installs network management software, especially at universities, and the ones that have blocked Napster have seen a substantial drop in traffic. I do not know what the answer is, but I can say I know several gamerz who HATE Napster etc. for the amount of bandwidth they lose on campus. Poor guys probably have a ping of 27 instead of 21.
Razzious Domini
I could be a GREAT KARMA WHORE if I could just shed the few morals I have left.
I am nearing completion of a network that satisfies a, b, c, e.
I haven't started on d and f, but they could be added.
This project is called The ALPINE Network
It scales linearly and provides a query mechanism that rivals the performance of a centralized directory. (It does use more bandwidth than a centralized query, but at least you have direct control over how much bandwidth you use and how.)
At any rate, I could use development assistance a great deal. Let me know if anyone is interested.
Regards...
1. How do you identify all the peers?
That's discussed on the site I mentioned, but essentially you each pick an ID to associate with a given peer. It's that simple.
2. Let's say 10% of those 10K people are doing searches. That saturates a 56K modem, assuming you can really get your packets down to 56 bytes
It would only saturate your link if all 10,000 searched at once. If they all searched within a 3-minute period, or no more than 70 in one second, your link will not saturate. And the packet is 56 bytes for an 8-character query; for a 16-character query it would be 64 bytes, etc.
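The arithmetic in this reply can be sketched as follows. This is a back-of-the-envelope model, not ALPINE's actual packet format: the 48-byte fixed header is an assumption inferred from the two sizes quoted (56 bytes for 8 characters, 64 for 16), and all helper names are hypothetical.

```python
# Back-of-the-envelope query sizing (hypothetical helpers, not ALPINE code).
# Assumption: a fixed 48-byte header, inferred from "56 bytes for an
# 8 character query, 64 bytes for a 16 character query".
HEADER_BYTES = 48


def query_packet_bytes(query: str) -> int:
    """Size of one query packet: fixed header plus one byte per character."""
    return HEADER_BYTES + len(query.encode("ascii"))


def max_queries_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second that fit on the link before it saturates.
    Ignores UDP/IP/PPP overhead, so real-world figures will be lower."""
    return link_bps / (packet_bytes * 8)


def min_spread_seconds(queries: int, link_bps: float, packet_bytes: int) -> float:
    """Shortest window in which `queries` packets can arrive without saturating."""
    return queries / max_queries_per_second(link_bps, packet_bytes)


if __name__ == "__main__":
    size = query_packet_bytes("abcdefgh")  # an 8-character query
    print(size, max_queries_per_second(56_000, size),
          min_spread_seconds(10_000, 56_000, size))  # prints: 56 125.0 80.0
```

At a nominal 56 kbit/s this gives about 125 queries per second, so 10,000 queries spread over 80 seconds or more would not saturate the link; the poster's more conservative 70 per second presumably leaves room for protocol overhead and for modems that never reach their nominal rate.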
What happens when you try to have 100K people? One million? How about the 10 million+ of Napster? Your scheme would not scale.
That depends on how good a peer you are. If you don't respond much and have a very slow link, then you will be at the bottom of those 100,000 hosts' query lists and will get queried infrequently. I cover this on the site, but this is not a problem. The only things limiting your use of the network are how much memory you have (you would need a hundred meg or so for a million connections) and your bandwidth.
Ok, but now OpenNap basically just utilizes the Napster paradigm and therefore puts into place Index servers.
If the RIAA succeeds in suing Napster and blocking its service, which seems very likely at this point, it is not at all far-fetched that they will easily be able to obtain court orders against anyone else running the same type of service.
So your OpenNap is not a replacement service because every index server is liable for a court ordered shutdown.
That and the index server requires bandwidth, bandwidth costs money and how many people are going to donate full T3 lines to this? Thus the service is capped in terms of the number of connected users based on bandwidth available.
Once Napster is dead, there will be nothing else to replace it at the same scale unless it is operated with the blessing of the RIAA.
No. The implication is that it's a series. The goal is to figure out what the progression is, and then come up with the next in the series.
Yes, there is. He looks at aggregate traffic numbers rather than per-client or per-search numbers. Saying that a search creates 6 GBytes of traffic sounds scary and unscalable (table labeled "Bandwidth Generated in Bytes (S=83, R=94)"). Holy cow, that's a lot of data. Now, the table "Reachable Users" reveals that those 6 GB of data are searching 7.6 million clients. If we do the math, we find that our traffic level is a little over 800 bytes per client searched (including responses). Is 800 bytes of traffic for a search unreasonable? I don't think so.
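The per-client figure above is a single division. A minimal sketch, assuming the paper's "GB" means 2**30 bytes (that reading gives "a little over 800"; decimal gigabytes give about 790). The helper name is hypothetical.

```python
# Per-client cost of one search, from the aggregate figures quoted above.
GIB = 2 ** 30  # assumption: the paper's "GB" is binary


def bytes_per_client(total_bytes: float, clients_reached: int) -> float:
    """Average traffic (queries plus responses) attributable to each client searched."""
    return total_bytes / clients_reached


if __name__ == "__main__":
    print(round(bytes_per_client(6 * GIB, 7_600_000)))    # 848 (binary GB)
    print(round(bytes_per_client(6 * 10**9, 7_600_000)))  # 789 (decimal GB)
```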
All the author really does is take a mathematical formula that grows exponentially and show how quickly he gets "scary" numbers. No effort is made to show whether or not the efficiency of Gnutella breaks down as the network increases in size. No effort is made to show how much work is done per search or per result. He just makes assumptions about the Gnutella network that result in exponential growth in the number of users, and then shows how the aggregate traffic also grows exponentially. Duh. What did you expect? By this logic, nothing scales.
Don't get me wrong, I don't think Gnutella scales either. But you don't need to wave around all the FUDdy math that this guy does to prove it. The argument why it doesn't scale is simple:
The reason it doesn't scale is that every search request (optimally) gets delivered to every client. We don't even have to look at how those searches get delivered. We'll completely ignore the amount of traffic in the backbone, and only count the traffic that has to exist on the last hop to each client. Let's assume that the requests are 100 bytes apiece, or about 1000 bits once we add the overhead of UDP/IP/ethernet/PPP/ATM/whatever on top. If each search is 1000 bits and the average client has a 56K modem, the whole thing falls apart when the search rate reaches 56 searches / second. If we assume 1 million users, each one can only perform a search about once every 5 hours on average before the modem links are 100% full.
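The last-hop argument reduces to a few lines of arithmetic. A sketch using the comment's assumptions (1000 bits per search on the wire, a 56 kbit/s modem, every search delivered to every client); the function names are hypothetical. The last function makes the per-client point explicit: the load on each client's link is proportional to the number of users.

```python
# Last-hop arithmetic for broadcast search (backbone traffic ignored).
def max_network_search_rate(link_bps: float, bits_per_search: float) -> float:
    """Every search reaches every client, so each client's link caps the
    network-wide search rate."""
    return link_bps / bits_per_search


def hours_between_searches(users: int, link_bps: float, bits_per_search: float) -> float:
    """How often each user may search if everyone shares the capped rate equally."""
    return users / max_network_search_rate(link_bps, bits_per_search) / 3600


def last_hop_load_bps(users: int, searches_per_user_per_hour: float,
                      bits_per_search: float) -> float:
    """Load on *every* client's link: grows linearly with the user count."""
    return users * searches_per_user_per_hour / 3600 * bits_per_search


if __name__ == "__main__":
    print(max_network_search_rate(56_000, 1_000))                       # 56.0
    print(round(hours_between_searches(1_000_000, 56_000, 1_000), 2))   # 4.96
```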
The problem here is the broadcast of every search to every client. Any distributed search network needs to either assume very high bandwidth connections for all the clients (because they are all servers to the whole network) or have some hierarchy of caches / servers. The amount of bandwidth used at each client increases as more clients connect: if the number of users goes up by 1000%, the traffic on my local link goes up 1000%. This is why it doesn't scale. It has nothing to do with how many GB of traffic the network as a whole has to handle; it's the simple fact that the traffic at every client increases as more clients connect. This is the problem that has to be corrected, and Jordan's paper never even mentions it, relying instead on big scary numbers. His claim at the end that Gnutella generates 2.4GBps of traffic for 1 million users is the ultimate FUD. How much traffic does Napster generate when it has 1 million people connected? He probably doesn't know, because their servers go down first.
Of course, you'd have to work out how to prevent hostile clients and servers from corrupting your indexes, but I'm sure that's a much more easily solved problem than working out how to prevent some skript kiddie from flooding Napster's servers off the net with a DDoS.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Get a bunch of investors to run a big fat network pipe into a country with a name like "Ljubilvaniastanistan" where the rutabaga is the national currency and the yak is the national delicacy. Then watch Hillary Whats-Her-Name from the RIAA swallow her own tongue when she learns that her vast legion of lawyers are powerless to do anything about it. Of course, the Bush administration would probably order immediate airstrikes on the grounds of "protecting the wealth-creation security of national corporate interests", but that would be a public-relations nightmare, particularly if we put the new Napster right in the middle of a bustling village full of non-coms.
In all seriousness, I don't condone mass piracy, but the RIAA has been screwing people for decades and I have to admit that I enjoy watching them squirm. What could the RIAA conceivably do if Napster were located offshore, preferably in a country not bound by the terms of the Berne convention?
We're going down, in a spiral to the ground
If you have a server on the internet which you want people to connect to, it's got to be advertised somehow.
Won't be hard to locate them.
What if the RIAA gets Congress to pass a law that places a substantial fine on those convicted of running internet services for the purpose of piracy? That isn't so unlikely. The DMCA places something like a $1 million fine on creating tools to subvert copy protection.
Who will be able to afford to risk running said service? Bill Gates, maybe Larry Ellison. Doubt that'll happen.
Isn't this the sort of thinking that went into the creation of the internet in the first place? The idea of a decentralized network, etc. etc.
Maybe Gnutella needs to take the meta-internet approach. A "new" internet on top of the current internet?
(I dunno. I ask because I'm curious. How is Gnutella in general different than the internet in particular?)
And nor does the average user, and therein lies the rub. As long as the interests behind the RIAA are smarter than the vast majority of users (they are), you can be quite sure that the RIAA will stop rampant piracy. You can say "But, but, there's always ..." (normally it's FTP, Usenet, etc.). The simple fact of the matter is that most users don't have the time, the energy, or the intelligence to figure those out. The only reason piracy has been as popular as it has is that Napster lowered the bar sufficiently: it brought fast and easy piracy within reach of a few keystrokes and mouse clicks.
You know, I have to take some exception to being called "sheep" because I buy CDs. Do you honestly, deep down, feel that because the current RIAA-supported distribution model doesn't compensate artists fairly, you are striking an idealistic blow for artists by using a model which, by and large, provides no compensation at all?
Reality check: if you download music copied from a CD sold by an RIAA-affiliated label, you are not "boycotting RIAA-sanctioned music". Boycotting means you are willing to go without a product on principle. That's not what you're doing. What you're doing is, at the least, legally considered stealing the music (presuming you don't own the CD already or buy it later)--and I'd have to say it's philosophically pretty dubious. If you didn't "just want the music," you wouldn't be getting it for free.
If you want to boycott the RIAA, you have to support artists who make their work available through "non-RIAA-sanctioned methods." But trading their music for free through Napster is not support.
It's easy to defend Napster for what it might become. I think digital music distribution is coming, soon, and I suspect it will live without the RIAA. But it will require a viable business model for the artists, not the record companies, that allows an average, "second-tier" artist to get equal or better compensation than they would from a record company and provides a reasonable level of promotional support for concerts, merchandising, radio airplay, and the like. Napster does not provide this model. A future model might be free as in speech, but currently Napster is unequivocally free as in beer, and we're not doing ourselves or anyone else any good by pretending otherwise.
The problem with Napster is that it has a single point of failure. The problem with Gnutella is it doesn't have an index. What you want is an index of all files with no single point of failure.
An index is a root node, which points at branches, which point at leaves. So make 10 copies of the root, 10 of each branch, 10 of each leaf, and put each on a different transient machine. (If you think 10 roots is too few, have every user keep their own copy of the root. It's not big.)
Then here's your protocol: Ping the roots one at a time, choose the first that responds. The chosen root pings the duplicates of the correct branch one at a time, chooses one. The chosen branch pings the duplicates of the correct leaf one at a time, it chooses one. The leaf sends the results back to the user.
Updating the structure is the same, with the addition that nodes occasionally try to sync with their duplicates. You end up with duplicates never quite in sync, but so what.
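The lookup protocol above can be sketched in a single process. Assumptions not in the comment: branches are keyed by the filename's first character and leaves by its first two (a hypothetical partitioning), "pinging" is modeled as an `alive` flag, and the root-to-branch-to-leaf hops that the comment delegates to each chosen node run inside one function. Replica syncing is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of an index node (root, branch or leaf) on some transient host."""
    alive: bool
    children: dict = field(default_factory=dict)  # key -> list of child replicas
    entries: dict = field(default_factory=dict)   # leaves only: filename -> hosts


def first_alive(replicas):
    """Ping the duplicates one at a time and take the first that responds."""
    for replica in replicas:
        if replica.alive:
            return replica
    return None


def lookup(roots, filename):
    """Root -> branch -> leaf, failing over between duplicates at each level."""
    node = first_alive(roots)
    for key in (filename[:1], filename[:2]):  # hypothetical branch/leaf keys
        if node is None:
            return None
        node = first_alive(node.children.get(key, []))
    if node is None:
        return None
    return node.entries.get(filename, [])


# 10 duplicates per level; some are offline, as transient machines will be.
leaves = [Replica(alive=i >= 3, entries={"gd_live.mp3": ["10.0.0.7"]}) for i in range(10)]
branches = [Replica(alive=i != 0, children={"gd": leaves}) for i in range(10)]
roots = [Replica(alive=i >= 5, children={"g": branches}) for i in range(10)]
print(lookup(roots, "gd_live.mp3"))  # ['10.0.0.7']
```

Even with half the roots and three of the leaves down, the lookup succeeds as long as one copy at each level answers.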
No, FidoNet requires a 'delivery' or 'bulk transfer' protocol.
The protocol used by ALPINE is for messaging. The types of broadcasts are very small packets. Usually 50-60 bytes. This makes a huge difference.
MyopicProwls
-jon
Remember Amalek.
You've apparently never used a news client since 1992 or so. These days all of the collating and uudecoding is done behind the scenes. Just select a file in Pan and press "D". In fact the Usenet is a great way to distribute Fansubbed Anime without overloading any particular server.
Now the problems (that haven't been mentioned yet). First, data on Usenet has a short lifetime, frequently less than 24 hours; if you don't keep on top of it, it is easy to miss things (like the fourth episode of a series). Second, you can't search for a particular song on Usenet; you more or less have to take what is available. If you are looking for a particular song, Usenet may not be for you (although you can certainly request it).
Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
I read the internet for the articles.
I really disagree. First of all, Napster unquestionably provides a distribution model that provides a reasonable level of promotional support for artists. It's really great how many new artists I've discovered because of MP3s. Not necessarily just because of Napster, but if a friend says (as often happens) "hey have you heard this new stuff from Boo Williams?" and I say "Boo who?" (no pun intended -- go download his music it's great) then all of a sudden Boo has an opportunity to have his music heard by someone who never would have heard it otherwise. Super!
Thankfully, MOST of the artists that I listen to have come down on the PRO NAPSTER side. This includes Ben Folds Five, Green Day, Limp Bizkit, The Offspring, Chuck D, and others. Unfortunately, some of my favorite artists have come down on the other side. These include the most vocal three: Metallica, Dr. Dre, and Eminem. That sucks.
MORALLY I get over the problem. Is it morally wrong for me to want free music? I don't think so. Is it morally wrong for me to listen to an artist's work for free, never buying his CD, never going to his concert, never buying his T-shirts? Perhaps. Perhaps not. But it is certainly no worse than middlemen becoming ridiculously rich by screwing me with $18 CDs. CDs should cost between four and six dollars, and about half should go to the artist (about what they get now, or a bit more).
Trust me, I sleep just fine at night having spent the whole day listening to MP3s. And I do own CDs -- oh God do I own CDs. I counted once, and I probably gave the RIAA $4,000+ in my (short) lifetime. That's a lot. A whole lot. I figure they are still $3,500 ahead after all the free downloading I've done.
I also, by the way, have 'discovered' artists via MP3s and Napster, and subsequently bought their CDs and gone to their concerts (e.g. Cypress Hill and Lavay Smith -- don't laugh), so those artists are definitely ahead, because otherwise they wouldn't have seen any of my money at all.
Yes, of course. Gnutella makes only the most limited attempt to ensure privacy. Case in point: a while back (in fact, it still may be running) a server returned fake matches to requests for kiddie porn, and published the IP addresses that had been caught trying to download the "files" on a webpage. I don't have to tell you how indignant some people got, and for the funniest reasons.
The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
I even share the equations and methodologies I used, and try to poke holes in my own conclusions.
Further, I'm not a competitor. I haven't worked for Napster in 3 months. Before Napster my background was in poking holes in things anyway. All I did was finish a personal project I started a long time ago.
You actually sound more like FUD than anything. :-)
--jordan
The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".
Not so fast. Right now, the biggest problem with decentralized networks is that they all have some form of routing/forwarding. If you got rid of routing/forwarding, then they could scale.
For instance, let's say you have a Napster-style peer group of 10,000 peers. What if, to query these peers, you sent a small UDP packet to each of them directly? No routing, no forwarding. How long would this take?
Modem: 2.5 minutes
DSL: 13 seconds
I would say that this is an acceptable period of time. And the bandwidth used was all your own, nobody else's, except for the 56 bytes each peer received for that single packet they got from you.
I am working on such a network; it's called The ALPINE Network and has all the features mentioned.
So, if you get rid of the forwarding/routing you can have a decentralized network that scales linearly.
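The fan-out timing above can be checked with a one-line model. A sketch only: it counts payload bits against the sender's uplink and ignores UDP/IP headers, and the upstream rates (about 30 kbit/s for a modem, about 345 kbit/s for DSL) are my back-calculation from the 2.5-minute and 13-second figures, not numbers given in the comment.

```python
def fanout_seconds(peers: int, packet_bytes: int, uplink_bps: float) -> float:
    """Time to send one query packet directly to every peer (no forwarding),
    limited only by our own uplink. Each peer still receives just one packet."""
    return peers * packet_bytes * 8 / uplink_bps


if __name__ == "__main__":
    # Assumed effective upstream rates, back-calculated from the comment.
    for label, bps in (("modem", 30_000), ("DSL", 345_000)):
        print(label, round(fanout_seconds(10_000, 56, bps), 1), "s")
```

The cost grows linearly with the number of peers queried and falls entirely on the sender's link, which is the "scales linearly" claim in concrete form.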
You've still got to query each peer, and a linear search like that isn't acceptable when you've got a lot of people hitting a big database
There is no big database. There are lots and lots of little databases.
And the network adapts to load. I go into this somewhat on the site.
The important thing I want to point out is that this network is used mainly for locating content. Once you have found it, it may reside within Freenet, OpenCOLA, MojoNation, etc., and then you will benefit from their architecture for the actual delivery of the data.
The broadcasts are used solely for discovery of resources, with the delivery being a whole other scenario. UDP is a horrible bulk transfer protocol.
No, because you control exactly how often or how much response you provide.
If you are getting swamped, you will respond to fewer and fewer queries; your quality in the eyes of those peers will then drop, and you will receive fewer and fewer queries.
This is actually a balanced type of configuration, which handles load in an efficient manner.
Also note that over a DSL line, you could receive in excess of 10,000 queries a second.
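The load-shedding loop described above (swamped peers answer less, their rating drops, they get queried less) can be sketched from the querier's side. The exponential-moving-average score, the 0.2 smoothing factor and all names here are hypothetical choices for illustration; the comment defers the real mechanism to the ALPINE site.

```python
from dataclasses import dataclass

ALPHA = 0.2  # hypothetical smoothing factor for the quality score


@dataclass
class PeerStats:
    quality: float = 1.0  # hypothetical score in [0, 1]; new peers start optimistic


def record_reply(stats: PeerStats, replied: bool) -> None:
    """Exponential moving average: replies pull the score toward 1, silence toward 0."""
    stats.quality = (1 - ALPHA) * stats.quality + ALPHA * (1.0 if replied else 0.0)


def choose_targets(peers: dict, k: int) -> list:
    """Query the k best-rated peers; poorly rated peers drift down the list."""
    return sorted(peers, key=lambda name: peers[name].quality, reverse=True)[:k]


peers = {name: PeerStats() for name in ("fast", "swamped", "modem")}
for _ in range(10):
    for name in choose_targets(peers, 3):
        # The swamped peer sheds load by not answering.
        record_reply(peers[name], replied=(name != "swamped"))
print(choose_targets(peers, 2))  # ['fast', 'modem']
```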
These big scary numbers actually look very, very close to what a network analyst would normally predict for Gnutella. The Gnutella network will slow down as the number of active nodes increases, simply because the networks have limited resources: the physical networks stay the same, while the software running on them can conceivably bring them down. Caching data is a good solution for Gnutella, but note that it only helps if you use a client that does caching, and that Internet users generally don't like sharing their own resources (I mean their bandwidth) with the neighbours.
You can't handle the truth.
What they could prove is that you transmitted copyrighted material, regardless of whether or not you actually stored it on your machine. And that's the problem. You can't get in trouble for downloading or storing files; it's the uploading that gets ya.
Amber Yuan 2k A.D
"and dear god does this website suck now." -- CmdrTaco
A simple query with an 8 character query string would be 56 bytes. The string above might be as much as 160 bytes.
I did some monitoring of the Gnutella network for a few months, and the size of an average query is about 8-16 bytes at most. Many, many queries were even shorter.
If the users are chained together through IDs one hop at a time, then you would have to route and re-route a query for their IDs before you even do anything!
No, you missed a major point; there is no routing and no forwarding.
This is what makes it so simple, and linear. You directly communicate with all the peers you want to query. Everyone directly communicates with each other. The only thing this implies is a transport service which can support a large number of concurrent connections efficiently. This is what DTCP does.
The big problem with Usenet is the availability of files. If I wanted to download Metallica's Master of Puppets, I would first have to see if it is there. If not, I would have to request it and wait some time for it to appear on the newsgroups (with all parts present). If someone uploads a 128 kbps MP3 and I want a 192 kbps one, then I have to request and wait again. Napster/Gnutella avoid this problem by letting me search all of the music that is available and get the files I want right now!
Doh!
(If the people running the RIAA and MPAA had been clueful, they would have been pursuing this strategy against anonymous file sharing from the very beginning. If 99 out of 100 requests for insert-top-forty-song-here on Napster return William Shatner singing "Lucy in the Sky with Diamonds", then most people would rather pay for the CD than sift through all the false results. But I digress.)
--
send all spam to theotherwhitemeat@ropine.com
Why not use IRC? I mean, it's there, it's reasonably reliable, and it allows both centralized and P2P communication.
The "client" would be a bot. It would join a channel (say, "#bjork" or "#trancegoa"), and to make a request it would simply utter something on that channel in some protocol-ish language (e.g. "SEARCH 'Bachelorette'"); other bots would respond in a P2P fashion (i.e. /notice the_asking_bot IVEGOT Bachelorette.mp3).
This would deliver us from the scaling curse as it is described by Jordan's paper. It would also lead to a Usenet-like classification of available files among channels (if you like David Bowie, you would /join #davidbowie), thereby bringing people with common interests together. Technically, IRC networks are the best example of a semi-centralized-yet-free network I can think of.
Think of this: Napster was made as a sharing system where people could chat. We have a chatting system. Why not allow people to share files on top of it?
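A minimal sketch of the bot's request handling, with the IRC connection left out. The SEARCH/IVEGOT vocabulary comes from the comment; the regular expression, case-insensitive substring matching and function name are assumptions. A real bot would send raw protocol lines (for example `NOTICE nick :text`, per RFC 1459) rather than the client-side `/notice` command the comment uses as notation.

```python
import re

# Hypothetical request syntax taken from the comment: SEARCH 'term'
SEARCH_RE = re.compile(r"^SEARCH '([^']+)'$")


def handle_channel_message(sender: str, text: str, shared_files: list) -> list:
    """Return the replies this bot would send for one line said on the channel."""
    match = SEARCH_RE.match(text)
    if match is None:
        return []  # ordinary chat: ignore it
    term = match.group(1).lower()
    return [f"/notice {sender} IVEGOT {name}"
            for name in shared_files if term in name.lower()]


print(handle_channel_message("the_asking_bot", "SEARCH 'Bachelorette'",
                             ["Bachelorette.mp3", "Joga.mp3"]))
# ['/notice the_asking_bot IVEGOT Bachelorette.mp3']
```

Because each reply goes only to the asker, the channel itself carries just the one SEARCH line per request.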
> I'm speaking practically here. I'm going to visit 10,000 cities. Please give me the absolute guaranteed best route (in my lifetime, if you please).
You are wrong. Either you are speaking theoretically, in which case the traveling salesman problem is trivially solvable (just enumerate every route), or you are speaking practically, and you don't give a shit about the *best* route. A good one will be sufficient, and there are very good heuristics for that.
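One such heuristic is nearest neighbour: start anywhere and always go to the closest unvisited city. The sketch below is a plain illustration of that technique, not a claim about which heuristic the poster had in mind; it builds a complete tour in O(n^2) distance checks with no optimality guarantee, and local improvements such as 2-opt are the usual next step.

```python
import math
import random


def nearest_neighbour_tour(cities, start=0):
    """Greedy heuristic: always travel to the closest unvisited city."""
    unvisited = [i for i in range(len(cities)) if i != start]
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, cities[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(tour, cities):
    """Length of the closed tour, returning to the starting city."""
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


if __name__ == "__main__":
    rng = random.Random(1)
    cities = [(rng.random(), rng.random()) for _ in range(2_000)]
    tour = nearest_neighbour_tour(cities)
    print(len(tour), round(tour_length(tour, cities), 2))
```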
Cheers,
--fred
And the crap part, is that it's JAVA BASED.
*vomit*
You build something that uses a distributed algorithm to build a spanning tree. The nodes near the top of the spanning tree become the servers. You build the algorithm so that parents in your spanning tree will naturally have more bandwidth than you do.
I've been thinking about this for a long while.
Building the spanning tree isn't hard. Every node just selects one and only one parent node and tells that parent it is its child. You prevent cycles by having a node refuse to be a parent unless it also has a parent. If it loses its connection to its parent, it tells all its children that it is no longer a parent. One node 'seeds' the network as a root by saying it can be a parent without having a parent and not looking for one. Eventually it can delegate roothood to a child that has proven high bandwidth; it cannot cease being a root without doing this delegation.
You can have connections to nodes that are neither parents nor children, but search requests should not be propagated to those nodes unless you have no parent. Eventually a search request will make it onto the spanning tree and be efficiently distributed.
You can eventually elect servers who are near the top of the spanning tree. Nodes should, in general, elect parents that have more bandwidth than they do. This means that nodes near the top of the spanning tree should have the most bandwidth.
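A single-process sketch of the tree-building and search rules above. Assumptions, not from the comment: the network is static, construction runs in passes until no node can attach, each node picks the highest-bandwidth eligible neighbour (preferring ones faster than itself), and the example topology and bandwidth figures are invented. The dynamic parts (notifying children when a parent is lost, delegating roothood) are not modeled.

```python
from collections import defaultdict


def build_tree(bandwidth: dict, links: dict, root: str) -> dict:
    """Grow a spanning tree from `root`; returns {node: parent}.

    A node may only adopt a neighbour that is already attached (the root, or a
    node that has a parent), which keeps cycles out in this one-shot build.
    Among those it prefers neighbours with more bandwidth than itself."""
    parent = {root: None}
    changed = True
    while changed:
        changed = False
        for node in sorted(bandwidth):
            if node in parent:
                continue
            eligible = [n for n in links[node] if n in parent]
            if not eligible:
                continue  # no attached neighbour yet; try again next pass
            faster = [n for n in eligible if bandwidth[n] > bandwidth[node]]
            parent[node] = max(faster or eligible, key=lambda n: bandwidth[n])
            changed = True
    return parent


def propagate(parent: dict, origin: str) -> list:
    """Deliver a search along tree edges only; returns nodes in delivery order.
    On a tree this reaches every node exactly once (n - 1 messages)."""
    children = defaultdict(list)
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)
    order, stack, seen = [], [origin], {origin}
    while stack:
        node = stack.pop()
        order.append(node)
        up = [parent[node]] if parent[node] is not None else []
        for nxt in children[node] + up:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return order


# Hypothetical example network (bandwidth in kbit/s).
bandwidth = {"t3": 45_000, "dsl1": 768, "dsl2": 640, "modem1": 56, "modem2": 33}
links = {"t3": ["dsl1", "dsl2"], "dsl1": ["t3", "modem1"],
         "dsl2": ["t3", "modem1", "modem2"], "modem1": ["dsl1", "dsl2", "modem2"],
         "modem2": ["dsl2", "modem1"]}
tree = build_tree(bandwidth, links, "t3")
print(tree)
print(propagate(tree, "modem2"))
```

The modem nodes end up as leaves under the DSL nodes, and the highest-bandwidth node sits at the root, which is the property the comment is after.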
Need a Python, C++, Unix, Linux develop
I already had an intuitive grasp of what he was talking about, and his numbers seemed ballpark correct to me. I too thought the result set bandwidth numbers looked a little fishy, but the others seemed fine.
I've been thinking about this for months.
I really want to build this with my StreamModule system, but nobody is helping me with it, and I don't have the time to hack it out, especially since I'm so ridiculously methodical when it comes to code.
Ok, this time I did a bit more thorough check of the numbers. I agree with the first half, the traffic generated by the request half of the message. What I'm not as convinced of is the response side of the equation.
I don't know what the typical percentage of Gnutella users sharing files is, so I'll accept your figure of 30%. But 40% of those sharing files having a match? Even with your reduced number here I think it's high. If 40% of people sharing files had a match that would mean with default settings you'd get: (N=4, T=5) 484*(0.3*0.4) = 58 people finding a match. And with the numbers you use later of 10 matches a person you'd get 580 matching entries. I've never received anything near that high. But if I did, I certainly would have no motivation to increase T or N.
What happens if it's only 10% of those sharing that have a match? With the default settings you'd still get 14 people matching, or about 140 matching entries. That's still a *lot* of responses, more than I've ever received.
If all your default numbers are used, your nightmare scenario would yield 0.3*0.4*7,686,400*10 "found" responses to your query. That's 9 million 223 thousand 680 "grateful dead live" songs (though not unique) shared among 900 thousand deadheads who are all simultaneously online. Whoa.
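The figures in this comment follow from a simple closed form. The function below reproduces the 484 reachable users for N=4, T=5 and, for N=8, T=8, the 7,686,400 used in the nightmare scenario; I'm assuming this matches the paper's model (it treats the network as a tree, so no user is reached twice). The 30% sharing, 40% match and 10-results-per-user defaults are the ones discussed above; the function names are hypothetical.

```python
def reachable_users(n: int, ttl: int) -> int:
    """Users a query reaches when every node has `n` connections and the query
    lives for `ttl` hops, assuming nobody is reached twice: the first hop
    reaches n nodes, and each later hop fans out to n - 1 new nodes per node."""
    return sum(n * (n - 1) ** i for i in range(ttl))


def expected_matches(n: int, ttl: int, p_share: float = 0.3,
                     p_match: float = 0.4, results_each: int = 10):
    """Expected (responding users, matching entries) for one query."""
    responders = reachable_users(n, ttl) * p_share * p_match
    return responders, responders * results_each


print(reachable_users(4, 5), reachable_users(8, 8))  # 484 7686400
print([round(x) for x in expected_matches(4, 5)])    # [58, 581]
```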
I'm not an expert in human psychology by any means, but let me suggest this. With most tools, people don't feel any need to "tweak" them unless they're not working right. With 580 songs returned, I don't think many people would feel a need to tweak their settings. If someone was having a hard time finding something they might then change their settings, but if they were having a hard time finding it they wouldn't be getting so many responses.
The only way I can imagine those monstrous amounts of data resulting from queries is if it happens through maliciousness or mistake.
Am I missing something?
With Napster, the bandwidth usage from the query is negligible. A single packet (your query) goes out to a single destination (the Napster index server). A small handful of packets (your listing of places your desired song is located) comes back. A few K total, then you get your 4 MB transfer.
With Gnutella, the bandwidth usage from the query is significant. Your query goes to several peers, which then forward it to other peers, etc., and each server with the requested song sends you back a packet. Looking at the numbers in the analysis shows that your query will quickly generate more bandwidth usage than the actual transfer (which you'll still have to do to get your song). The bandwidth hit is distributed, true, but it still adds up, and it grows faster than linearly with the user base.
Gnutella's success depends upon a significant portion of its users also being servers (i.e. making files available for download) -- being a provider as well as a consumer. There's a server-side hit, too... with Napster, a provider of files sends a few packets to the Napster index server advertising its wares. Aside from the bandwidth usage of the actual transfers the provider is serving, very little impact. With Gnutella, every query within your range will hit your server. Bandwidth usage from queries will quickly outstrip bandwidth usage from transfers, and this will tend to discourage people from being providers.
Please, don't get me wrong here. I think that peer-to-peer will be the future, but there are problems to be solved. Gnutella, as it stands now, will not scale well... the math in the paper in question is good, and matches real-world observations. The challenge is managing the queries, routing the queries intelligently, and keeping the bandwidth usage down "below the radar" of backbone providers and system administrators.
I don't know what can be done about the bandwidth usage of the transfer itself, but keeping the query traffic down will help in keeping administrators and providers no more filesharing-hostile than they already are. Now is the time to be treating these people well, instead of antagonizing them further. You don't want to bite the hand that feeds you your bandwidth :)
This problem has been solved before, by the way. Think "routing tables".
But if the price of gasoline goes up, you can bet your last dollar that teleportation will be made practical. Or that cars that use fusion will be developed.
Not everything is practical just because there is a need for it.
Great straw-man rebuttal! How about if you try a more rational analogy? Going from gas combustion engines to teleportation or fusion power is a tad bigger leap than going from Napster to a similar service! And Napster ceasing to exist versus gas prices climbing higher is not analogous either...
A better analogy would be:
"If we run out of petroleum-based fuel, a similar or better form of energy will come to the forefront."
And that's ABSOLUTELY TRUE, reasonably proven through a huge mound of empirical evidence.
"And like that
Of course we've discussed this twice already here and here.
Someone you trust is one of us.
No one said that distributed P2P needs to be a Napster clone. Free software authors frequently make the mistake of just copying some retarded existing commercial software (witness the influence of Windows on Gnome and KDE). We need to try a lot of totally different ideas too. Here is one example:
Your system plays the role of file server by offering a list of available files, and plays the role of search server for you by collecting the lists of available files from different people. The key here is that only you search your own system's database, so only you get tagged for the cost of collecting the databases of too many different systems. Clearly, your system needs to figure out automatically which nodes it should track by remembering where you actually find stuff, but this should not present any real problem. You would also introduce a little randomization by tracking random nodes for a limited period of time.
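The node-tracking idea above could look roughly like this minimal Python sketch. Everything in it is hypothetical: the `PeerTracker` class, the scoring (downloads actually found through a peer) and the fixed number of random trial slots are assumptions chosen to illustrate "remember where you find stuff, plus a little randomization".

```python
import random
from collections import defaultdict


class PeerTracker:
    """Hypothetical sketch: decide whose file lists to keep collecting."""

    def __init__(self, max_tracked=20, trial_slots=2, trial_rounds=10, seed=None):
        self.hits = defaultdict(int)  # peer -> downloads you actually found there
        self.trials = {}              # peer -> rounds left in its random trial
        self.max_tracked = max_tracked
        self.trial_slots = trial_slots
        self.trial_rounds = trial_rounds
        self.rng = random.Random(seed)

    def record_download(self, peer):
        """Call this whenever a download came from `peer`'s file list."""
        self.hits[peer] += 1

    def tracked(self):
        """Best-performing peers plus whoever is currently on trial."""
        best = sorted(self.hits, key=self.hits.get, reverse=True)
        room = max(self.max_tracked - len(self.trials), 0)
        return set(best[:room]) | set(self.trials)

    def tick(self, known_peers):
        """One round: expire finished trials, then start new random ones."""
        for peer in list(self.trials):
            self.trials[peer] -= 1
            if self.trials[peer] <= 0:
                del self.trials[peer]
        current = self.tracked()
        candidates = [p for p in known_peers if p not in current]
        while len(self.trials) < self.trial_slots and candidates:
            peer = candidates.pop(self.rng.randrange(len(candidates)))
            self.trials[peer] = self.trial_rounds
```

A peer on trial that turns out to be useful accumulates hits and stays in the tracked set after its trial expires; one that never helps simply drops out.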
This might work just as well as Napster for people who always DL the same type of music (like Tech for me). Clearly, you would not be able to show off to your friends by DLing any song they request, but that is not really that important.
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
Reality Master 101 writes: Not everything is practical just because there is a need for it.
:) However, science says it's possible to build a Gnutella-like network that will scale. Therefore, we have NeoGnutella, which will be built if there is big enough demand, or OpenNap. OpenNap, as we mentioned before, is easier to use and similar to the Napster that most of us know and love, while NeoGnutella will have the benefit of never being able to be shut down. What will win? I personally think that both will survive, because the market is large enough to be divided between two players (again, a simplified example), but that OpenNap will probably grab most of the Napster fallout due to its similarity to its commercial cousin. However, if OpenNap servers are attacked legally and thus often shut down, we will switch to NeoGnutella, because finding one "node" that we can persistently connect to is a lot easier than refinding OpenNap servers, even if OpenNap seems to scale better than any distributed-net solution, and even if OpenNap is more familiar. Therefore, the long-term outlook for Gnutella depends on whether it will be adapted to scale and whether OpenNap will be attacked, as well as on other issues not addressed in this rant. We all have different wants. OpenNap, Gnutella, Freenet, FTP/HTTP "warez" sites, IRC "warez" channels, Napster, (formerly) Scour, and other services have evolved to meet them. Since Napster was the most appealing to most users (and because of media hype), it became one of the biggest file sharing programs out there. Now that Napster has a rocky future, another method will become the biggest.
:)
Warning: Rant Ahead!
Partially true. In your example, you said that if the price of gasoline went up, teleportation or fusion-powered cars wouldn't be developed. I agree. However, if the price of gasoline went to $20/gallon tomorrow (an outrageous rate, but it's just an example), then we'd either see a changeover to natural gas/electric or some other alternative-energy vehicle, or cars would be developed that got 400 miles/gallon.
So why would gas/electric cars be implemented and not fusion or teleportation? Well, first we have a demand for transportation. The demand for transportation is rather high, at least in the developed world, and especially in the US, since all of us seem to want to live in the woods and commute to the city. Therefore, if the demand is high, we *will* find something to fulfill the need, as long as the cost of fulfilling it is not so great that we have to sacrifice other, equally important demands. We don't commute to work by helicopter because the time, money, and energy we would have to spend to use them isn't worth the extra few minutes we'd shave off our commute. We don't commute to work by bus because we prefer living in areas with lower population densities (i.e. suburbs), which make buses impractical, and we don't like the inconvenience of conforming to the bus's schedule and having to interact with other members of our community. We are looking for something that fulfills our need to get from point A to point B with the lowest opportunity cost to us. This is the economic/social side of the scale. On the other side of the scale are the harsh laws of science and technology, which dictate what has been done, what is possible, what is impossible, and what each costs. Say we have a possible solution set such as this: { car (gasoline), car (electric), walking, teleportation, car (fusion) }. Science tells us that teleportation looks impossible, so we eliminate it. Technology tells us that fusion-powered cars haven't been done yet, and considering everything we know about "hot" fusion, it's doubtful we could ever fit a fusion reactor in a vehicle the size of a car. We are now left with gasoline-powered cars, electric cars, and walking, in this simplified example.
Walking is too much of an inconvenience to us. Science doesn't have a problem with it, but human nature, the time it would take, and the distances that would have to be traveled make it impossible. On the economic/psychological/social side, walking isn't happening. So what will it be, electric or gasoline? The technology that's in place makes gasoline-powered vehicles cheaper than electric ones, and gasoline, even at the high prices it has reached lately, is still an economical means of transport. Plus, we have human nature: gasoline is tried and true, electric isn't. Electric also has some problems with traveling long distances, and the infrastructure doesn't support electric right now. Therefore gas is the best solution to our problem. In the future, if electric becomes more ideal than gasoline (enough to override our habit of sticking with what we know), we will switch.
So, we learn this: each problem/solution pair depends on economics, human nature (psychological/social), science, and technology.
Let's apply this to Napster, OpenNap, Gnutella, and the rest of the field. Napster was nice and easy, a lot of us became accustomed to using it, and the technology (on our end) was cheap. However, Napster is either dead or moving towards a fee-based service, so all of a sudden, from the economic viewpoint, Napster is less ideal. OpenNap is similar to Napster; there is the additional hassle of finding a server, but since Napster is having trouble, OpenNap seems a lot more attractive. However, from the social viewpoint OpenNap is insecure: it has a central server, so it can be attacked. So what do we have left? Gnutella is free of cost and cannot be shut down by eliminating a central server. It is harder to use, and technology says it won't scale in its current form. Plus, it eats up bandwidth like a hog.
The above was a rant, and presented simplified examples. I didn't mention gyro-driven cars, monorails, carts hauled by penguins, or bicycles, among other things, because I was trying to keep the examples simple (and carts hauled by penguins aren't really practical). I didn't mention stuff like how critical user mass applies to file sharing systems because it didn't pertain to the topic of the comment. So please, don't flame me with a comment about how widget-driven cars are the ideal solution, or that file sharing also depends on bandwidth. Nitpicking just wastes both of our time. On the other hand, valid comments are appreciated.
"The more relevent question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no"."
That question bugged me so much that I'd like to answer it for you: the answer is YES! I figured out a solution. After reading the paper yesterday, I spent my time in class scrawling and pondering over it, and I have a very simple, elegant solution; I can't believe it! I am going to perform some experiments first before I make a fool of myself, but I certainly think it can be done. If I told you how, you'd smack yourself on the forehead and say, "of course!"
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind}
..it can't scale. Anyone taking a computer science class should've realized that...
On my campus, we've been using Limewire to make a private Gnutella network. We use it to trade files with each other. That way we're not all trying to get the same files from the internet. It's much faster. People at other colleges should try it.
Also, this means that the population P DOES have an effect on the number of reachable users, because as P increases the number of redundant connections will decrease. Don't have the math to prove it, but I think that's the way it works.
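The redundant-connection intuition can at least be checked by simulation. The following Python sketch is not from the paper: it builds a hypothetical random overlay (each peer opens 3 links to random peers, so the average peer ends up with about 6), floods one query with TTL 7, and counts how many messages were duplicates that reached no new peer.

```python
import random
from collections import deque


def random_overlay(population, degree, rng):
    """Hypothetical overlay: every peer opens `degree` links to random peers.

    Links are symmetric, so the average peer ends up with about 2 * degree."""
    links = {p: set() for p in range(population)}
    for p in range(population):
        opened = 0
        while opened < degree:
            q = rng.randrange(population)
            if q != p and q not in links[p]:
                links[p].add(q)
                links[q].add(p)
                opened += 1
    return links


def flood(links, origin, ttl):
    """Gnutella-style flood: a peer forwards a query it sees for the first
    time to every neighbour except the sender, while hops remain.
    Returns (messages sent, distinct peers reached other than the origin)."""
    seen = {origin}
    messages = 0
    frontier = deque([(origin, None, ttl)])
    while frontier:
        peer, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for nbr in links[peer]:
            if nbr == sender:
                continue
            messages += 1  # a duplicate still costs bandwidth
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, peer, hops_left - 1))
    return messages, len(seen) - 1


rng = random.Random(42)
for population in (1_000, 10_000, 100_000):
    links = random_overlay(population, 3, rng)
    messages, reached = flood(links, 0, 7)
    print(f"P={population:>7}: {messages:>7} messages, {reached:>6} peers reached, "
          f"{messages - reached:>7} redundant")
```

In a small network the query saturates everyone within a few hops and most later messages are redundant; as P grows, a larger share of the messages reach new peers, which is the effect described above. Treat the numbers as illustrative only, since real Gnutella topologies were not uniformly random.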
Also, is there an analysis of why Gnutella can't scale in terms of P? I can see why it won't scale in terms of the number of users I can reach, but why not in total users, IF users are content to let themselves be limited to a small fraction of the network? (This should be enforced by the clients. I know people can write their own, but they shouldn't write them to allow huge TTLs.)
Also, what of the reflectors?
Freenet is also very well architected, unlike bogus Gnutella. It's designed to scale up, so that popular stuff gets cached all over the place. Like, more people downloading means that your connections go FASTER. This is cool.
Is it possible?
Yes!
By using an internal microcredit/payment system (called mojo) and localized reputations Mojo Nation aims to do exactly that. Better connected brokers (peers) will naturally become more "server like" due to having a better uptime, lower latency and a lower mojo cost overall for other brokers (peers) to use.
The resources in the system are allocated dynamically. No strict hierarchy needs to be defined; it will establish itself appropriately for each individual peer as it is needed.
P.S. A new version (0.950) was released today.
The OpenNap servers are *very* good. I don't think I've used a Napster server for several months now. Grab gnapster and get this and you are good to go.
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
It seems to me that the principal bad assumption of gnutella was that forwarding search requests costs less than forwarding file lists. The second problem is the network topology, though that can be fixed relatively easily, and some of the newer client/servers seem to be tackling that problem.
If you switch to a more napster-like model where each user submits a file list, then freeloaders don't consume as much bandwidth. You develop a database over time as you stock up on file lists. The downside is that you can't just join and search (though maybe asking nearest neighbors to search could be part of the protocol). Since users might update only a few times per day or less, the overall bandwidth use isn't that high.
For the topology problem, I would suggest more of a ring-chain topology, with some redundancy (backup connections in case a link breaks, and multiple rings that are sparsely interconnected).
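The ring-with-backups part of that suggestion can be sketched in a few lines of Python. This is a hypothetical illustration: each peer keeps its successor plus `backups` further successors, and a message passed around the ring simply skips dead peers. The sparse links between multiple rings are left out for brevity.

```python
def ring_links(peers, backups=1):
    """Each peer knows its successor plus `backups` further successors."""
    n = len(peers)
    return {peers[i]: [peers[(i + k) % n] for k in range(1, backups + 2)]
            for i in range(n)}


def next_alive(links, peer, alive):
    """First live successor of `peer`, or None if every known link is down."""
    for candidate in links[peer]:
        if candidate in alive:
            return candidate
    return None


def walk_ring(links, start, alive):
    """Pass a message once around the ring, routing around dead peers."""
    order, peer = [start], next_alive(links, start, alive)
    while peer is not None and peer != start:
        order.append(peer)
        peer = next_alive(links, peer, alive)
    return order


peers = list("ABCDEF")
print(walk_ring(ring_links(peers, backups=1), "A", set(peers) - {"C"}))
```

With one backup link per peer the ring survives any single failure; two adjacent failures break it unless `backups` is raised, which is the redundancy-versus-overhead trade-off the comment alludes to.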
This is fun stuff to think about. Similar problems are present (self-organized networks) in "bottom-up" nanotechnology. Maybe I should ask for a DARPA or NSF grant for nanotech research and spend my time and money working on a p2p network...
Was that not the line in Jurassic Park about an enzyme prohibiting male offspring? Well, Gnutella may not scale well today, but a legion of MP3 loving programmers WILL find a way to share music. The proverbial cat is out of the bag and millions of consumers have tasted blood. The RIAA cannot put this Genie back in the bottle and only significantly lower prices for music with added goodies will bring buyers back. And it better be online, reliable and good.
They reported that CD stores around college campuses had GROWING sales.. But the sales weren't growing quite as fast as they were elsewhere.
This could be from any number of causes.
1. People at a college might have more straitjacketed finances and can't afford to increase their CD spending as fast as the general public.
2. People at a college might tend to order online more often, thus satisfying their music consumption through non-local stores.
3. People at a college may be joining CD clubs or may be purchasing CD's at home where they have convenient access to a large collection and bringing them to college instead of purchasing them near college.
4. A statistical anomaly: a decline in sales isn't actually happening.
5. A million other possible reasons.. Colleges are drugging their students so they purchase textbooks instead of CD's.
The conclusion: while such a correlation may exist (college CD purchases aren't increasing as fast as the national average), it could have been generated by any NUMBER of possible causes.
If you want statistics I'll believe: take universities whose student populations have similar demographics, some that do and some that don't have (say) Napster, and ask the students how many CDs they purchased in the last year. Or use some other technique that isn't susceptible to flaws #1-5 above, and give me numbers that don't have obvious artifacts.
Anyone who understands how Gnutella works (unfortunately, too few people) knows that Gnutella is horribly broken, will never work, and is basically unfixable.
The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".
However, the REAL question is whether you can have a peer-to-peer network with decentralized servers, i.e., with clients that automatically establish a hierarchy among themselves, so that certain clients become more "server like". The only way to make Gnutella work is by making it hierarchical, but the hierarchy needs to be automatic for it to have the same general "virtual network" aspect as Gnutella.
Is it possible? I don't know. You would probably have to have automatic bandwidth measurements, depth probes, all kinds of things to make it work. I simply don't know if it would be possible to automate something like that.
--
Sometimes it's best to just let stupid people be stupid.
Er, you did see who the author of the article was, right? Not exactly one of the record companies' favorite people... Napster co-founder Jordan Ritter.
You're saying they paid him off, or did you just not bother to read the header?
No relation to Happy Monkey
IIRC (and I am not sure that I do), isn't there some bug in the Windows TCP/IP stack that means you can't have too many "open" UDP "connections" at once?
All of the communication is done through a single UDP socket. DTCP is a multiplexing transport protocol which operates over a single UDP connection.
You are correct about the number of open UDP sockets, though. On any UNIX or NT variant the limit is usually 1024 to 2048 per process, and 64k per IP address (the port field in UDP or TCP is only 2 bytes).
This is why native UDP or TCP cannot support the required number of connections to perform direct queries to each peer in a large network.
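The multiplexing idea can be illustrated with a small Python sketch. The frame layout below (32-bit logical connection id, 32-bit sequence number, 8-bit flags) is a hypothetical example, not the actual DTCP wire format; the point is that the connection id lives inside the UDP payload, so one socket can carry far more logical connections than there are port numbers.

```python
import struct

# Hypothetical frame header: 32-bit logical connection id, 32-bit sequence
# number, 8-bit flags, in network byte order. NOT the DTCP wire format.
HEADER = struct.Struct("!IIB")


def pack_frame(conn_id: int, seq: int, payload: bytes, flags: int = 0) -> bytes:
    return HEADER.pack(conn_id, seq, flags) + payload


def unpack_frame(datagram: bytes):
    conn_id, seq, flags = HEADER.unpack_from(datagram)
    return conn_id, seq, flags, datagram[HEADER.size:]


class Demux:
    """Routes datagrams arriving on ONE UDP socket to per-connection queues."""

    def __init__(self):
        self.streams = {}  # (remote address, conn_id) -> [(seq, payload), ...]

    def deliver(self, addr, datagram: bytes):
        conn_id, seq, _flags, payload = unpack_frame(datagram)
        self.streams.setdefault((addr, conn_id), []).append((seq, payload))


# A 2-byte port field caps real sockets per IP; a 4-byte id does not.
print(2 ** 16, "ports vs", 2 ** 32, "logical connection ids")
```

In a real client, `Demux.deliver` would be fed from `sock.recvfrom()` on a single `socket.SOCK_DGRAM` socket. Retransmission and ordering by `seq` would still have to be built on top, which is where most of the work in such a protocol lies.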
Gnutella is neat, but for a reliable MP3-only service, check out Audiogalaxy.
At first I was put off by the web interface, but:
1) It remembers everything you request in a queue and will get it when available. (A must for dial-up users)
2) Auto-resume using temp files.
3) A small app in your system tray/console sends/receives only when you have it running.
The greatest advantage is that ZDNet/CNet/MSNBC and others DON'T mention Audiogalaxy in their "quest for a Napster clone" articles, so the quality of the users, and therefore of the music, is excellent.
Unfortunately, it is a centralized system, but so far, it seems the mainstream media/RIAA have ignored it.
I had thought of IDA as a secret sharing scheme like Shamir's. Thanks for bringing this to my attention!
I found the original paper:
Michael O. Rabin: "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance"
Basically, it means you can break a file of length L into N chunks each of length L/M, such that only M chunks are needed to reconstruct the file. It's exactly the right thing for these circumstances.
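Here is a minimal Python sketch of the "any M of N chunks" property. Rabin's scheme works with any set of N vectors of which every M are linearly independent; this sketch uses polynomial evaluation over the prime field GF(257) (a Vandermonde-style choice), which is an assumption made for simplicity, not necessarily the construction in the paper. Each group of M bytes is treated as the values of a degree-(M-1) polynomial at x = 1..M, and share x stores the polynomial's value at x.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _lagrange_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total


def disperse(data: bytes, n: int, m: int):
    """Split data into n shares of ceil(len/m) symbols; any m of them suffice."""
    if not 0 < m <= n < P:
        raise ValueError("need 0 < m <= n < 257")
    padded = data + bytes(-len(data) % m)
    shares = {x: [] for x in range(1, n + 1)}
    for g in range(0, len(padded), m):
        # This group of m bytes = values of a polynomial at x = 1..m.
        points = list(zip(range(1, m + 1), padded[g:g + m]))
        for x in shares:
            shares[x].append(_lagrange_at(points, x))
    return shares


def reconstruct(shares: dict, m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m shares (keyed by share number)."""
    if len(shares) < m:
        raise ValueError("not enough shares")
    chosen = list(shares.items())[:m]
    out = bytearray()
    for k in range(len(chosen[0][1])):
        points = [(x, symbols[k]) for x, symbols in chosen]
        out.extend(_lagrange_at(points, x) for x in range(1, m + 1))
    return bytes(out[:length])
```

Each share is about L/M symbols long, so N shares cost N/M times the file size in total. Because symbols live in GF(257), a share symbol can occasionally be 256 and needs slightly more than one byte; practical implementations usually do the same arithmetic in GF(2^8) instead.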
--
Xenu loves you!
Go ahead. Strike me down all you wish. I have more karma than you could possibly ever imagine.
Heh. Grandpa indeed.
:)
MP3 first came out in 1996.
But it almost seems like forever doesn't it? To me it's encouraging that this stuff is so new because it means that in 4 years I'll be a "grandfather of the internet" too.
This is a truly fun time to be alive.
Yes, and Napigator works, too. The only thing is, when Napster's official servers close down, the OpenNap servers may experience their own "Napster flood" effect. I've already been unable to connect to some of the more popular opennap servers from time to time because of user limits.
--
Editor Emeritus and Senior Writer, TeleRead.org
I understand that there are basically three reasons for Freenet:
- Abolition of censorship
- Archival of documents based on their perceived "usefulness"
- Elimination of standard bottlenecks in most peer-only networks (I hate the term peer-to-peer, but won't digress into the rant behind that statement)
So, do we really care that Gnutella lasts any longer than the time that it takes to get Freenet everywhere?
I love AudioGalaxy. It's a lot easier to use (in terms of searching), downloads auto-resume, and downloads automatically come from the fastest available connection nearby. You also tend to get far fewer truncated files because it will, by default, download the most popular version of the MP3 (in terms of size and bitrate), but you can also custom-select which version to download yourself.
It's definitely worth a try (and blocked by far fewer firewalls and ISPs than Napster!).
The trouble with news is one lost article screws your download. But that's what error correction is for! A simple Hamming code allows you to, say, break the file into 26 data shares and add 5 error-correcting shares such that the file can be reconstructed after one share is lost; you can do better with more sophisticated error correction schemes.
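The 26 + 5 arrangement above is the Hamming(31,26) layout applied to whole shares instead of bits. In this Python sketch (an illustration, not an existing tool) data shares sit at the positions 1..31 that are not powers of two, and the share at each power-of-two position p is the XOR of every share whose position has bit p set. A single *lost* share can then be rebuilt from any one parity group that contains it; the full Hamming structure would additionally let you locate a single *corrupted* share.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_shares):
    """Place 26 equal-length data shares at the non-power-of-two positions
    1..31 and fill positions 1, 2, 4, 8, 16 with XOR parity shares."""
    if len(data_shares) != 26:
        raise ValueError("expected 26 data shares")
    size = len(data_shares[0])
    code = {}
    it = iter(data_shares)
    for pos in range(1, 32):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = next(it)
    for bit in (1, 2, 4, 8, 16):
        parity = bytes(size)
        for pos, share in code.items():
            if pos & bit:  # every share whose position has this bit set
                parity = xor(parity, share)
        code[bit] = parity
    return code


def recover(code, lost):
    """Rebuild the share at position `lost` from the remaining shares."""
    bit = lost & -lost  # lowest set bit: a parity group that contains `lost`
    share = bytes(len(next(iter(code.values()))))
    for pos, other in code.items():
        if pos != lost and pos & bit:
            share = xor(share, other)
    return share


def data_of(code):
    """The 26 data shares, in their original order."""
    return [code[pos] for pos in range(1, 32) if pos & (pos - 1)]
```

For a single missing article, one plain XOR parity share would be enough; the extra parity shares start paying off when you also need to detect which share was damaged.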
I haven't seen any P2P proposals which make use of error correction technology, and it does seem like it might be useful.
FUD? Just read the math, man. Make your own decision, sure, but read the paper first. There's nothing FUD-like about the mathematics in the paper.
--jordan
Simple: with all that media frenzy going on (the Napster trial even got front-page coverage in my local newspaper), it's a big-scale advertisement for MP3. Yes, Napster has a userbase of 60 million, so arguing that only a few specific individuals are doing it is wrong; but if the story made it into my local newspaper (and we could see a mention of Gnutella too), guess how many people who didn't know about it or Napster will be curious to try different services out.
Now there will be media coverage (other than on the internet) mentioning other alternatives like IRC, Gnutella, search engines, etc. This is really a stupid move, not counting the many people who are going to be pissed off at the RIAA and stop buying CDs.
The RIAA should have worked closely with Napster to build a decent business model instead of bashing them; they might actually have profited from that. They've shown how many copyrighted files were leeched every second (around 10,000), but did they show EVIDENCE that their sales decreased DUE to Napster? No. They didn't have to, but if they had, things wouldn't be this way. You bet that after Napster shuts down their sales will decrease; I, for a start, will not buy any more CDs.
I hope a company signs up big artists for digital distribution and does something like Stephen King did, a buck a download, with the money going STRAIGHT to the artists, so the record labels would stop their own piracy (i.e. ripping many artists off and taking the public for complete morons).
For now Gnutella will do for most people, and if people SHARE, maths or not, it will work, not as nicely as napster did, but there will be a bunchload of alternatives if gnutella isn't doing the job.
--- Metamoderating abusive downgraders since my 300th post.
I mean, each client is only passing a small amount of data to each other client, so I don't know if the aggregation of the total bandwidth usage is a ... useful ... measurement...
tagline
... hi bingo
--jordan
I am currently working on a fully decentralized searching network. You can read more about it here.
The key aspects of this network will be:
- No forwarding. This is currently eating gnutella alive. A UDP based multiplexed transport protocol is used to maintain hundreds of thousands of direct connections to all the peers you want to communicate with. You can also tailor your peering groups precisely to what you desire, as far as quality, reliability, etc.
- Low Communication Overhead. All queries that are broadcast are performed with minimal overhead within UDP packets. A typical napster breadth query (10,000 peers) would take a few minutes on a modem, and seconds on a DSL line.
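Those timings can be sanity-checked with simple arithmetic. The Python sketch below assumes 100 bytes per query packet (including the 28 bytes of IP and UDP headers), a 33.6 kbps modem uplink (the upstream ceiling of a V.90 "56k" modem) and a 768 kbps DSL uplink; none of these figures come from the original post.

```python
def broadcast_seconds(peers: int, packet_bytes: int, upstream_bps: int) -> float:
    """Time to push one query packet to each peer over a single uplink."""
    return peers * packet_bytes * 8 / upstream_bps


for label, bps in (("33.6k modem", 33_600), ("768k DSL", 768_000)):
    secs = broadcast_seconds(10_000, 100, bps)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

That works out to roughly four minutes on the modem and about ten seconds on the DSL line, in line with the estimate above; a 128 kbps ADSL uplink would take about a minute. Responses coming back are not counted here.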
- Adaptive Configuration. Peers that have better or more responsive content will gravitate towards the top of your query list, thus, over time you will have a large collection of high quality peers which will greatly increase the chance of you finding what you need.
There are a number of other features, but too many to detail here.
Also, this is under heavy development, and not operational. I am going solo on this at the moment, and so progress is slow. However, once completed, it *should* be a scalable alternative to completely decentralized searching / location.
But if Napster gets squeezed, you can bet your last dollar that it will be made to. Or something like freenet or audiogalaxy will take over.
But if the price of gasoline goes up, you can bet your last dollar that teleportation will be made practical. Or that cars that use fusion will be developed.
Not everything is practical just because there is a need for it.
What about the *per-user* bandwidth? Even if you have gigabytes' worth of data to move, if you have millions of users and things are split up evenly, that's only kilobytes per user. The clincher is looking at peak bandwidth at any given node and comparing that to capacity. Did I skim the paper too fast, or did it not address this rather thorny mathematical question? Not that I believe Gnutella scales smoothly at all.
On the contrary. Napigator is a nifty little freeware tool that lets the Napster client program use other Napster servers. The OpenNap network is huge and not going anywhere anytime soon...
...and will continue to improve if only folks would move to newer, more robust, and more compliant clients. If you're still running gnutella 0.53, or even Gn0tella, check out BearShare at http://www.bearshare.com/. You'll be surprised at how far Gnutella has come - that only hints at how far it may go in the future.
Critics said man would never set foot on the moon. Now critics are saying Gnutella is doomed. Funny, they've been saying that since March of last year and I'm still happily downloading MP3s. Ignore the critics and keep the faith.
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
So, Jordan, you provide a nice demonstration of a flaw. It is considered polite in many circles, when destroying someone's hard work, to make a peace offering in the form of some assistance.
:-)
Can we expect therefore to see an equally interesting and thorough discussion of how Napster/Gnutella can grow, evolve and perhaps merge, to provide the "ideal compromise" where we will not need 100Gb networks, but where:
a) The destruction of any significant percentage of the network is transparently ignored or healed.
b) The network will not segment as GnutellaNet can.
c) Bandwidth requirements are low[er]
d) Anonymity of participants is maintained where required.
e) The law can't shut it down so easily.
f) Data can be secured, encrypted and/or signed (etc.) for specific users
And MY personal wish:
g) The end result is so globally accepted for file exchange and storage that FTP dies a death, and we all live without buffer-overflow exploits for the rest of our lives
Note that Napster and Gnutella were very one-sided in their freedom with files. There was no facility available to ensure that the law was honoured where desired.
--
Enjoy Y2K? Roll-on Year 2037!
This was a plea for development assistance ;)
I could very much use some additional C++ development talent to help with this project. Anyone who is interested please let me know.
Thanks...
When you talk solely about downloading mp3's, I've tried both Gnutella and Napigator. I've always found Napigator to be more stable, easier to use, and more likely to provide good downloads than Gnutella. Better yet, Napigator works with existing Napster clients to bring da music to da masses.
If it's the trading of MP3s that's at stake, I believe that Napigator and nap servers like OpenNAP will save the movement, not Gnutella.
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
This included the fact that load on each server grows proportionally to the total number of servers, so the total CPU usage for the whole system grows quadratically. There are also serious issues with naming, searching, tagging, and other things that could have been dealt with.
There didn't seem to be much interest in this so I moved to lurking the Freenet mailing list, which seems to be a much more grownup way of doing the same thing.
Return Rant:
I'd say there is no way to put the genie back in the bottle, either by products dying out or by legal action. Now that there has been a taste, there will eventually be one or more working models. None is likely to have the instant dominant position Napster had (except possibly Microsoft's offering if they bind it into Windows), but that doesn't mean the concept will die. File-sharing is a simple and very addictive concept, so it has a low barrier to market entry and lots of possible market share. That will drive companies to invest. Us geeks will invest our time just to keep the companies from sealing us in and because we like to hack code. I myself was working w/ file-sharing concepts long before Napster existed and am sure I will be long after. The concept has no doubt been growing ever since the invention of email. As a species we like being able to communicate freely. That includes text messages, voice messages, movies, photos, music, games, etc. Therefore there is no way the idea of sharing these things will die out. They'll just get thought about some more, and new, better concepts will be tried over and over until we find the perfect one. Email, FTP, gopher, the web, instant messaging, Napster, etc. are all steps we've taken.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
can I get prosecuted if they have my ip?
;-)
Yes. They can track your ISP, obtain a court order to search the ISP's logs, obtain your information, and arrest you.
is this likely to happen?
Short answer: no.
Long answer: Do you know how many people do this stuff? If the FBI went after every copyright violator in the nation, they would need an incredible amount of manpower. If you aren't reproducing and (important) selling bootlegs, nobody cares. You've been taking the "FBI Warning" at the beginning of videos way too seriously.
The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
Uh, right. Hands up everyone who actually needs to compile the latest, greatest kernel? Hands up everyone who did anyway? ;-)
Picture the following:
A system much like freenet, with a few differences.
First, the only keys that the storage/communication mechanism cares about are the MD5 checksums of the file in question.
Second, nodes do something like Seti@home, but with storage, not CPU cycles: ALL of the blocks on your disk that are not used by your filesystem are available to cache freenet files.
If your filesystem needs space that's being used by the freenet cache, it can just go right ahead and grab disk sectors. (Anything you care enough to store permanently is in your filesystem.) Anything someone else cares about is in his file system.
Fourth, files are split up into blocks, such that if a file spans blocks A, B, and C and you get any two, you can reconstruct the third (IOW, RAID-style striping with parity). Each block is tagged with the MD5 of the whole file and the sequence number of the block.
Third, the objects transferred are usually not whole files, e.g. if Alice asks for a Metallica tune, and sixty nodes out there have it, Alice randomly picks nodes to ask for particular blocks. This would tread far more lightly on each node out there, than Napster does now.
Twelfth, if a node has idle time/storage/bandwidth, it can randomly receive blocks from other nodes.
Thirty-seventh (yes, these paragraphs may be out of order):
Indexes are just files that match names to MD5's of files. There need not be any single scheme for indexing these files. There can be any number of names for a given file.
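The "any two of A, B, C" blocks with MD5 tags could look like this minimal Python sketch. The three-block split (two data halves plus an XOR parity block) and the tag layout `(md5, file length, sequence number, block)` are assumptions for illustration; a real system would use many more, much smaller blocks.

```python
import hashlib


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_blocks(data: bytes):
    """Split a file into data blocks A, B and parity block C = A xor B.

    Each block is tagged (md5 of the whole file, file length, sequence number)."""
    digest = hashlib.md5(data).hexdigest()
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad B so both halves are equal length
    return [(digest, len(data), seq, blk) for seq, blk in enumerate((a, b, xor(a, b)))]


def rebuild(blocks):
    """Reassemble the file from any two of its three blocks and check the MD5."""
    digest, length = blocks[0][0], blocks[0][1]
    have = {seq: blk for _, _, seq, blk in blocks}
    if len(have) < 2:
        raise ValueError("need at least two distinct blocks")
    if 0 not in have:
        have[0] = xor(have[1], have[2])
    if 1 not in have:
        have[1] = xor(have[0], have[2])
    data = (have[0] + have[1])[:length]
    if hashlib.md5(data).hexdigest() != digest:
        raise ValueError("MD5 mismatch: corrupted or mismatched block")
    return data
```

Because the key is the checksum of the content, blocks fetched from different nodes can be verified after reassembly. MD5 was the natural choice at the time; today a SHA-256 digest would be safer against deliberately crafted collisions.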
This addresses a few of the problems with Napster, like "D'oh! He logged off when I had all but five seconds of the song!", or "Man, I hate it when someone D/L's from me and I'm on a modem connection." If the typical hit on each node for a D/L is about 20K, bottlenecks go away.
A couple other random thoughts: It might be quite doable to implement this with UDP, if you make the blocks small enough. With this scheme, if I have any *part* of the file you want, I can help fulfill your request.
In Napster, I tended to look for people with the fastest net connections to D/L from. That's not really fair, is it? With the scheme I suggest, I'd ask for blocks from the fastest and the slowest nodes alike, and each of them would decide how helpful they wanted to be.
To put it in Star Trek terms, this idea makes every machine participating act as a pattern buffer for a transporter, as it were.
Comments?
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
*sigh*
omega_rob -- friend of the dread pirate Napster
Had an idea for a solution to file sharing over the internet without the vulnerability of a centralized static site serving as the database:
The client software would have preference settings allowing a user connecting to the file-share system to indicate their "eligibility" to become a temporary Database Host, with options of Always, Ask, and No.
Client software would work like this:
Access a specific IRC channel and query the established hosts.
Hosts (the temporary Database Hosts) would respond stating who they are and requesting the client's share list.
Database Hosts would negotiate which host would accept the new client's list.
The client would then be told which host to transmit its list to, and when its next update would be expected.
Search requests are then transmitted to the Hosts through IRC, and results are returned directly to clients by the Hosts. One-to-one transfers are then initiated using the client's choice of protocol.
When clients contact the Hosts to indicate they are still online, the Hosts will ask the client program about server eligibility. The Database Host role will then shift to the clients that indicate a preferable host environment.
Of course there are specific things to work out, but what do you guys think? Use IRC as a central communications channel for everything, and use a randomized central group of systems as centralized databases: faster search returns than Gnutella can produce, but at the same time no central server that can easily be shut down.
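One way to picture the host-selection and list-assignment steps, as a sketch only: the IRC transport, timeouts, and update schedule are omitted, and the hosts' "negotiation" is simplified to handing each new share list to the least-loaded Database Host. All class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Eligibility(Enum):
    """The three client preference settings from the proposal."""
    ALWAYS = "always"
    ASK = "ask"
    NO = "no"


@dataclass
class Peer:
    name: str
    eligibility: Eligibility
    said_yes: bool = False  # the user's answer, only consulted for ASK

    def can_host(self) -> bool:
        if self.eligibility is Eligibility.ALWAYS:
            return True
        return self.eligibility is Eligibility.ASK and self.said_yes


@dataclass
class DatabaseHost:
    peer: Peer
    share_lists: dict[str, list[str]] = field(default_factory=dict)

    def search(self, term: str) -> list[tuple[str, str]]:
        """Return (client, filename) pairs whose filename contains term."""
        term = term.lower()
        return [(client, f) for client, files in self.share_lists.items()
                for f in files if term in f.lower()]


def assign_client(hosts: list[DatabaseHost], client: str,
                  files: list[str]) -> DatabaseHost:
    """Simplified 'negotiation': the least-loaded host takes the new list."""
    host = min(hosts, key=lambda h: len(h.share_lists))
    host.share_lists[client] = files
    return host


def search_all(hosts: list[DatabaseHost], term: str) -> list[tuple[str, str]]:
    """Fan a search out to every Database Host and collect the hits."""
    return [hit for h in hosts for hit in h.search(term)]


# Usage: only "Always" peers and "Ask" peers who said yes become hosts.
peers = [Peer("a", Eligibility.ALWAYS),
         Peer("b", Eligibility.ASK, said_yes=True),
         Peer("c", Eligibility.NO)]
hosts = [DatabaseHost(p) for p in peers if p.can_host()]
assign_client(hosts, "dave", ["Pink Floyd - Time.mp3"])
assign_client(hosts, "erin", ["Floyd live 1977.mp3", "Yes - Roundabout.mp3"])
```

A search touches only the handful of Database Hosts rather than flooding every peer, which is where the faster search returns would come from.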
Just a thought. Don't have the skills or time to write up a trial client.
The only difference between a mom and pop and a corporation is mom and dad can't tell you what to do anymore.
Forgot this the first time around. Here are some tips to improve Gnutella's performance for yourself and for everyone.
1. Never connect to more than 5 hosts at a time. There's no need for it and you'll only hurt yourself by doing so. I used to spend a lot of time in the gnutella.wego.com discussion area, and then the GnutellaNews boards, helping out new users. Time after time someone would come in and say, "Gnutella is shit! I type in a search and I don't get results for 10 minutes!" Me: "How many connections do you have open?" Them: "50, and if I try with 100, it goes even slower!!"
The more active connections you have, the slower your Gnutella experience will be... And by being a congested node, you're adding latency to the network for everyone else. Set your max connections to 5. That gives me, on average, an overhead of 6-10K/sec in background chatter, not counting uploads/downloads.
If you're on dialup, max your connections out at 2 and (it hurts to say this) don't share files or you won't be able to do anything else online. If you really want to share - and that's a good thing - cap your uploads at 1. Leave routing up to the people with the fatter pipes.
2. Go for diversity in your connections. If you load up your client and see that you're connected to 5 RoadRunner nodes, dump a few of them and try to connect to other networks. Peer-to-peer file sharing relies a lot on peering, after all. Connecting across ISPs, networks, and even across countries is a good thing.
3. Don't share junk files. Please. Every time I search for Pink Floyd and get a ton of under-1MB MP3s in the results, I want to kill someone. Know which directories, if any, you're sharing... And clean them out from time to time. All those incomplete downloads you made are being sent out as search results, but nobody is going to download them from you. Those are a lot of wasted bytes coming through your query hits.
4. Perhaps most importantly, use a good client. See the parent for details.
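Tips 1 and 2 can be folded into a simple connection-picking policy. This is a hypothetical sketch, not taken from any particular client: it caps connections at 5 (2 on dialup) and walks round-robin across networks so that no single ISP fills all the slots. How a client learns which network a host is on is assumed, not shown, and the host and network labels are made up.

```python
def pick_connections(candidates: list[tuple[str, str]],
                     dialup: bool = False) -> list[str]:
    """Choose which hosts to stay connected to.

    candidates: (host, network) pairs in the order the client learned them.
    Caps the result at 5 hosts (2 on dialup) and takes one host per network
    per round, so a single ISP cannot fill every slot.
    """
    limit = 2 if dialup else 5
    by_network: dict[str, list[str]] = {}
    for host, network in candidates:
        by_network.setdefault(network, []).append(host)

    chosen: list[str] = []
    while len(chosen) < limit and any(by_network.values()):
        for hosts in by_network.values():
            if hosts and len(chosen) < limit:
                chosen.append(hosts.pop(0))
    return chosen


# Usage: five RoadRunner hosts plus one each from two other networks.
seen = [("rr1", "RoadRunner"), ("rr2", "RoadRunner"), ("rr3", "RoadRunner"),
        ("rr4", "RoadRunner"), ("rr5", "RoadRunner"),
        ("b1", "ISP-B"), ("c1", "ISP-C")]
```

Here `pick_connections(seen)` keeps three RoadRunner hosts instead of five, making room for the two other networks.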
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
I was looking at the Gnutella protocol just yesterday because I was planning on writing a client. I ended up sort of laughing at how poorly thought out the protocol is. The "protocol", as it's called, describes how to handshake (which is really simple, because it doesn't need to be complex) and how to request and receive files (HTTP-style requests). That's it. That's the entire protocol. There is some vague mention of "passing messages on to people you're connected to" and "pinging", but as far as I could tell, there's no set way to do it. As a result, it's up to the client writer to implement these things, and I sincerely doubt many of the client writers have taken any in-depth advanced networking classes. Basically, as it stands right now, Gnutella will never work, not with the protocol as it's currently implemented. The biggest killer is message forwarding: if you can solve message forwarding, you can have a scalable network. If your protocol includes message forwarding, you're dead in the water.
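For reference, here is a minimal sketch of the flood-and-drop forwarding that Gnutella clients generally rely on, assuming each query carries a unique ID and a TTL that is decremented per hop, and that a node silently drops any query ID it has already seen. The `flood` function and the adjacency-list graph are illustrative assumptions, not code from any client. Duplicates are dropped by the receiver, so they still cost a message on the wire.

```python
from collections import deque


def flood(graph: dict[str, list[str]], origin: str,
          ttl: int) -> tuple[set[str], int]:
    """Flood one query from `origin`; return (nodes reached, messages sent).

    Every node forwards to all neighbours except the one it heard the query
    from, decrementing the TTL. A node that has already seen the query's ID
    drops the copy, but the copy was still sent, so it is still counted.
    """
    reached = {origin}
    messages = 0
    # Breadth-first, so each node is first reached along a shortest path.
    queue = deque([(origin, None, ttl)])
    while queue:
        node, sender, ttl_left = queue.popleft()
        if ttl_left == 0:
            continue  # TTL exhausted: stop forwarding here
        for neighbour in graph[node]:
            if neighbour == sender:
                continue
            messages += 1
            if neighbour in reached:
                continue  # duplicate ID: receiver drops it
            reached.add(neighbour)
            queue.append((neighbour, node, ttl_left - 1))
    return reached, messages


# Usage: a six-node ring, n0 - n1 - ... - n5 - n0.
ring = {f"n{i}": [f"n{(i - 1) % 6}", f"n{(i + 1) % 6}"] for i in range(6)}
```

On this ring, TTL 3 reaches all six nodes with six messages; the node opposite the origin hears the query from both sides, so one of those messages is pure overhead.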
---
we stand in life at midnight, we are always on the threshold of a new dawn.
Lopster also features excellent integration with OpenNap, superior to Gnapster's in my opinion.
:wq
Tools like NewsShark and NewsGrabber make it easy to post or obtain binary files such as multimedia, and there is plenty of it available. No waiting for downloads, no acne-faced punk kids aborting them, and you can batch and resume at your convenience.
Usenet isn't that hard to use and there is a lot of music that can be found from your ISP's news server. Grab a client and check it out!
-Pat
FUD-like? Hmm... I disagree. While the information may be 100% accurate and the analysis may be perfect, the way it is presented is very FUD-like.
It's well known that math and stats can be used to prove just about anything. Don't trust something just because it looks scientific. To me, big scary numbers presented by a competitor generally feel wrong, so I make sure to check the math carefully when I see things like that.
IMHO It's a long way from being decent.
Downloads are generally horribly slow. Most of my downloads from Napster/OpenNap servers come in at around 25-100Kps. AudioGalaxy claims you get the fastest download for your location, but I can't see how I'm getting 1-2.5Kps downloads if that's really true.
The selection's not too bad, but you can't find the obscure stuff that you'll find on a network the size of Napster. Its organization is a step in the right direction, better than Napster's, but could be better.
What I really don't like about it is the fact that you have to choose which version each time. Sure, you're supposed to get the most popular version, but myself I don't like 128K mp3's. I prefer 192K files. So each time I download a song I have to choose which one I want. It would be nice if I could tell it I prefer 192K songs, and it would default to those.
With Napster I can find an entire album in a search and queue it up quickly. With AudioGalaxy it takes several clicks. You also can't browse a user's files, and being able to browse other users' files is one of my favorite things about Napster. Sure, AudioGalaxy gives you logical choices of other music, but I frequently find things I like that don't logically go with the song I initially searched for.
Basically I think AudioGalaxy is a good idea. I'd like to see a better client, maybe standalone or a Java client so it would have a little more flexibility, and I'd like to see more potential interaction between users.
Actually, UDP is almost universally used by game programmers because it is the only way to get around NAT (without central servers, which don't work well for gaming).
Also, no, firewalls and proxies do not filter UDP by default (none that I have encountered do), although you can configure them to do so.
Also, when data is placed on Freenet, it's split into pieces and distributed to several nodes making it even tougher.
Actually file splitting hasn't been implemented yet on Freenet. It probably will be by version 0.4.0 (current is 0.3.7) but not yet.
Perhaps you are thinking about how related files, such as the files comprising a website on Freenet, get put on different nodes when they are inserted?
You'll hit one of two problems:
1) The peer above you in the hierarchy will turn their computer off, and you can't go anywhere. That is why it needs to be a dynamic sorta hierarchy.
2) You have set peers that'll always be there so #1 doesn't happen - but then you have dedicated machines, which takes away the dynamic nature of it.
You also highlight the really big scary numbers that come from doubling the default gnutella settings.
I appreciate you sharing the equations and methodologies you used, and I'm in the process of looking over the math right now. If I had a nice math package to help me, and if I weren't brain-dead after a long day of work, it would be going a lot quicker.
Anyhow, I don't mean to offend. I'm expecting the math will be correct and the methodologies you used will be ok. The point I was trying to make is that the speed at which most people posted comments meant they had only skimmed the article too, and some people were saying "You can't argue with him, he's using Math!!" I really think people should check over the math for themselves before they agree with what you're saying.
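For anyone checking the numbers by hand, the idealized upper bound on how many nodes a flooded query can reach fits in a few lines. This sketch assumes a cycle-free topology in which every node has the same number of connections N and the query starts with TTL T, giving N + N(N-1) + ... + N(N-1)^(T-1). The specific values below (N = 4 or 8, T = 7) are illustrative assumptions, not figures quoted from the paper; real networks contain cycles, so actual reach is lower.

```python
def max_reach(connections: int, ttl: int) -> int:
    """Upper bound on distinct nodes a flooded query can reach.

    Assumes an idealized, cycle-free topology: the origin sends to all
    `connections` neighbours, and each later node forwards to
    `connections - 1` fresh neighbours, until the TTL runs out.
    """
    return sum(connections * (connections - 1) ** (hop - 1)
               for hop in range(1, ttl + 1))


for n, t in [(4, 7), (8, 7)]:
    print(f"N={n}, T={t}: at most {max_reach(n, t):,} nodes")
```

Under these assumed settings the bound goes from 4,372 to 1,098,056 when N is doubled, roughly 250 times larger, which is why the choice of defaults dominates any such analysis.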
A hobby of mine is poking holes in things too. Mostly TV commercials. I try to figure out the loopholes that let them say the things they say. "7 out of 8 math profs say your analysis is flawed" (I just haven't told you their names or what mental institution they're currently in).