Gnutella's Challenge
Gnutella News sent in an excerpt from a Clip2 DSS report about Gnutella's evolution and condition. "the network has neither smoothly scaled nor catastrophically collapsed since average traffic grew to regularly exceed dial-up modem bandwidth in August 2000. Instead, the network persists in a fragmented state comprised of numerous continuously evolving responsive segments, the largest of which typically contains hundreds of hosts. We estimate at present that unique Gnutella users per day number no less than 10,000 and may range as high as 30,000. We suggest that further technical innovation and wide adoption of this innovation are necessary for the Gnutella network to scale beyond its present state." Read this if you're interested in P2P.
Bandwidth Limited Connections. When two Gnutella clients connect they should send in the reply the allocated bandwidth for that connection. The forwarding protocol should not allow more queries to be sent than the bandwidth allows. This would require a form of Russian roulette on the packets--a method of killing queries.
It is feasible that the client should be able to forward post and response packets. Query packets are the most likely target for such filters. The filters could be implemented in several manners:
- The choosing is totally random.
- An artificially intelligent option that learns which requests have been handled and which have not would allow filtering of requests where the files are more readily available.
- A filtering option that may be unpopular with script kiddies (in fact, probably reviled as censorship), but popular with older or more mature individuals, would perhaps place killfilters on obscene queries that enter your computer. After all, there is no reason that someone who doesn't agree with some actions must facilitate them with his/her own computer.
I would like to see the analysis of the different queries sent over the network, and some kind of user connectivity. Packet filtering would help to solve the current protocol limitation. Since the network is totally connected it would change the dynamics of the Gnutella network and make it a more connected place even though there is a higher probability that the queries are never answered.
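To make the "totally random" option concrete, here is a minimal sketch in Python. It assumes queries arrive as serialized byte strings and that each connection has a byte budget per forwarding interval; the function name `forward_within_budget` and the budget model are my own illustration, not part of the Gnutella protocol.

```python
import random

def forward_within_budget(queries, budget_bytes, rng=None):
    """Randomly choose which queries to forward without exceeding the budget.

    queries: list of bytes objects (hypothetical serialized query packets).
    budget_bytes: bytes this connection may carry in the current interval.
    Returns (forwarded, dropped).
    """
    rng = rng or random.Random()
    order = list(queries)
    rng.shuffle(order)  # the "Russian roulette": survival order is random
    forwarded, dropped, used = [], [], 0
    for q in order:
        if used + len(q) <= budget_bytes:
            forwarded.append(q)
            used += len(q)
        else:
            dropped.append(q)  # killed: no room left on this connection
    return forwarded, dropped
```

The learning and killfilter options would replace the shuffle with a scoring step, but the budget check would stay the same.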
--
One downside to swarm delivery systems is that data is "published"; simple sharing of a common filebase (a la Napster and Gnutella) is not possible. Someone has to upload the pieces to the system in the first place for them to be available because the system does not do the "let me take a look through your hard disk for things to give to others" kind of file sharing found in other P2P systems. jim
Gnutella, being a real P2P application, will suffer from scalability problems that a server-based system like Napster can work around. If Napster gets too popular, they can always add fatter pipes and bigger servers. But Gnutella is bandwidth-constrained since there is no central server farm tracking all the users.
The exchanges in Napster themselves may in fact be peer-to-peer, but we need to remember that they have big honking servers arbitrating the connections.
Gnutella's design is terrific (and a great hack), but unless they can re-jigger things to knock the slow connections down in priority (or some comparable solution), they're doomed to be a victim of their own success. I guess the other possibility would be for a minimum bandwidth requirement for the software to enforce. Perhaps some enterprising person will write a Gnutella that only allows, say, 144 Kbps and up connections on the network.
It would be interesting, though cruel, to relegate all the dialup people to second-class citizen status, but it would allow Gnutella to scale a lot past the existing limits.
-- Josh Turiel
"2. Do not eat iPod Shuffle."
Then why is it that I'm always on the same network fragment as that idiot spammer who returns your search request with an html extension but containing a stupid advertisement?
If tits were wings it'd be flying around.
GNUtella may be an interesting idea, but it's nothing more than a hack. Splitting into subnetworks is both infeasible and undesirable.
First, you really can't compare it to IRC. IRC is highly centralized, whereas everything about GNUtella is distributed. IRC can, and does, scale for many thousands of users effectively; GNUtella does not (it responds like crap with any significant number of users).
Secondly, you're thinking of the term "network" too rigidly. There is no network admin, no physical location, no centralization. In short, it's a ragtag and volatile collection of different IP addresses. There isn't a way to rigidly enforce the number of users in GNUtella, so how does one keep the networks divided into neat little units? This also means that it's hard to return to a specific network amongst a number of others. Where might your hotlist users be? Where do you find those with like interests when everything is constantly tossing and turning?
Finally, and most importantly, you underestimate the importance of size. When the network can only effectively scale to ~5k users (probably a stretch), and when only one in 10 of those users has broadband that can support a decent number of speedy transfers (especially important when users tend to sign off and on while you're downloading), and when only one in 10 of those users has a sizable collection being shared (seems like most users have the same top pop garbage that everyone else has), you're ultimately reduced to, say, 50 users that you'd actually want to search from. I don't know about you, but 50 users isn't nearly enough. Now you might argue that I'm pulling these numbers out of my ass (and you'd be mostly right), but if you look at the empirical results, it's not far off the mark.
In my opinion, the only thing that something as trivial as GNUtella is good for, ironically, is the IRC types, who could form pseudo-private, loose-knit "networks" from which they can share warez/mp3/porn with their "friends" without the need for a dependable server (i.e., just join the channel, find an IP and connect to it).
Have clients keep some keywords about the user. It could be a user-written paragraph, the names of shared files, recent search requests, etc. Clients would also have a "horizon" H: clients within H hops are considered "local". Clients can query other local clients for their keywords, and determine how similar those keywords are to their own (maybe a percent).
Define a "crawl" to be dropping one (low-keyword-match) direct connection and forming a new direct connection to a local node. You might decrease search response times by crawling repeatedly toward higher keyword-matching nodes.
Imagine a "speed" setting, measured in crawls per minute. There could also be a "randomness" setting, to misrepresent percent-keyword-match by a random amount for each local node. These settings could decrease over time, so you gradually lock in to a suitable local community without getting caught at the nearest local maximum. This idea is borrowed from simulated annealing, which someone else here probably understands better than I.
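A rough sketch of the crawl-with-cooling idea, in Python. Everything here is an assumption for illustration: keyword match is taken to be percent overlap of keyword sets, the local neighborhood is held fixed while settling (in a real network it would change after every crawl), and the names `similarity`, `crawl_once` and `settle` are made up.

```python
import random

def similarity(a, b):
    """Percent keyword overlap between two keyword sets (Jaccard index x 100)."""
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)

def crawl_once(my_keywords, direct, local, keywords_of, randomness, rng):
    """One crawl: swap the worst-looking direct peer for the best-looking local node.

    Each node's match is misrepresented by up to +/- randomness percent.
    """
    scores = {
        n: similarity(my_keywords, keywords_of[n]) + rng.uniform(-randomness, randomness)
        for n in set(direct) | set(local)
    }
    candidates = [n for n in local if n not in direct]
    if not direct or not candidates:
        return set(direct)
    worst = min(direct, key=scores.get)
    best = max(candidates, key=scores.get)
    if scores[best] <= scores[worst]:
        return set(direct)  # nothing that looks better; keep current peers
    return (set(direct) - {worst}) | {best}

def settle(my_keywords, direct, local, keywords_of,
           randomness=30.0, cooling=0.8, steps=20, seed=0):
    """Crawl repeatedly while the randomness setting decays toward zero."""
    rng = random.Random(seed)
    for _ in range(steps):
        direct = crawl_once(my_keywords, direct, local, keywords_of, randomness, rng)
        randomness *= cooling
    return direct
```

The decaying `randomness` plays the role of temperature: early crawls can move to a worse-matching peer by accident, later ones only move toward better matches.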
Is it possible to integrate such clients into the existing network, through search and search-response packets with a ttl of H?
Your horizon defines a neighborhood of local nodes. Their shared files will likely be of interest to you, so your client software might list them. In addition, their _ideas_ might be of interest, so your client software could show you their keywords, and allow instant messaging. There could even be a local neighborhood chat, ignoring chat packets with (hops > H), and sending packets with (ttl = H).
Usage scenario: I heard a band on the radio; sounded kinda like some other bands A B and C; and the lyrics had something to do with X, although I don't think they used that word. I make sure to put A B and C in my keywords, push up the speed and randomness sliders, and wait for them to settle down. Then I start asking in the chat if anyone knows about .... Maybe someone helps me out, and puts up a sample mp3. I might even ask if there are other bands like that.
Current Napster/Gnutella/whatever software lets you find songs you've heard of by bands you've heard of. Gnutella neighborhoods could let you find music you've never heard of.
So; here's the rub: What's the best way to get people to buy into this? With snow just setting in here in Buffalo I have a lot of coding time; what's the best codebase to start from? Who should I convince? (and of course, what am I missing and how could this idea be made better?)
Thanks for reading this whole long thang.
Chris
This is a bandwagon that just won't roll very far, and the reason - as usual - is obvious to people who've studied the field for a while. Naively implemented, a P2P protocol tends to generate O(n^2) messages for a given workload, where n is the number of nodes. This can often be brought down to O(n) but only with absolutely top-notch developers and a lot of effort. Better than O(n) is usually impossible.
By contrast, hierarchical systems tend to hover between O(n) and O(log(n)) depending on the particular problem. This does not necessarily apply only to single-rooted hierarchies, either. A multi-rooted hierarchy tends to exhibit the same scaling behavior, though of course the more roots you have the more you start to look like P2P and share its scaling characteristics.
The long and the short of it is that P2P just doesn't scale well. Even the best-implemented P2P protocol can merely approach the message efficiency of a naively implemented hierarchical protocol. For large numbers of nodes this results in the P2P implementation simply getting swamped. The only question is how large and how swamped it has to be before it becomes unusable.
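A toy count shows the gap. The workload is an assumption chosen for illustration: every node issues exactly one query. In the naive P2P case the query is delivered once to every other node; in the hierarchical case it goes to a single index, which sends one reply. Both function names are hypothetical, and real protocols fall somewhere between these extremes.

```python
def flood_messages(n):
    """Naive P2P: each of n nodes sends one query to each of the other n - 1 nodes."""
    return n * (n - 1)

def index_messages(n):
    """Single-rooted hierarchy: each of n nodes sends one query and gets one reply."""
    return 2 * n

def ratio(n):
    """How many times more messages flooding needs than the index for n nodes."""
    return flood_messages(n) / index_messages(n)
```

Under these assumptions the ratio grows linearly with n, which is the "getting swamped" effect described above.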
Slashdot - News for Herds. Stuff that Splatters.
I've ported gtk-gnutella to Darwin. Here is the link.
Darwin Gnutella
Regards,
proclus
All you do is have an option under preferences, a la Napster, that says your bandwidth type (i.e. 56k/cable/dsl/t1/t3/etc.); then whenever you connect to another T1, have it make a strong connection between the two of you. It's network matchmaking. To prevent people from specifying a lower bandwidth than they have, have the program limit your download bandwidth to that specific speed, which should keep people honest while helping to better organize the network.
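Roughly, in Python. This is only a sketch: the table of rates, the peer dictionary and both function names are assumptions, and the DSL and cable figures in particular are placeholders since real rates vary by provider.

```python
# Nominal rates in kbit/s. The "dsl" and "cable" values are illustrative only.
NOMINAL_KBPS = {"56k": 56, "dsl": 768, "cable": 1500, "t1": 1544, "t3": 44736}

def strong_peers(my_type, peers):
    """Return the peers that declared the same bandwidth type as we did.

    peers: dict mapping peer address -> declared bandwidth type.
    """
    return sorted(addr for addr, t in peers.items() if t == my_type)

def download_cap_kbps(declared_type):
    """Cap downloads at the declared rate, so under-declaring costs the user speed."""
    return NOMINAL_KBPS[declared_type]
```

A client would keep its strong connections to `strong_peers(...)` and throttle its own downloads to `download_cap_kbps(...)`.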
http://www.livejournal.com/users/cixel
> sleazy (gnet2.ath.cx has the exact same TLD as another website whose URL contains the word "goat";
cx is a country TLD. Why should you call the whole of Christmas Island sleazy because of one goat who lives there?
My Karma: ran over your Dogma
StrawberryFrog
Actually, MojoNation does something very similar to what you propose. It's still a beta product, and it's still growing, but it looks good so far:
> ... 56k clients could connect and ask the "net" of super nodes for the queries on content..
> * Automatic mirroring nodes
MojoNation block-servers remember what blocks seem to be popular (most requested), and if they don't have them, they may go grab a copy to mirror locally.
> Nodes would automatically mirror data from local (fast) mirrors, so that it's more accessible.
See above. Data that is popular is automatically mirrored. When data is published to the network, dual-redundancy is used to avoid losing the data if some blocks turn up missing. Think RAID. Well, no, not exactly, but it is somewhat redundant.
It's called a content tracker, and anyone can run one on MojoNation. There are two central "master publication trackers" (MPTs) that keep lists of all publication servers, and the clients retrieve this list initially from them. There are possible plans to distribute the MPTs as well.
> Content Security
> All of the content posted to the network would have meta-moderation on it; anyone can classify data, and mark it as such.
There is currently no 'rating system' in MojoNation, but it is something being looked at, barring the technical hurdles in doing so.
> Privacy
> If possible, I'd like to see users IP addresses hidden; only have a unique login name/password setup for security; but this may make hackers/spammers hard to track and ban, but hopefully the meta-moderation would filter out most of it.
I'm not sure if MojoNation is going to go this route eventually, but if you use TCP/IP, you can be traced eventually anyway. UDP is unreliable. As for data privacy, MojoNation actually chops a file up into small blocks, then encrypts those blocks, and distributes them randomly. Then it sends the description and block locations to the master server. In essence, nobody knows what's in each individual block on their server (if they run a storage server); everything is encrypted. I am breezing past all the details here; feel free to read more about it if you wish.
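For a feel of the chop-and-scatter step, here is a small Python sketch. It is not MojoNation's actual code or format: encryption and the redundancy scheme are left out entirely, block ids are assumed to be SHA-256 digests, and `publish`, `retrieve` and the in-memory `stores` are made-up stand-ins for block servers and the tracker.

```python
import hashlib
import random

def publish(data, servers, block_size=4, rng=None):
    """Split data into blocks and place each block on a randomly chosen server.

    Returns (block_map, stores): block_map is the ordered list of
    (block_id, server) pairs a tracker would hold; stores maps each
    server name to the blocks it ended up with.
    """
    rng = rng or random.Random()
    stores = {s: {} for s in servers}
    block_map = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        block_id = hashlib.sha256(block).hexdigest()  # assumed id scheme
        server = rng.choice(servers)
        stores[server][block_id] = block
        block_map.append((block_id, server))
    return block_map, stores

def retrieve(block_map, stores):
    """Fetch the blocks in order and reassemble the original data."""
    return b"".join(stores[server][block_id] for block_id, server in block_map)
```

A server holding a block sees only an opaque id and a fragment; with the blocks encrypted as well, it could not tell what it is storing.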
> Volunteers
> Anybody?
http://sourceforge.net/projects/mojonation/
(Shameless plug): ProcessTree - Put your idletime to use.
What good is all that... well, a host could make decisions about which queries to route and which to discard based on any information about the reputation of the originator. Hosts would allow faster sends to downloaders with good reputation. Abusive hosts (Spammers, DoS attacks, etc) would ruin their reputation quickly (or keep recreating new keys all with no reputation).
Reputation in such a system would be very valuable. Somewhat like slashdot karma, it would appeal to many individuals, who would likely go out of their way to gain reputation signatures, perhaps by providing or mirroring lots of high quality files, attaching good meta-data descriptions to files, etc. The client software would need to have ways for everyone to do moderation on files and users... but unlike slashdot, there would be no universal score, only lots of keys/reputation scores, signed by other users. The software could also automatically detect certain behaviors (files available for download, on-line for long times) of other hosts, and issue reputation points. The idea of a reputation score is to have a way to allocate the available resources (mainly bandwidth), to establish an incentive for users to share files and act in ways that benefit the network, and of course to make the network resilient to abuse.
Now, for a system like this to scale, each host will need a LOT of disk space, to store a giant database of keys and signatures on them, and it would ultimately act like a giant cache. Each host would obviously collect the most positive signatures... the initial communication would be similar to boasting, the requester would send several of the best moderation signatures, hoping that the remote host already knows those people who signed and will therefore offer faster transfers, propagate a query farther, etc.
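A sketch of how a host might score the "boasting" step, in Python. Signature verification is assumed to have already happened and is not shown; the weighting scheme, the TTL policy and the names `local_reputation` and `query_ttl` are all hypothetical.

```python
def local_reputation(presented, trusted_signers):
    """Score a requester from the endorsements it presents.

    presented: list of (signer_id, score) pairs, assumed already verified.
    trusted_signers: dict signer_id -> weight this host gives that signer.
    Endorsements from signers this host does not know count for nothing,
    so there is no universal score, only a per-host view.
    """
    return sum(score * trusted_signers[signer]
               for signer, score in presented
               if signer in trusted_signers)

def query_ttl(reputation, base_ttl=2, max_ttl=7):
    """Propagate queries farther for requesters with a better local reputation."""
    return max(0, min(max_ttl, base_ttl + int(reputation)))
```

A freshly created key presents nothing this host recognizes, so it gets only the base TTL, which is the penalty for spammers who keep recreating keys.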
Maybe this ultimately works out to be the same as digital cash in MojoNation. I believe it is a different idea, in that it's based on an assumption of abundance.... everybody can win. You can get a great reputation without someone else giving up anything. In a cash system, when you get cash (mojo), someone else gives it up, and the overall philosophy is of scarcity.
If you have any ideas or thoughts to add to this, please post. Am I totally out in left field here, or does this seem like a reasonable idea?
PJRC: Electronic Projects, 8051 Microcontroller Tools
It doesn't have to cover the entire Internet. The fact that you can simply specify a server to contact makes the solution so obvious that I can't believe people are still whining.
Let Gnutella split into multiple networks. It worked for IRC, it will work here, and it will work for similar problems in the future. Any problem that doesn't lend itself well to subdivision is probably badly specified. Don't forget that the Internet is a network of networks, and it works well for a reason.
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
Here is my 'proposed solution' - as everyone else has one, I thought I'd toss this idea out. Why not extend the Gnutella protocol to include a method to subdivide the existing network? Meaning - instead of randomly collecting other nodes of any type - why not only connect nodes of a certain type, say "Warez" or "MP3Z"? Now if I have 1.2.3.4-MP3 and I choose to connect to the "MP3" subnet of Gnutella, I will.
Clients can query the larger 'unsegmented' net to determine the 'subnetted' network extensions:
5.6.7.8:warez;
9.0.1.2:pron;
3.4.5.6:warez;
etc.
This could probably be implemented without breaking the existing clients and network where only Gnutella 'v2' clients would be able to choose a subnet to join. When the "MP3Z" network grows to the breakpoint - someone starts a MP3ZZ network.
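A 'v2' client could digest such a listing along these lines. This is a Python sketch only; the `host:label;` format is taken from the example above, while the function name and the lowercase normalization are my own assumptions.

```python
from collections import defaultdict

def parse_subnet_listing(text):
    """Group hosts by subnet label from lines of the form 'host:label;'."""
    groups = defaultdict(list)
    for line in text.splitlines():
        entry = line.strip().rstrip(";")
        if ":" not in entry:
            continue  # skip blank lines and anything that is not host:label
        host, label = entry.rsplit(":", 1)
        groups[label.strip().lower()].append(host.strip())
    return dict(groups)
```

The client would then open connections only to hosts listed under the label it wants, and older clients that ignore the labels would keep working on the unsegmented net.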
As a side note: Has an organization or project formed on any collective level to address these problems? Is there a 'recognized' authority that is guiding the 'official' Gnutella protocol and a reference implementation? Gnutella is a very necessary model to pursue and develop because of the threat to Napster (though OpenNap provides a mechanism to thwart the $RIAA$MPAA$ whores - there is still the problem of having 'servers' to identify and attack (not to mention the problem that Napigator will have when Napster is finally shut down...))
I've been reading tidbits around the net, and I'd like to ask what people think about this:
Automatic mirroring nodes
Nodes would automatically mirror data from local (fast) mirrors, so that it's more accessible. It would need to "learn" what files are requested, and then mirror them. What would stop the script kiddies from "rating" the content they want up, so it would be mirrored more often?
Structure
If all of the clients are required to keep a copy of the "whole database", it is not feasible without everyone on the network having a T3+, or later OC3+ connection. But as with the data, the nodes keep track of other nodes, but only if the bandwidth permits. 56k clients could connect and ask the "net" of super nodes for the queries on content. No one node should be in control; but many based on the same rule set. You would have to have a setting on the client for "perm super-node", or just "56k browser". Even the 56k browser could contribute to the network however; two 56k modems that are on the same segment of 'net can transmit with very low latency; they can buffer queries from the super nodes, and allow for faster access.
Content Security
All of the content posted to the network would have meta-moderation on it; anyone can classify data, and mark it as such. People can also rate classifications, so as to prevent some spam. If a file with the same name shows up on the 'net, it could end up with the same rating. (my_garage_band_called_nirvana_that_nobody_has_heard_of.mp3)
I'm sure that folks have complex yet effective methods of rating (flame wars may ensue), but I'd be really interested in hearing ideas.
Privacy
If possible, I'd like to see users IP addresses hidden; only have a unique login name/password setup for security; but this may make hackers/spammers hard to track and ban, but hopefully the meta-moderation would filter out most of it.
Volunteers
Anybody?
-Eric Johanson - ericj.spambad@cubesearch.com
This sig for rent
Hundreds of monkeys, eating bananas, swingin' away on ropes, with their brains hooked together with wireless broadband technology, all for the purpose of file sharing!
And before anyone gets a chance to say it...
There... it's been said.
Thank-you, and good day!
Hi! This is the Sig, blatantly attached to the end of this comment.
One of the reasons that Gnutella flounders is the interface. The creators had a novel idea of creating a node-based network for file sharing, but their interface needs some work.
Napster, and then Scour, both simplified their application so any nitwit (even some mac users using macster) could gain access to the resources.
A better interface, as well as some way to have the top hosts from gnutellahosts.com automatically be used every time the application is loaded up, is definitely a must.
What the developers need to do is try a rapid prototyping model. As much as I hated it while doing that damn internship, it really does work. People need to be surveyed on how the application should work. The only way to come up with a good product which catches the broadest audience is to get feedback from that audience.
That's enough bs from me.
If you pointed the gun at someone and found out it was your clone pointing a gun at you, think of what you would think.