New Peer-to-Peer Designs
We've received a lot of peer-to-peer submissions, including the one that follows and this one. Perhaps people will post links to those systems which they think have a decent chance of solving the known problems of p2p networks? PureFiction writes "Given the recent ruling against Napster and the various happenings at the O'Reilly P2P conference, this is a good time to mention a new type of decentralized searching network that can scale linearly and adapt to network conditions. This network is called the ALPINE Network and might be a future alternative to searching networks like Napster and Gnutella while remaining completely decentralized and responsive. You can find an overview of the system as well as a Frequently Asked Questions document on the site."
We meet again, ;)
Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network
Funny, I thought web servers acted this way...
Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe.
If you find the reply you're looking for, then there is no need to query the remaining peers. Also, you will not clog the incoming pipe. I've covered this quite a bit: you control how many queries you send out and when, and also to which peers they are sent. The adaptive nature of the protocol ensures that successive queries will be more likely to find what they are looking for sooner.
You would only query 10,000 in a worst case scenario.
The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims.
No, each of these 'victims' would only receive a single 60 byte packet. This is the opposite of a DoS attack, as you are sending a large number of packets, but each peer is only receiving one of them.
Those ideas, as stated e.g. by Omnifarious, are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature.
I understand what you're getting at, but you're missing the main purpose of this network. If you need to search a large number of peers for dynamic content in real time, you need to reach all of them to do it. Whether you do this using a tree/routing/forward approach, or a single peer using multiple unicast packets, you have to reach them to do it.
The design of this network is so that the resources you use are your own and that you can tailor the bandwidth, peers, and effectiveness of the search to your own preferences.
This is a highly specific network architecture with a very specific purpose using very small packets. This is why ALPINE can bend the common conceptions about scalability and performance and still remain efficient and scalable.
I came across this amazing P2P system the other day that completely blew my mind. It scales well and can handle any kind of file type. It has mature clients for all major platforms including Linux/Solaris/IRIX/SCO/AIX/BSD/Windows, even Amiga. It's so powerful it even includes a meta-search tool for searching for P2P servers.
It's called Archie and the meta-search tool is called Veronica. You should try it out; it's amazing.
The Information Revolution will be fought on the command line.
While it is probably not very important to the people reading this, there be dragons ahead for this project that I do not think the implementor is aware of. We implemented a system very much like this for Mojo Nation to achieve the swarm distribution (parallel downloads) which is one of the key features of our technology. Windows does not like to hold lots of open connections, and you quickly eat up local resources and run out of file descriptors. It works like a charm under Linux and other "real" operating systems, but backporting this to make it available to the un-enlightened will be a very, very unpleasant task for whoever tries to actually implement this. jim
and it will do so because this community just will NOT take no for an answer....there's too many bright minds out there. I'm personally interested in the guys over at www.musiccity.com in league with napigator.
The main legal problem with Napster is that there is a central server. That problem is being solved by having multiple and/or moving servers. This makes it much harder to levy a lawsuit against anyone.
We all know Napster works, but it's illegal (or will be soon). Warez is illegal, but it will never go away because you just can't prosecute.
There's nothing Intelligent about Intelligent Design.
Increasingly, in the era of second-gen content distribution networks, they don't. Where they do, they pay dearly for the privilege of sucking up so much bandwidth. I don't think you do yourself any favors by pushing a first-gen "solution" when the second gen is already out there and some people - such as myself - are already working on gen three.
You won't get the answer until you've already sent queries to the next batch. Net result: not only are you consuming all this bandwidth and creating all this congestion, then you turn around and drop those packets on the floor. That's just adding insult to injury, as far as your upstream is concerned.
Please describe how this adaptation occurs. The details are not on your website, it's a complex problem, and I think you're just handwaving about something you don't understand.
But the intervening routers are receiving them - and the replies - in huge clumps. That's just like a DDoS.
Slashdot - News for Herds. Stuff that Splatters.
What about latency? I don't want to wait 2 minutes for a reply!
Get a DSL Line! ;) Also, this is assuming you query the *entire* group. Part of the purpose of the ALPINE protocol is to adapt to the responses you receive during queries. The first query you make may take 2.5 seconds. On the next, you may query the responsive peers first and find what you are looking for in 1 second. The next query may be further refined, with your peers organized so that you find what you're looking for in a fraction of a second.
You can only do this type of adaptive configuration, tailored to *each* peer and their use of the network, if you allow them to do the querying themselves and order the queries themselves. This implies a direct connection to the people they are querying.
You cannot perform this type of custom adaptive configuration without an extreme amount of overhead in a routed architecture, thus the need for DTCP.
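To make the ordering idea concrete, here is a rough sketch in Python. It assumes each peer keeps a simple hit count per remote peer and sends queries best-first in batches, stopping after the first batch that produces a hit. The names (`PeerStats`, `adaptive_query`, `batch_size`) and the scoring rule are made up for illustration; the ALPINE site does not spell out the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Hypothetical per-peer record; the fields are assumptions for illustration."""
    address: str
    queries: int = 0
    hits: int = 0

    def score(self) -> float:
        # Fraction of past queries this peer answered usefully (0.0 if never asked).
        return self.hits / self.queries if self.queries else 0.0


def adaptive_query(peers, has_answer, batch_size=100):
    """Query peers best-first in batches; stop after the first batch with a hit.

    `has_answer(address)` stands in for sending one query packet and
    waiting for the reply. Returns (address of a hit or None, packets sent).
    """
    ordered = sorted(peers, key=lambda p: p.score(), reverse=True)
    sent = 0
    for start in range(0, len(ordered), batch_size):
        found = None
        for peer in ordered[start:start + batch_size]:
            sent += 1
            peer.queries += 1
            if has_answer(peer.address):
                peer.hits += 1
                if found is None:
                    found = peer.address
        if found is not None:
            return found, sent
    return None, sent
```

With 1,000 peers and the answer held by a peer near the end of the list, the first search in this toy model touches every peer, while a repeat search finds the same peer in the first batch. Note that a whole batch is always sent before any reply is acted on, so the batch size sets the floor on wasted packets.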
I do not know about you, but an awful lot of users out there do not have high speed access yet. And I can think of many folks whose first action would be to search everything.
Remember, half the population is below average.
"It is a greater offense to steal men's labor, than their clothes"
This looks really cool; however, I foresee a lot of problems with users that don't have a direct internet connection. Namely, you cannot transmit a UDP packet to someone behind a proxy/firewall/NAT unless they have sent a packet out to you first. Still, they do mention NAT in the overview, so at least they are thinking of this.
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
Isn't "p2p" the same as "client/server" in the special case where client==server? So, for instance, HTTP is P2P if I'm running netscape and apache and so are you and we connect to each other? Or does it only count as P2P if it's a single piece of software? If so, then I'm announcing Mozpache, a web browser AND server.
"But how do you search," I hear you cry. How do you search NOW? Google, right? Same deal here, just use DynDNS (or whatever) to get the link to stay stable.
"P2P," sheesh--it's amazing what some people think is amazing.
--
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
This system does not fully eliminate the Gnutella problem of having too many search queries on the network. With Gnutella your queries will be propagated from your node to all the nodes you are connected to and then to all the nodes that your neighbours are connected to, which creates search clashes (same node searched gets the same query from neighbours over and over.) With Alpine the overlap is eliminated but the point is, you still will have to search every node every time you want to find something. I do not see Alpine as a huge step forward in terms of scalability, what they achieved is basically elimination of repeated search queries but not the real problem - sending as many queries as there are users. I am not sure whether they will eliminate Ping Pong, I don't think so.
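The difference between the two approaches can be counted in a toy model. The sketch below is a simplification, not the real Gnutella protocol: it ignores TTL limits and assumes every node forwards a query exactly once to all neighbours except the one it first heard it from. The function names are hypothetical.

```python
from collections import deque


def flood_messages(neighbors, origin):
    """Count packets when each node forwards a query once to all neighbors
    except the one it first received it from (no TTL, for simplicity).

    Returns (packets sent, nodes reached other than the origin).
    """
    seen = {origin}
    queue = deque([(origin, None)])
    messages = 0
    while queue:
        node, came_from = queue.popleft()
        for nxt in neighbors[node]:
            if nxt == came_from:
                continue
            messages += 1           # one packet on the wire
            if nxt not in seen:     # repeats are dropped on arrival
                seen.add(nxt)
                queue.append((nxt, node))
    return messages, len(seen) - 1


def direct_messages(neighbors, origin):
    """One unicast query per other node, as in the direct-query design."""
    return len(neighbors) - 1


# Four nodes, each connected to every other one.
full_mesh = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(flood_messages(full_mesh, 0))   # (9, 3): nine packets to reach three nodes
print(direct_messages(full_mesh, 0))  # 3
```

Direct querying removes the six redundant packets in this example, but the count still grows with the number of nodes, which is the point being made above.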
It is necessary to revise the entire searching strategy, not simply linearly reduce the number of queries.
You can't handle the truth.
What is an appropriate sized batch? 200 queries at once? 100? Seems like searches will take forever by stalling a query.
Sure, there are various criteria that indicate a bad or good peer. These include, among other things:
Wow, this seems like a lot of information to keep track of on the client side. Not only am I keeping track of every IP-node user out there, but I have to keep track of it over time. In a napster-success scenario, I'd have 2 million entries to keep track of. Not only that, but it seems like a lot of wasted overhead? Even if a user doesn't have what I want, I have to compute statistics into his/her record each time.
Um...look I'm just one user. Any searching done by me, yes, is only one person's activity. But I'm logging into a group of 10,000 active users? The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests. That's like saying there'll be less pee in the world if we all just pee'd slower.(yes, the only analogy I could think of. I'll brb, i gotta go P)
Rader
I have to admit that it's a little bit strange posting something with such a subject line from the conference hall at the O'Reilly P2P conference in SF, but I can't help myself.
Implementing a pseudo-broadcast by sending separately to all destinations is stupid. Real network designers have known this for years. First off, to send to N destinations you have to shove N packets down your local pipe, which may be narrow. Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe. That single UDP socket you're using does have a finite queue depth, so it will start dropping replies left and right after the first few. Well, maybe not, but only because your ISP's routers will have dropped them first because they overflowed their own queue depths.
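The arithmetic behind the 600K figure is easy to check. The sketch below assumes the 60 bytes is the full on-the-wire size of each packet and that the sender has a 56 kbit/s link; the link speed is an assumption for illustration, and the function name is hypothetical.

```python
def broadcast_cost(num_peers, packet_bytes, link_bits_per_sec):
    """Return (total bytes sent, seconds needed to push them through the link)."""
    total_bytes = num_peers * packet_bytes
    seconds = total_bytes * 8 / link_bits_per_sec
    return total_bytes, seconds


total, secs = broadcast_cost(10_000, 60, 56_000)
print(total, round(secs, 1))  # 600000 85.7
```

So on that assumed link, a single query to all 10,000 peers takes well over a minute just to leave the sender, before any replies come back.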
Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network. The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims. In the same Gnutella thread in which you started hyping ALPINE, some slightly clueful people were suggesting tree-based approaches. Those ideas, as stated e.g. by Omnifarious, are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature. The pitfalls in what you're suggesting are already so well known that they should be part of any computer-networking curriculum, and much more reasonable solutions to the same problems are only scarcely less well known. There is no need to reinvent the wheel, especially if your wheel is square.
As Clay Shirky mentioned in his talk here yesterday, "peer to peer" can be considered a little bit of a misnomer. It's a lot more about addressing and identity issues, and even more about scalability, and having N^2 connections in a network of N nodes is no route to scalability. ALPINE's scaling characteristics will be worse than Gnutella's. Pemdas made a good point that you seem to have a talent for marketing. Stick to it. Unlike Pemdas I can evaluate the technical merits of what you're proposing, and you are headed 180 degrees away from a solution.
Slashdot - News for Herds. Stuff that Splatters.
This is absolutely correct. I talked about this in the Gnutella scalability thread yesterday. Even if you ignore the overhead of your "backbone", the process of even trying to send every query to every client is fundamentally broken. If you want to support people on less than 100bT dorm networks, this is not going to scale.
Just figure out how big a query is, then figure out how many queries per second have to be in the network before all of the client's bandwidth is consumed. If you estimate a query packet to be 1000 bits, your modem users max out at 56 queries per second on the network. And that's an absolute best case which will never ever be achieved in practice.
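The same estimate as a few lines of Python, in case anyone wants to plug in their own numbers. Only the 56 kbit/s modem and the 1000-bit query come from the post above; the other link speeds are assumed round figures for comparison, and the function name is made up.

```python
def max_queries_per_second(link_bits_per_sec, query_bits=1000):
    """Best-case number of incoming queries a link can carry per second."""
    return link_bits_per_sec / query_bits


# Link speeds other than the modem are illustrative assumptions.
links = {"56k modem": 56_000, "1.5 Mbit DSL": 1_500_000, "10 Mbit Ethernet": 10_000_000}
for name, speed in links.items():
    print(f"{name}: {max_queries_per_second(speed):.0f} queries/s")
```

These are ceilings with the link doing nothing else, so real clients would saturate well before reaching them.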
Until this problem is addressed, these networks will never scale. You have to have some hierarchy of high-bandwidth servers which get the queries and low-bandwidth clients which don't. This can still be a truly distributed network, but you have to distinguish between the machines that have the resources to handle lots and lots of queries and those that don't.
Imagine a two-level network where you have a Gnutella-style network of OpenNap servers which the napster-style clients connect to. The servers distribute the queries amongst themselves to perform the searches. Each server knows what files its clients are sharing Napster-style, and can answer for them. With this architecture, the well-connected hosts on cable networks and dorm subnets do the heavy lifting of the searches while the dial-up clients get good performance because they aren't being clogged with a bunch of queries. The network scales better because you aren't trying to do lots of work on really slow links. Your network is also more stable because you don't have the clients (which come and go quickly) changing the topology of your "backbone".
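A bare-bones sketch of that two-level idea, with everything in one process and no networking. The `Server` class, its methods, and the names in the example are hypothetical; a real OpenNap or Gnutella implementation would look quite different.

```python
class Server:
    """Hypothetical index server: knows its clients' files and its peer servers."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # filename -> set of client names sharing it
        self.peers = []   # other servers on the "backbone"

    def register(self, client, filenames):
        # Clients upload their file list once, Napster-style.
        for filename in filenames:
            self.index.setdefault(filename, set()).add(client)

    def search(self, filename, seen=None):
        """Answer for local clients, then forward to peer servers not yet visited."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        results = {(self.name, c) for c in self.index.get(filename, ())}
        for peer in self.peers:
            if peer.name not in seen:
                results |= peer.search(filename, seen)
        return results


# Three well-connected servers; the dial-up clients never see a query.
a, b, c = Server("a"), Server("b"), Server("c")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
a.register("dialup1", ["song.mp3"])
c.register("dialup2", ["song.mp3", "other.mp3"])
print(sorted(a.search("song.mp3")))  # [('a', 'dialup1'), ('c', 'dialup2')]
```

The query traffic stays among the servers, and a client joining or leaving only changes one server's index rather than the backbone topology.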