Hypernets -- Good (G)news for Gnutella
Red Roo writes: "This online article addresses the recent
criticism of Gnutella network scalability by pointing out that it is a Cayley tree. As a viable candidate for massively scalable P2P bandwidth, all trees are dead! But by going to higher dimensional virtual networks (aka "hypernets") e.g., hypercubes or hypertori, near linear scalability can be achieved for P2P populations on the order of several million peers each with only 20 open connections. This concept seems to have been entirely overlooked by critics and developers alike."
I had a four dimensional hypernet using a router from Heinlein Technologies installed in our off-site server room and it caused no end of grief.
Firstly we had a continuing problem with dropped packets but things started really screwing up when our time domain packet switching starting picking up packets that HAD NOT BEEN SENT YET!
The collision rate went up to an astounding 53% by the time I was able to pull the plug. In short It sucked big time!
By the way - don't waste your time buying any of those California lotto tickets this week because just before I downed the thing I surfed the net....
For more information on the structure try Relation Nets and Hypernets in pdf form.
The author, Neil Gunther is semi-famous
in the Unix community for giving very
good talks and courses on performance.
Check out his open source performance analysis
softwar PDQ (Pretty Damn Quick)
http://www.perfdynamics.com/Tools/PDQ.html
Despite what one of the other posters has said about "binary hypercubes" being nonsensical, they are simply a way of describing the nodes of a hypercube with a binary address. Each node in a 1024 node network has a ten bit address. Each one has to connect only to ten other nodes: the ten nodes which are specified by flipping only ONE bit on that nodes address. This provides multiple redundant paths by which a message can be routed in the case of the failure or congestion of a node.
And by intermittently polling the other nodes to which it connects, it can keep a routing table which optimizes the paths to be used. The major difference is that in Hillis' Thinking Machines the number of nodes was capped, in fact it was a fixed number of nodes. Now implementing a dynamic network in a hypercude or hypertoroid topology brings on a new set of problems such as dynamically re-allocating the hypercude node addresses as users fall off and climb back on to the network. This can be more of a bear than people realize.
Correct me if I'm wrong here, but, the article seems to be saying that packet switching is more effiecient than old style circuit switching (?hierarchal switcing?). It says that bouncing stuff around nodes connected to a bunch of other nodes and letting the stuff find a path to its destination is more effiecient and scalable than any kind of tree structure where stuff goes down to the trunk and back up to a different branch to reach its destination.
Unfortunately with the way things are set up right now, I think our beloved internet is set up like a toroid instead of a cube. You have a backbone as the middle loop and then coming off of it are rings that are local that provide service to local ISPs. Then they sell to thier end users. In the end I picture a fuzzy torroid. And according to the article, those are more scalable than trees, but not as much as the cube. However the article says that they are harder to implement than the cubes, not so, as they seem to have evolved in the marketplace naturally, and setting up a cube like network in the real world is harder.
But they're talking about this applied to software, and virtual networks, not real world hardware. However, seeing as how the real world has moved from a tree based telecom system, to the torroid sceme of the current system, it would be interesting to see what happens when the torroidal system in the real world runs into scalability problems and goes for the cube.
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla
It should be noted that in it's evolution since the Nullsoft Gnutella 0.56 server/client, modern Gnutella servents have reduced traffic and improved network scalability by means of
- routing pushes instead of broadcasting them
- caching pings/pongs, and even queryhits
- use of UltraPeer/leaf relationship, which increases the speed at which traffic is routed
There are other ideas that Gnutella developers like those at Limewire have been kicking around, which are similar to ideas that publishing networks like Freenet and MojoNation have, such as data specialization (ie. queries are directed to those likely to have the data, not broadcast to the entire world).
I'm glad whenever mathematicians or people with specialities like traffic analysis examine existing p2p systems, or give their ideas on p2p systems - they might come up with some good ideas or give a good critique that clarifies elements of a p2p network. This paper is certainly less arrogant than ones with names like "Why Gnutella Can't Scale. No Really". A hypernet is an interesting idea, although I can think of a number of reasons why current p2p sharing networks would not implement them. Namely, because authoritarian networks like Napster were shut down by trade associations like the MPAA/RIAA, while more anarchic networks like Gnutella are more immune from such actions - we must consider not only the survival of the scaling network due to technical constraints like Dr. Gunther does, but also it's survival due to legal constraints orchestrated by large corporations. Then there's the question of how many peers the network is designed for - scalability is just one factor in the reasons why I would use a particular p2p client. Luckily, we will have competition between p2p networks like FastTrack, Gnutella, Freenet and MNet (Mojonation), and perhaps different ones will be used for different purposes, just like Usenet, distributed.net and so forth.
Binary Hypercube
If voting were effective, it would be illegal by now.
I'm not one for saying "it'll never happen" because sometimes if you just sit there that's exactly what happens.
.ogg files no more.
But I'm not convinvced of this particular threat.
It would require worldwide cooperation and at every level of computing. It would be difficult to draft an international law AND define what a computer was. Does my digital watch need DRM?
To get this law in one country (probably the US) is going to have to implement it unilaterally. Chaos will ensue. I think it's just too much hassle for a government to embark on.
So mark my words, and then punch me with them when you can't play your
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Even though you might have been trying to be a little facetious, I think your point is very valid.
I can grab files, with no small amount of effort, from an online file sharing service, and maybe get 2 GB a day. If I network, in person, with people who share similar tastes in music, I can get a lot more bang for the buck. As soon as we see larger portable technologies (It's already happening), trading media will be just like trading games in user groups was back in the 80s and early 90s. We all just bring our 300GB portable disks to meetings, link them all up, and take them home to copy onto our TB home systems.
The only thing that would prevent this from happening is a very rapid growth of broadband, to something like reliable 10Mbit levels. I don't see that happening before hard disk space grows to the sizes I quoted above.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
You don't pay enough to suck down all your bandwidth 24/7. If you were led to believe that was your right when you bought your broadband, then I'm sorry you were misled, but the only people who are sold lines that they are allowed to max out 24/7 are the people that pay for real Internet access, i.e. a T1 from a first tier provider. Otherwise, you have to share.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Dont know what a hypercube is? click here
A form of this organizational structure is already being implemented by Lime wire. Here is an excerpt from thier FAQ:
we've created a new Gnutella hierarchy, whereby high performance machines become 'Ultrapeers'. These machines accept connections from many LimeWire clients while also connecting to the rest of the Gnutella network. Moreover, the Ultrapeer shields these 'regular' LimeWire clients from the CPU and bandwidth requirements associated with Gnutella, directing traffic to clients in an efficient manner. The reason you see only one connection in your connections tab is because you are a LimeWire client connected to an Ultrapeer. Unfortunately, not all Ultrapeers are as good as others. If you find that you aren't getting many search results with the Ultrapeer you are connected to, simply disconnect and connect. You'll probably connect to a different Ultrapeer, which is more 'connected'. Also, as time goes on and the network grows, you'll receive more results. Moreover, we are currently working hard to ensure that any Ultrapeer you connect to will be well connected - stay tuned to future versions of LimeWire.
My success with the new structure is mixed. Downloads and searches seem to work almost as well as before, but I'm getting considerably fewer uploads, which must mean that, someone, somewhere, is getting screwed. Limewire itself is not a bad little product, it's main claim to fame, of course, is that it runs well on both Mac OS X and Linux.
This is because it assumes the peers are already arranged in the network in the topology one wants.
If a central addressing authority exists, it is no problem to simply give new peers addresses and the addresses of their neighbors in such a way that the network acquires any topolgy the authority wants. The authority can even cope with peers leaving the network more or less arbitrarily.
However, a real question is -- how do you get peers to "self-assemble" into the desired topology in such a way that a small population of peers that choose not to play by the generally accepted rules cannot dramatically effect the outcome. In other words, how can peers be persuaded to place themselves on the points of a cubic hyperlattice solely by contacting a few already installed peers, some of which may not be telling the truth?
A hypernet is an interesting idea, although I can think of a number of reasons why current p2p sharing networks would not implement them. Namely, because authoritarian networks like Napster were shut down by trade associations like the MPAA/RIAA, while more anarchic networks like Gnutella are more immune from such actions - we must consider not only the survival of the scaling network due to technical constraints like Dr. Gunther does, but also it's survival due to legal constraints orchestrated by large corporations.
I think your concern here is exactly backwards. Specifically, higher dimensional topology would decrease the need for central "UltraPeer"s (also known as lawyer bait) and thus make the network harder to shut down. If the trend towards depending more on some "peers" than others continues to the natural limit, you wind up right back at Napster (one UberUltraPeer to rule them all, and in the darkness...get eaten by a grue if it's lucky, sued by the RIAA if it's not).
On the other hand, if the topology is made more scalable, the targets won't be as tempting at any given network size, and the whole thing would be harder to take down by force. If all nodes are equal, cutting one will likely create enough publicity to attract seven more to take its place.
-- MarkusQ
A Cayley tree is a tree (network with no cycles, a cycle being a set of connections a path you can take in a circle to get back to the same node) in which every node, except those right at the edges, has the same number of connections. As a tree, if you cut any connection, you're seperating the network into two unconnected networks (or isolating one node). Noting that it's a Cayley tree is pointing out that as it grows, nodes at the edges have more and more connections between them (all clients connecting to one server would be a tree, too, but the number of connections to the server would keep on growing, which means it wouldn't be a Cayley tree).
A hypernet is like a grid: imagine the nodes like the places where the lines cross on graph paper, so each node (except at the edges) is connected to 4 others, in a regular, predictable pattern with lots of cycles. Now imagine a 3d grid, like a lot of stacked sheets of graph paper with each node connected not only North, East, South, and West, but also up and down. Each connected to 6 others, in a regular, predictable pattern with lots of cycles. That's as far as you can go with physical models, but in a freely-connecting network like the internet, you can keep going to 8 connections per node (a 4-dimensional hypercube network), 10 connections (5 dimensions), and so on.
That explanation was for a hypercube, a hypertorus would be like going from a bunch of connections around a circle, to a regular set of connections over the surface of a donut, and so forth.
Either way, it's one huge mass of cycles, lots of redundancy, lots of routing choices. If you cut a connection it doesn't matter much; naturally if one user bogs a connection (or chain of connections) down with a heavy load, it's practically like it's been cut. Hypernetworks give you the freedom to route around traffic jams, and the regular structure (cube or torus) simplifies the routing over an unstructured network of random connections.
This shouldn't be too hard (at least it doesn't look too hard sitting here on a Sunday morning, half way through my first cup of liquid brains). The key is to note that we can't (and therefore needn't bother trying to) enforce the topology at all times. Instead, we just want to bias the network towards the desired form. For example:
This should at least be functional; no doubt there are a number of clever hacks that could be made...-- MarkusQ
Exactly.
Moreover, what are we to make of this?
The dominant constraint for hardware implementations of high-dimensional networks is the cost of the physical wires on the interconnect backplane. Since the hypernets discussed here would be implemented in software, no such constraints would prevent reaching the desired level of scalability.
I really don't see how you can sweep the actual physical infrastrucure under the rug like this. Eventually, virtual hypercubes turn into real packets on a real network. A network that is subject to the very same topological limitations this article discusses. Any wonder that the Tandem Himalaya architecture he mentions was implemented in hardware, rather than as a "virtual" topology implemented on top of a traditional TCP/IP network?
When it comes to complicated mathematics like this, though, intuition has often led me astray...
--Lawrence Lessig for Congress!
I can't believe Slashdot just posts this psuedomathematical nonsense without doing even elementary fact-checking
At first, I thought you were refering to your own post, and was tempted to give you a couteracting +1 Insightful.
-- MarkusQ
...to say the least. Besides the obvious algorithmic problems of establishing and/or maintaining such a topology in an environment where nodes enter and leave at such a high rate, there's a serious overhead issue. Any serious discussion of ad-hoc routing protocols (which is what this is) nowadays needs to include an analysis of the number of packets needed by the routing protocol itself, in addition to the efficiency with which "user" packets are routed. A network that always delivers user packets over an optimal path isn't really all that useful if 90% of the network's capacity is consumed by route updates. I was very disappointed to see that this particular paper attempts no such analysis of routing overhead; without it, the paper's conclusions must be regarded as highly suspect.
Slashdot - News for Herds. Stuff that Splatters.
The Internet has a structure with physical limitations!
What good does it do if your many multiply redundant connections allow transmission of messages with a fewer number of virtual hops, when every connection going out of your college dorm goes over the same physical wire. The number of connections over which a search request must travel is a liability, not an asset, when many of those connections happen to use the same physical wire. The author of this "paper" has conveniently ignored this fact, and his conclusion (that adding virtual links to your network allows you to manufacture bandwidth out of thin air) follows directly.
On a separate topic, the assertion that the virtual connectivity of Gnutella is anything like a Cayley tree is absurd, because it implies no closed paths. Consider: In order to discover and connect to a new a host on the Gnutella network, you need to catch a search request originating from that host. The fact that you were able to recieve that search request in the first place means that there was already a path between you and the remote host--therefore you have created at least one closed cycle by forming the new connection.
I have a positive modifier on Troll. When I mod someone Troll their karma should go UP!
Until Gnutella starts directing searches intelligently - towards nodes which are more likely to have the data being sought, as Freenet does, it will always be an inefficient way to search for data.
While I admit I have no clue what the article is talking about (hyper what?), I was under the impression we could already do better than linear scalability with a simple hierarchical P2P network. Just have a dynamic cluster of loosely connected "servers" to which clients maintain a persistent connection to only one. With a little intelligent routing of queries between servers (i.e. don't just spit the querey out to all the other servers you know) you get O(1) scalability at the client side and something (probably significantly) better than O(n) on the server side.
It doesn't. The problem with the tree topology used by Gnutella is that it utilizes the available bandwidth in a fashion that becomes highly inefficient as the number of nodes becomes large. At around 10^6 nodes, it uses 15-20% of the total available capacity, whereas a hypercube topology at 10^6 nodes uses essentially 100% of the available bandwidth.
So obviously, if your application runs up against the physical capacity of your underlying communications systems, you can't send more data. But via correct choices of virtual network topology, you can ensure that the physical capacity is being used productively.
Quantum mechanics: the dreams that stuff is made of.
In layman's terms. If you are a node in one of his example networks, and you're sitting on a DSL connection, does the available bandwidth you contribute to the net change whether you have 20 outbound TCP connections or 2? No. It is constant. The author incorrectly computes the "bandwidth" of these different network topologies like you are stringing a separate DSL to each person you open a TCP connection to.
Available network bandwidth in a peer network like Gnutella is related only to the physical interconnect of the nodes. (i.e. whether they are on an SBC DSL line or sitting a North American OC-3)
The only useful analysis is that which determines the amount of data-transfer required between each node (and all nodes) for common operations when using different topologies. When performing this study, you are looking for the topology which will transfer the smallest amount of data over the smallest number of nodes when performing searches. For a great analysis, see the Gnutella Performance Paper by Ritter, referenced by the above paper.
Careful analysis will tell you the same thing that common sense does -- that the best architecture involves centralized dedicated servers (supernodes), located on machines with the largest physical bandwidth available. (i.e. eactly what Napster did. )
In order to create an efficient peer network which scales, Gnutella 'merely' needs to 1) order the network by physical topology, 2) identify the nodes with the best combination of physical bandwidth, longevity, and CPU/disk resources, and 3) fully utilize those machines as supernodes.
Good luck. :)