I know. I was just saying that the routing algorithm employed in Kademlia is identical to the longest-prefix matching done by Pastry. This similarity is even cited in the technical report for Kademlia.
Why do you need to guess what it's about when it's all there in the paper linked to by the article? I've skimmed it, gotten the gist of it, and I think their technique is quite clever. And the paper seems to give full details, so anyone can implement it.
Basically, similar coding schemes make scheduling of data in a swarm easier (so there's no choking/unchoking a la BitTorrent, data just flows) and minimize the risk of a file piece being owned by only one peer (if he leaves, downloading is over). These encoding schemes, through linear combinations of pieces using XOR, combat this (I'm generalizing here). The most attractive, I think, are Rateless and Raptor codes, which have similar performance. (Incidentally, the former was developed by Petar Maymounkov, who was actually one of the inventors of Kademlia.)
Anyway, a few months ago I read the Rateless paper, and thought "Gee, I should code this and release it under the GPL... It would be great for P2P apps!" But soon after I finished its implementation, I discovered that all the ideas authored in the Rateless paper were actually covered by patents of Digital Fountain, meaning that Petar's company, Rateless, had to develop a different, proprietary coding mechanism that is outside the patents of DF, and I can't release my code!
So, getting back to my original point, the paper says, "Network coding can be seen as an extension or generalization of the Digital Fountain approach since both the server and the end-system nodes perform information encoding." Meaning that it might not be covered by DF's patents, and thus should be welcomed by the P2P community, and not immediately disregarded blindly by prejudice. I mean, if it's a 20% improvement, why not give it a chance, huh?
Microsoft Research has been working on efficient, decentralized, and fault-tolerant P2P systems since 2001. See the paper about their DHT (Distributed Hash Table) called Pastry, which was co-authored with Rice and is still under active development there. Note that the Kademlia DHT, which followed roughly a year later and is now used in a variety of P2P networks (eMule, the new decentralized BitTorrent network, etc.) employs a variant of Pastry's routing algorithm of longest prefix matching.
They still have quite a presence if you look through recent NSDI or IPTPS conferences. Note that this paper is for IEEE INFOCOM, which is big.
With all the features being removed, and the release date getting pushed farther and farther back, Longhorn will end up as nothing more than an expansion pack for Duke Nukem Forever!
The Golem I last year finished fourth, travelling 5.2 miles. It had the lowest budget of only $35,000 dollars (whereas some other teams' have a reputed budget of over a million...). And based on this image here, what I believe makes it uber-awesome is that they are cheating the competition by installing an elf under the hood and letting him drive.
Nodes randomly generate either 128 or 160 bit node identifiers. An identifier uniquely identifies a node on the network. Traditionally, they are computed as just the MD5 or SHA-1 hash of your IP address (this is to make it harder for clients to select exactly what identifier they want, which could help them target certain files for takedown... more on that later).
In Kademlia, the idea is that messages routed through the network are identified by a message key. This is, as well, either a 128 or 160 bit value. The goal of Kademlia, and every other DHT (Google for Chord, CAN, Pastry, etc.) is to route a message to the node whose identifier is "closest" to the message key. In Kademlia, the distance between a node identifier and another node identifier, or a node identifier and a message key, is computed by simply XORing the two and treating the result as an unsigned integer.
Each node maintains (roughly) a routing table containing nodes that match successively-longer high order bits with itself. For example, node 0100... maintains an entry to a node starting with 1..., a node starting with 00..., a node starting with 011..., and a node starting with 0101... Note that in terms of distance by XOR, the first node has a distance of 1..., the second with a distance of 01..., and so forth. Thus, nodes matching more high order bits are closer to you in the identifier space.
So if you are node 1010... and you receive a message starting with 0111... You should have some node in your routing table that differs in the highest-order bit, that is, it starts with 0... Say its node identifier starts with 0000. You route the message to that node. If you compute the XOR between your node identifier and the key, and this node's identifier and the key, you will see that this node is approximately twice as close to the key as you are.
Now this node differs in the second bit: 0000 vs 0111. In its routing table, it must have some node that matches in the first bit, and differs in the second: that is, starting with 01... If the message is routed to that node, we again cut our distance to the key by approximately 1/2. This process repeats until we find the node "closest" to the message key.
Routing in this manner takes log(N) time, and each node on the network maintains log(N) connectivity. Note that there are well-established algorithms for nodes joining and leaving the network, of which the former takes log(N) time as well.
So how does BitTorrent fit in? Here's what I'm assuming: Each.torrent file has a 160-bit info hash embedded in it, derived from SHA-1. Now substitute the message above for the.torrent file, and the message key for this info hash -- you are now routing.torrent files to their closest nodes. These nodes, in turn, can be the tracker. If a node knows the 160-bit info hash of a.torrent, it can find a tracker by placing this hash as the message key in a lookup message and finding the closest node, which must necessarily be the tracker.
You can do other neat tricks, too, like keyword searching, load balancing, and whatnot (see eMule -- it uses the Kademlia DHT for its serverless system). Other DHTs work in a similar manner. I'm a little confused as to why everyone uses Kademlia, when there are better ones out there. (Accordian, for example, is truly state-of-the-art.)
Plenty of resources on DHTs can be found at Project Iris.
Over the course of 28 years, those films and their modern counterparts, The Phantom Menace from 1999 and Attack of the Clones from 2002, have grossed $5.67 billion globally when adjusted for inflation. Assuming an average ticket price of $6.25, that would buy more than 907 million tickets to Revenge of the Sith--enough for every person in the Western Hemisphere, with the entire population of Poland included twice.
Since I haven't gotten any e-mail from you recently, I just wanted to let you know that we think that the *nix machines don't require a reboot thanks to some magical code they stole from SCO. And by "we think," I mean "I think." And by "I think," I mean I have no fucking clue what's going on.
On a side note, where did you get that snazzy new iPod?
In science it often happens that scientists say, 'You know that's a really good argument; my position is mistaken,' and then they actually change their minds and you never hear that old view from them again. They really do it. It doesn't happen as often as it should, because scientists are human and change is sometimes painful. But it happens every day. I cannot recall the last time something like that happened in politics or religion. - Carl Sagan, 1987 CSICOP keynote address
... just put a little "Did you know?"-like tooltip at the bottom of the main search page? (Much like the default Firefox page.) It doesn't even have to appear every visit -- just 1 out of 5 times, 1 out of 10 times, etc. Regardless, the average web user will visit Google often enough to pick up on some new tricks. Making these features well known will only reinforce people coming back to Google.
I've known about this Google feature for awhile now. The calculator feature is also very useful. So useful you'd think regular Google users would know about it by now, but every time I use it in front of someone, they always seem suprised and I end up teaching something new.
Yeah right. Aside from using it to test some pretty fancy high speed protocols, the Internet2 in general is really nothing more than a fast pipe for college students to download music on, insulated from the original Internet by BGP. You never see an academic conference requiring "tests on the Internet2" because its geographic concentration is entirely in North America and its speed is totally beyond anything you see in the real Internet; that is why everyone wants PlanetLab instead.
"I certainly wouldn't tell my students to use it if I was a professor."
Actually, the correct grammar is actually "if I were a professor." You use "were" instead of "was" in the case of hypotheticals. See Elements of Style by Strunk & White. (I had a wonderful English teacher in HS who practically had us learn every "advanced" grammar rule in that book -- who/whom, that/which, etc. Although it was painful then, I'm now glad I suffered through it.)
I'm not trying to be a grammar Nazi or anything, I'm just throwing that one out there because it's rather esoteric. Interestingly enough, my copy of Word in Office XP doesn't pick this one up.
The patterns revealed by History Flow Visualization show such information as spacing by date; occurrances of vandalism; authorship; growth; and persistence.
It seems like a good tool for inspecting the history of a document at-a-glance, but you're right -- for more details, there is no substitute for a commit log.
Could be useful, however, in environments such as CVS or Subversion across sets of files... Hmmm.
I heard awhile back that the next version of IE would be written in C# -- thus eliminating exploits caused by buffer overruns. (I further heard that a lot of Longhorn apps would be rewritten in this fashion.) Does anyone know if this will indeed be the case? Perhaps that's what they tout as 'security' (but real security will be when they throw out ActiveX).
... those selected will only earn $2250 over the summer. But remember, having Google on your resume looks better than McDonalds! Have a nice day!
- sm
I know. I was just saying that the routing algorithm employed in Kademlia is identical to the longest-prefix matching done by Pastry. This similarity is even cited in the technical report for Kademlia.
- shadowmatter
Why do you need to guess what it's about when it's all there in the paper linked to by the article? I've skimmed it, gotten the gist of it, and I think their technique is quite clever. And the paper seems to give full details, so anyone can implement it.
Basically, similar coding schemes make scheduling of data in a swarm easier (so there's no choking/unchoking a la BitTorrent, data just flows) and minimize the risk of a file piece being owned by only one peer (if he leaves, downloading is over). These encoding schemes, through linear combinations of pieces using XOR, combat this (I'm generalizing here). The most attractive, I think, are Rateless and Raptor codes, which have similar performance. (Incidentally, the former was developed by Petar Maymounkov, who was actually one of the inventors of Kademlia.)
Anyway, a few months ago I read the Rateless paper, and thought "Gee, I should code this and release it under the GPL... It would be great for P2P apps!" But soon after I finished its implementation, I discovered that all the ideas authored in the Rateless paper were actually covered by patents of Digital Fountain, meaning that Petar's company, Rateless, had to develop a different, proprietary coding mechanism that is outside the patents of DF, and I can't release my code!
So, getting back to my original point, the paper says, "Network coding can be seen as an extension or generalization of the Digital Fountain approach since both the server and the end-system nodes perform information encoding." Meaning that it might not be covered by DF's patents, and thus should be welcomed by the P2P community, and not immediately disregarded blindly by prejudice. I mean, if it's a 20% improvement, why not give it a chance, huh?
- shadowmatter
Microsoft Research has been working on efficient, decentralized, and fault-tolerant P2P systems since 2001. See the paper about their DHT (Distributed Hash Table) called Pastry, which was co-authored with Rice and is still under active development there. Note that the Kademlia DHT, which followed roughly a year later and is now used in a variety of P2P networks (eMule, the new decentralized BitTorrent network, etc.) employs a variant of Pastry's routing algorithm of longest prefix matching.
They still have quite a presence if you look through recent NSDI or IPTPS conferences. Note that this paper is for IEEE INFOCOM, which is big.
- shadowmatter
With all the features being removed, and the release date getting pushed farther and farther back, Longhorn will end up as nothing more than an expansion pack for Duke Nukem Forever!
- shadowmatter
The Golem I last year finished fourth, travelling 5.2 miles. It had the lowest budget of only $35,000 dollars (whereas some other teams' have a reputed budget of over a million...). And based on this image here, what I believe makes it uber-awesome is that they are cheating the competition by installing an elf under the hood and letting him drive.
You fool! You're not supposed to read the article!
- shadowmatter
Man, if only Bill Gates had a nickel for everytime Windows crashed, he could pay his way out...
Oh wait, he does.
- shadowmatter
Someone set up us the blog.
This is how Kademlia works:
.torrent file has a 160-bit info hash embedded in it, derived from SHA-1. Now substitute the message above for the .torrent file, and the message key for this info hash -- you are now routing .torrent files to their closest nodes. These nodes, in turn, can be the tracker. If a node knows the 160-bit info hash of a .torrent, it can find a tracker by placing this hash as the message key in a lookup message and finding the closest node, which must necessarily be the tracker.
Nodes randomly generate either 128 or 160 bit node identifiers. An identifier uniquely identifies a node on the network. Traditionally, they are computed as just the MD5 or SHA-1 hash of your IP address (this is to make it harder for clients to select exactly what identifier they want, which could help them target certain files for takedown... more on that later).
In Kademlia, the idea is that messages routed through the network are identified by a message key. This is, as well, either a 128 or 160 bit value. The goal of Kademlia, and every other DHT (Google for Chord, CAN, Pastry, etc.) is to route a message to the node whose identifier is "closest" to the message key. In Kademlia, the distance between a node identifier and another node identifier, or a node identifier and a message key, is computed by simply XORing the two and treating the result as an unsigned integer.
Each node maintains (roughly) a routing table containing nodes that match successively-longer high order bits with itself. For example, node 0100... maintains an entry to a node starting with 1..., a node starting with 00..., a node starting with 011..., and a node starting with 0101... Note that in terms of distance by XOR, the first node has a distance of 1..., the second with a distance of 01..., and so forth. Thus, nodes matching more high order bits are closer to you in the identifier space.
So if you are node 1010... and you receive a message starting with 0111... You should have some node in your routing table that differs in the highest-order bit, that is, it starts with 0... Say its node identifier starts with 0000. You route the message to that node. If you compute the XOR between your node identifier and the key, and this node's identifier and the key, you will see that this node is approximately twice as close to the key as you are.
Now this node differs in the second bit: 0000 vs 0111. In its routing table, it must have some node that matches in the first bit, and differs in the second: that is, starting with 01... If the message is routed to that node, we again cut our distance to the key by approximately 1/2. This process repeats until we find the node "closest" to the message key.
Routing in this manner takes log(N) time, and each node on the network maintains log(N) connectivity. Note that there are well-established algorithms for nodes joining and leaving the network, of which the former takes log(N) time as well.
So how does BitTorrent fit in? Here's what I'm assuming: Each
You can do other neat tricks, too, like keyword searching, load balancing, and whatnot (see eMule -- it uses the Kademlia DHT for its serverless system). Other DHTs work in a similar manner. I'm a little confused as to why everyone uses Kademlia, when there are better ones out there. (Accordian, for example, is truly state-of-the-art.)
Plenty of resources on DHTs can be found at Project Iris.
- shadowmatter
Over the course of 28 years, those films and their modern counterparts, The Phantom Menace from 1999 and Attack of the Clones from 2002, have grossed $5.67 billion globally when adjusted for inflation. Assuming an average ticket price of $6.25, that would buy more than 907 million tickets to Revenge of the Sith--enough for every person in the Western Hemisphere, with the entire population of Poland included twice.
Don't forget Poland!
- sm
From: Steve Ballmer
Subject: Missing e-mail?
Bill,
Since I haven't gotten any e-mail from you recently, I just wanted to let you know that we think that the *nix machines don't require a reboot thanks to some magical code they stole from SCO. And by "we think," I mean "I think." And by "I think," I mean I have no fucking clue what's going on.
On a side note, where did you get that snazzy new iPod?
Steve B.
The only problem is that the protocol is proprietary and only Skype knows how it works. This seems to offend a lot of people.
There's a good paper investigating how it all works here. Interesting stuff.
- shadowmatter
Especially apt:
In science it often happens that scientists say, 'You know that's a really good argument; my position is mistaken,' and then they actually change their minds and you never hear that old view from them again. They really do it. It doesn't happen as often as it should, because scientists are human and change is sometimes painful. But it happens every day. I cannot recall the last time something like that happened in politics or religion.
- Carl Sagan, 1987 CSICOP keynote address
they at least are generally union-made
At first I read that as "urine-made" and nodded my head in agreement.
- sm
Sim City?
... just put a little "Did you know?"-like tooltip at the bottom of the main search page? (Much like the default Firefox page.) It doesn't even have to appear every visit -- just 1 out of 5 times, 1 out of 10 times, etc. Regardless, the average web user will visit Google often enough to pick up on some new tricks. Making these features well known will only reinforce people coming back to Google.
I've known about this Google feature for awhile now. The calculator feature is also very useful. So useful you'd think regular Google users would know about it by now, but every time I use it in front of someone, they always seem suprised and I end up teaching something new.
- sm
bastardizing a research network
Yeah right. Aside from using it to test some pretty fancy high speed protocols, the Internet2 in general is really nothing more than a fast pipe for college students to download music on, insulated from the original Internet by BGP. You never see an academic conference requiring "tests on the Internet2" because its geographic concentration is entirely in North America and its speed is totally beyond anything you see in the real Internet; that is why everyone wants PlanetLab instead.
- sm
... and we have "Mavis Beacon Teaches Typing Electro-Shock Edition."
Wonderful. Never will your child reach typing 60 w.p.m. faster. Or with fewer fingers.
- shadowmatter
"I certainly wouldn't tell my students to use it if I was a professor."
Actually, the correct grammar is actually "if I were a professor." You use "were" instead of "was" in the case of hypotheticals. See Elements of Style by Strunk & White. (I had a wonderful English teacher in HS who practically had us learn every "advanced" grammar rule in that book -- who/whom, that/which, etc. Although it was painful then, I'm now glad I suffered through it.)
I'm not trying to be a grammar Nazi or anything, I'm just throwing that one out there because it's rather esoteric. Interestingly enough, my copy of Word in Office XP doesn't pick this one up.
- shadowmatter
From the Overview page on alphaWorks:
The patterns revealed by History Flow Visualization show such information as spacing by date; occurrances of vandalism; authorship; growth; and persistence.
It seems like a good tool for inspecting the history of a document at-a-glance, but you're right -- for more details, there is no substitute for a commit log.
Could be useful, however, in environments such as CVS or Subversion across sets of files... Hmmm.
- shadowmatter
It seems they're always ignoring or redefining standards.
This seems to happen so much, I propose a new verb to describe it: ignorfining.
Seems especially apt in Microsoft posts.
- shadowmatter
The full rant can be found at his web page here. Scroll down to his post "But It's Over Now" in big blue letters, it's just beneath.
Good stuff.
- shadowmatter
They should have used the Open Profanity License instead!
- shadowmatter
I heard awhile back that the next version of IE would be written in C# -- thus eliminating exploits caused by buffer overruns. (I further heard that a lot of Longhorn apps would be rewritten in this fashion.) Does anyone know if this will indeed be the case? Perhaps that's what they tout as 'security' (but real security will be when they throw out ActiveX).
- shadowmatter