2. Sending queries to many thousands of peers is still a large task, even if only small packets are sent directly. Must optimize this.
Yeah, like by using distributed multicast instead of sending to every peer directly. Quite an optimization there. ;-)
Sorry, coder, couldn't resist. You know I love ya, and the rest of what you said was golden. I'm just never gonna buy the bit about sending the same message through the nearest router a thousand times being better than sending it once to a neighbor on the other side.
The previous Slashdot article was about n-dimensional cube or torus topologies. Paul Harrison's "Circle" network (slashdotted - Google cache) is...wait for it...a simple circle. Sort of like Chord, it seems, but less sophisticated. It's not at all clear why a reasonable person would expect Circle to scale particularly well, especially in an environment with high node turnover (lots of potentially circle-breaking join/leave operations).
There's nothing wrong with Circle. It just doesn't seem to meet the promise of being a fully functional network that scales better than Gnutella.
It was posted on infoAnarchy before it was published on kuro5hin (1:15am EST vs. 2:25am EST). It might have been posted elsewhere, or sent via email. Someone's sure going out of their way to get publicity.
Re:Good for some, nightmare for others (on Peek-a-Boo(ty), Score: 2)
BTW, I forgot to point out that the Chinese can do exactly the same thing. In fact, I'll bet that they already do, and that open-source software makes the task easier for them.
Re:Good for some, nightmare for others (on Peek-a-Boo(ty), Score: 2)
as a Security Manager in a bank who's sometimes asked to go find out if person XYZ has been accessing nakedhairyeyebrowedcheerleaders.com, I can see how this utility might make it impossible for me to do my job
No problem. Whatever port is at the other end, the language spoken on the browser's connection will still be easily recognizable HTTP. You should already have an IDS running, and adding a signature for the "offending" HTTP traffic should be a no-brainer.
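To make the "signature" idea concrete, here's a rough sketch of port-independent HTTP recognition. It assumes the traffic really is plaintext HTTP/1.x, as described above; the function names and the regexes are mine, not taken from any particular IDS product.

```python
import re

# Hypothetical signature: an HTTP/1.x request line at the very start of a
# TCP payload. The port number is never consulted.
HTTP_REQUEST_LINE = re.compile(
    rb"^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE|CONNECT) \S+ HTTP/1\.[01]\r\n"
)
HOST_HEADER = re.compile(rb"\r\nHost:[ \t]*([^\r\n]+)", re.IGNORECASE)


def looks_like_http(payload: bytes) -> bool:
    """True if the payload begins with something shaped like an HTTP request."""
    return HTTP_REQUEST_LINE.match(payload) is not None


def requested_host(payload: bytes):
    """Return the Host header of an HTTP request, or None if there isn't one."""
    if not looks_like_http(payload):
        return None
    m = HOST_HEADER.search(payload)
    if m is None:
        return None
    return m.group(1).decode("ascii", "replace").strip()
```

A real IDS rule would express the same match in its own rule language; the point is only that the match is on content, not on the destination port.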
Instead of sucking up bandwidth on a chain of ten links, this topology might only require two or three hops to get to its destination.
The problem is that it's not hops in the overlay network that matter; it's hops in the underlying IP network. Your "two or three hops" in the overlay network might actually involve a dozen 33K modem links in the physical network, whereas a "less optimal" five- or six-hop route in a better-constructed overlay network might involve only eight physical nodes and nothing less than a T1 between them.
This mismatch between the overlay and physical networks is precisely what caused the famous Gnutella meltdown, as slow modem links saturated by search traffic effectively went down, leaving the network partitioned into a bunch of tiny little islands. This "hypernet" idea is topologically naive in almost exactly the same way, with predictable results. There's nothing in the proposal to prevent the creation of four-hop routes that make two complete trips around the world; such routes may appear "efficient" to a CEO or marketing guy, but to anyone who actually knows about networks it would clearly be otherwise.
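A toy illustration of the overlay-vs-physical point, using the figures from the example above (a dozen modem links versus eight physical nodes joined by T1s). The routes and the per-hop breakdown are invented for illustration; nothing here comes from the "hypernet" proposal itself.

```python
# Bandwidths in kbit/s. Each overlay hop is listed as the physical links it
# actually crosses. All routes below are made up for illustration.
MODEM = 33.6
T1 = 1544.0

# 3 overlay hops, but 12 physical modem links underneath.
short_overlay_route = [[MODEM] * 4, [MODEM] * 4, [MODEM] * 4]

# 5 overlay hops over 7 physical T1 links (i.e. 8 physical nodes).
long_overlay_route = [[T1, T1], [T1], [T1, T1], [T1], [T1]]


def summarize(route):
    """Count overlay hops and physical links, and find the slowest link."""
    physical = [bw for hop in route for bw in hop]
    return {
        "overlay_hops": len(route),
        "physical_links": len(physical),
        "bottleneck_kbps": min(physical),
    }


if __name__ == "__main__":
    print("short overlay route:", summarize(short_overlay_route))
    print("long overlay route: ", summarize(long_overlay_route))
```

Ranking routes by `overlay_hops` picks the first route; ranking by anything that looks at the physical network picks the second.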
IIRC, use of lasers to kill/wound/maim/blind soldiers is illegal under international law.
You're probably thinking of Protocol IV to the 1980 Convention on Certain Conventional Weapons (text at ICRC). As near as I can tell, it only applies to weapons designed to blind people. That's right, folks. You can blow people apart with laser weapons, according to international law, but you can't blind them. It is indeed a strange world we live in.
This may be a corrupt sector containing metadata (maybe even for the "/" directory or "/kernel", if you were writing a new kernel at the time of the crash), or it may be other corrupt data which became corrupted in a cascade failure that resulted in the crash after one or more corrupted blocks were written to disk.
I'll be charitable and say your comment is merely misleading. This scenario is no more a problem for soft updates than it is for journaling. The only way it could be a problem would be if you had enabled write caching on a drive that didn't maintain write order and didn't have enough reserve power to flush its write cache on power loss. Well, guess what? Take that same impossible-to-find drive, use it to store your journal instead of soft updates, and you'd be just as screwed.
Look, soft updates are a good thing, but they aren't a panacea for all problems.
Journaling is no panacea either, and it involves additional performance costs that many find unacceptable. On balance, soft updates still seem like a far better solution.
I'm not familiar with the looping issues you refer to. Can you elucidate? (Note also that this would not be so much a way to "route"
You're right. I was conflating the issue of how to establish and maintain a topology with that of how to route messages within that topology. I tend to do that, because I find that in real systems the interactions between these two supposedly-separate issues are so strong that they become inseparable.
Upon reflection, I find it hard to judge how well a system such as yours would work. On the one hand, you might run into a classic hill-climbing problem. Two nodes that "should be" adjacent to each other in a nearly-ideal hypercube might be too far separated (in terms of the underlying IP network) initially to find each other before they each settle on local maxima instead. On the other hand, it might not be a good thing if they did find each other, because you might end up with a really neat hypercube at the upper level, but each "adjacency" in the hypercube is really a long multihop route across the continent at the lower level.
Trying to find a balance between these two extremes might be difficult. In fact, trying to impose a hypercube overlay-network topology on top of an IP network whose physical structure is most definitely not hypercube-like might be a fundamentally doomed idea. I don't mean to say it's not worth it to try. Only detailed simulation or even real-life deployment can truly provide the answers to these sorts of questions. I'm just saying that this particular problem domain tends to be "swampy"; things that appear promising at first run into pitfalls much further down the road, much to everyone's frustration and indignation.
A realtime OS, which usually has low latency, is not about the duration of latency, but rather about a guarantee of latency.
Exactly. There's also a corollary, which many people miss: realtime does not necessarily equate to high performance. Sometimes, you do things to enforce a bound on the worst case that actually make the average case worse. Anybody who has read Hennessy and Patterson should remember the formula for the value of an optimization (paraphrased because I don't have my copy handy):
Now consider a CPU cache. What a lot of people forget is that there is such a thing as a cache miss penalty, because in most systems hit rates are so high that the second half of the equation above remains negligible. However, a realtime system designer has to be pessimistic and assume very low hit rates. Only accesses that can be absolutely proven to be hits - e.g. repeated access not separated by too many other accesses including those from higher-priority tasks - can be counted, and all others must be considered misses. In practice, that sort of proof is usually too much of a pain in the ass so every access is assumed to be a miss. Since cache misses are actually more expensive than uncached accesses (the miss penalty), it's not uncommon to find that a critical code path has some possibility of missing its deadline if accesses are through the cache, but it can be guaranteed to complete in time with uncached accesses. So the cache gets turned off. Obviously, performance will suck, but at least it will suck predictably and that's the more important concern in realtime. For similar reasons, realtime systems often preallocate resources that then sit idle, because they can't afford to contend for them later.
The above examples should demonstrate why realtime systems might actually perform worse than general-purpose systems. Trying to make system behavior more predictable and responsive is great, and to that end we should all welcome the low-latency and preemption patches, but treating "realtime" as some kind of mantra for "better performance" is an illusion.
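To put rough numbers on the cache argument, here's a sketch that assumes the standard average-access-time relation (hit time plus miss rate times miss penalty). Every cycle count, the access count and the deadline are invented for illustration, not measurements of any real system.

```python
# All figures are hypothetical, in CPU cycles.
HIT_TIME = 1        # cost of a cache hit
MISS_PENALTY = 60   # extra cost of a miss (lookup plus line fill)
UNCACHED = 40       # cost of an access that bypasses the cache entirely
DEADLINE = 50_000   # cycles available to the critical code path


def average_access(miss_rate):
    """Average cost per access: what a general-purpose designer looks at."""
    return HIT_TIME + miss_rate * MISS_PENALTY


def worst_case_cached(accesses):
    """Realtime bound with the cache on: assume every access is a miss."""
    return accesses * (HIT_TIME + MISS_PENALTY)


def worst_case_uncached(accesses):
    """Realtime bound with the cache off: every access costs the same."""
    return accesses * UNCACHED


if __name__ == "__main__":
    n = 1000
    print("average, 2% misses:", n * average_access(0.02))
    print("worst case, cached:", worst_case_cached(n), "deadline", DEADLINE)
    print("worst case, uncached:", worst_case_uncached(n), "deadline", DEADLINE)
```

With these made-up numbers the cached path averages a small fraction of the uncached cost, yet only the uncached path has a provable bound under the deadline.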
While the article is interesting in the sense that it shows that efficient p2p network topologies are possible (for suitably small definitions of efficient), actually implementing it on a network of untrusted peers could be problematic.
...to say the least. Besides the obvious algorithmic problems of establishing and/or maintaining such a topology in an environment where nodes enter and leave at such a high rate, there's a serious overhead issue. Any serious discussion of ad-hoc routing protocols (which is what this is) nowadays needs to include an analysis of the number of packets needed by the routing protocol itself, in addition to the efficiency with which "user" packets are routed. A network that always delivers user packets over an optimal path isn't really all that useful if 90% of the network's capacity is consumed by route updates. I was very disappointed to see that this particular paper attempts no such analysis of routing overhead; without it, the paper's conclusions must be regarded as highly suspect.
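The kind of back-of-envelope overhead analysis I mean might start from something like the sketch below. The model (each join/leave triggers a fixed number of fixed-size route updates) and every parameter name are my own simplifying assumptions, not anything from the paper.

```python
def routing_overhead_fraction(nodes, churn_per_node_per_s, updates_per_event,
                              update_bits, capacity_bps):
    """Fraction of total network capacity eaten by route updates.

    Hypothetical model: every join or leave event causes `updates_per_event`
    update messages of `update_bits` bits each. Result is capped at 1.0.
    """
    events_per_s = nodes * churn_per_node_per_s
    control_bps = events_per_s * updates_per_event * update_bits
    return min(1.0, control_bps / capacity_bps)


if __name__ == "__main__":
    # 10,000 modem users, each joining or leaving about once per ten minutes.
    frac = routing_overhead_fraction(
        nodes=10_000,
        churn_per_node_per_s=1 / 600,
        updates_per_event=200,
        update_bits=512 * 8,
        capacity_bps=10_000 * 33_600,
    )
    print(f"route updates consume {frac:.1%} of capacity")
```

A real analysis would need measured churn rates and the protocol's actual update fan-out, which is exactly what's missing.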
we just want to bias the network towards the desired form. For example...New nodes pick a random twenty+ bit ID...New nodes connect up to whoever they can find.
Except that you describe routing based on Hamming distance (which won't work because of looping issues) rather than shared prefix/suffix, this sounds a lot like Tapestry.
Nobody can really be dumb enough to think that companies will pass savings back to the consumer. Hell, even if they did, it'd make much more of a difference to shop at a discount store like Costco
Why would shopping at Costco be so great? After all, nobody could really be dumb enough to believe that Costco will pass savings back to the consumer. Right? Oh wait. There seems to be at least one person dumb enough to believe that the laws of economics work differently for Costco than for everyone else. I stand corrected.
BTW, it's amusing how you opened your post with an insult after opening the previous one with a complaint about insults. I wrote this article about people like you who believe they're above the standards they set for others.
Stores want your buying habits linked to your identity so that they can sell them to more unscrupulous marketers who'll do things that even poor Joe Average Consumer and his friend Mr Sixpack would care about.
Your last paragraph embodied the fallacy of inconsistency. This time you appear to've decided on the complex-question fallacy instead. Yes, stores can use purchase-trail information in unsavory ways. That has never been in dispute here, and "proving" it achieves nothing. What has been in dispute here is your continuing denial that the same information can also be used in ways that benefit the consumer. It's not about "is X greater than Y" but about "is Y non-zero"; check this post, and particularly the last paragraph, if you don't believe me. When all of your evasions are stripped away, you're still losing the real debate by default.
Finally, you haven't shown anything. You obviously don't understand what a proof is.
As I said to another person here on Slashdot quite recently, someone in this discussion obviously flunked Logic 101 but it's not me. Your claim is that detailed purchase-trail information is unnecessary because it provides no benefit to consumers over raw sales numbers. I described just such a benefit, thereby disproving your claim. Instead of admitting your error, you've managed a hat-trick of fallacies by moving the goalposts back to a discussion of the possible abuses of such information (which were never in dispute to begin with).
You simply obnoxiously stated your opinion and demanded that I accept it.
That's a picture-perfect description of what you have done. You're the one with the unfulfilled burden of proof.
There are many other concerns than just the lowest price...Personally I'd rather spend an extra percent or two
How nice for you. Not everybody is like you, though. It's not "unscrupulous" to accommodate different tastes and priorities than yours. This isn't about you and your idiosyncrasies; it's about Joe Average Consumer. I've shown how detailed purchase-trail information can benefit JAC, and you've done absolutely nothing to show a corresponding cost to him. You seem so engrossed in considering how this affects WNight that you can't even see, let alone participate in, the real debate about how it affects people in general.
The papers you cite (thanks!) seem to show that UWB interference with GPS signals is quite implementation-dependent. The "5-10 years" you mention seems like plenty of time for the people working on UWB and new GPS-based services to understand and deal with the interference issues, and indeed it appears that working groups have already been formed for exactly that purpose. Is there really that much cause for alarm here? Even if so, how fair would it be to blame UWB for effects on systems that weren't yet deployed when UWB was?
Their profitability doesn't concern me. If they can't show me a direct benefit to me
I'll reiterate the point again for the slowest member of our class. A store that knows more about their customers' buying habits can serve their customers better by having the brands that those customers want available in sufficient quantity. Furthermore, less shelf space wasted with products their customers are not likely to buy translates directly into reduced cost, which allows lower prices to the consumer. These are quite real consumer benefits, not just benefits to the seller, and many here have personally experienced those benefits. You can pay your extra 2% just to prove how stupid you are, but don't try to suggest that anyone else should do likewise.
If they sell 100 boxes of cereal X and 10 of cereal Y, they could stop making as much of cereal Y.
Do you really believe that's as useful to them as the sort of buying-habit information they collect now? Plain statistics like that don't give them information about cross-product preferences, such as whether people who buy graham-cracker pie crusts are more likely to buy cheesecake ingredients or key-lime-pie ingredients as well. This information helps them serve not only customers who shop at one location, but also customers who move to a house/apartment near a different store and take their entire "portfolio" of purchasing preferences with them. Stocking store shelves is a problem very similar to data prefetch in a computer system, with many "hidden correlations" that can be used to improve performance if the right information is available. Take away the information, and you take away the performance benefit.
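Here's a small sketch of why per-item totals can't stand in for purchase trails. The two sets of baskets are invented; they produce identical sales totals but opposite cross-product correlations.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical stores with the same per-item sales but different habits.
baskets_a = [
    {"graham crust", "cream cheese"},
    {"graham crust", "cream cheese"},
    {"key limes"},
    {"key limes"},
]
baskets_b = [
    {"graham crust", "key limes"},
    {"graham crust", "key limes"},
    {"cream cheese"},
    {"cream cheese"},
]


def item_totals(baskets):
    """Raw sales numbers: how many baskets contained each item."""
    return Counter(item for basket in baskets for item in basket)


def pair_counts(baskets):
    """Cross-product information: how often two items were bought together."""
    return Counter(
        frozenset(pair)
        for basket in baskets
        for pair in combinations(sorted(basket), 2)
    )


if __name__ == "__main__":
    print(item_totals(baskets_a) == item_totals(baskets_b))  # same totals
    print(pair_counts(baskets_a))
    print(pair_counts(baskets_b))
```

Store A's crust buyers are cheesecake people and store B's are key-lime people, and nothing in the totals tells you which is which.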
There's a serious debate to be had about whether the benefit to customers of having such information available outweighs the privacy cost, but trying to deny that there's any benefit at all is typically trollish of you.
What about Freenet (or similar P2P system) AS a replacement for Sourceforge.
An interesting idea. I would say, though, that the reason those projects are not already "eating their own dogfood" is that they don't support the semantics necessary for collaborative development. I'll use Freenet as an example because it was already mentioned in this subthread. As I see it, there are a few major obstacles to using Freenet itself for this:
Freenet does not guarantee data availability. As the Freenetistas insist on hearing every time this issue is raised, the lack of such a guarantee is not such a bad thing in the context of Freenet having been designed as a publication system rather than a permanent or archival data store. However, in the context of a collaborative development environment it's entirely unacceptable to have files or bug reports or whole projects dropping out of cache just because other projects are more popular.
In a development environment you most definitely want to know who's doing what, and limit actions (e.g. checkins) based on identity. This is exactly contrary to Freenet's design goal of assuring anonymity.
Freenet does not maintain the structure of something like a source tree. While I'm sure a layer could be added to do this, it doesn't exist in suitable form right now.
Freenet itself doesn't exist in suitable form right now. It's at version 0.4 moving toward 0.5; this is a fine state for a project to be in, but most people don't want their source-code repository based on 0.4 software.
This is, again, not to pick on Freenet specifically. Some or all of the above concerns would also arise with every other "P2P" or filesharing network you could name. Great ideas, in many cases, but at this point in time not really suitable as a basis for a source-code repository.
Wow, is my face red! I didn't even check the by-line, despite the fact that I should be paying extra attention when "Berkeley computer scientists" are mentioned. And then I didn't even put OceanStore at the top of my reference list. Damn, I'm stupid.
You're the one who attributed to me a post holding up Gnutella as an example of how to build a scalable network, even though I've expressed opinions contrary to that view often and as recently as yesterday. How logical is that, Grasshopper? Someone in this conversation obviously flunked Logic 101, all right, but not me.
Randomly? No. Predictably? Preventably? No, and no. Is "random" vs. "unpredictable and unpreventable" a useful distinction in this context? For the hundredth time, no.
BTW, I've been scarcely less critical of Gnutella than of Freenet in the past. Just yesterday, in fact, I posted a comment on this very site referring to Gnutella as an "unusually naive" protocol. If I were to propose alternatives to Freenet, you can bet I'd be pointing in a different direction than that.
Unlike you, I suppose. Yeah right.
So...where's the wedding?
Been there, discussed that, your side lost. Here's the link.
Nope. Just pointing out (again) why only a fool would think that earlier "data loss" post was mine.