There isn't much improvement to this client that will carry over to other clients. That is, this is not the second version of the BitTorrent protocol, as explained here. Bram has been pretty mum on the second version of the protocol, although the official developer forum has had some heated debates over how some of these features should be incorporated and what their parameters should be (note that, although heated at times, the debates were purely constructive ;).
So if you're already content with Azureus or BitComet or whatever, nothing to see here... Move along folks.
- shadowmatter
From the article:
Of the six people who can actually review in Firefox, four are AWOL, and one doesn't do a lot of reviews. And I'm on the verge of just walking away indefinitely, since it feels like I'm the only person who cares enough to make it an issue.
What good is people submitting patches if no one is there to review the code prior to commit? Indeed, I submitted a very trivial usability enhancement to Firefox, and it was quickly swept under the rug. Perhaps it should simply be made into a plug-in, I don't know. Just thought I would share it as first-hand experience.
- shadowmatter
Everyone on Earth who saw the original Star Wars is compelled by some unknown force to watch these god-awful movies.
If you had gone to see the first god-awful movie, you would have found out that the unknown force is actually midi-chlorians!
And knowing that is the first step toward developing an antidote, right? We only have until May 19... Please say my sacrifice to learn this was not in vain!
- shadowmatter
You're wrong on the point that the second shuffling algorithm doesn't provide more random results than the first. (Note that in this context, I'm using random() to generate a random integer.) A few minutes of coding each up and looking at the resulting distributions should tell you that. (See the c2 link for these results already.) So perhaps I don't fail ;)
You are right on the point that using the modulo operator can skew the distribution, and that using a random() method that returns a floating point value in [0, 1) remedies this. For arrays whose lengths are significantly less than the max value returned by random() (again, assuming random() returns a random integer, I'll say the max value is 2^31 - 1), however, this skew is negligible. And not many arrays you will have to shuffle, in practice, even come close to having 2^31 - 1 elements. But alas, when we get down to the fine details, you are right -- it is suboptimal. So perhaps I do fail ;)
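To put a rough number on "negligible," here is a minimal sketch that counts how unevenly `value % n` spreads the outputs of an integer generator. The helper names (`modulo_counts`, `worst_case_ratio`) are made up for this example, and it assumes random() is uniform over 0..rand_max.

```python
from collections import Counter

def modulo_counts(rand_max, n):
    """Count how many of the integers 0..rand_max land on each residue mod n."""
    return Counter(value % n for value in range(rand_max + 1))

def worst_case_ratio(rand_max, n):
    """Ratio between the most and least likely residue, without enumerating."""
    total = rand_max + 1
    low = total // n                        # every residue gets at least this many values
    high = low + (1 if total % n else 0)    # some residues get one extra
    return high / low

# A toy generator returning 0..7 makes the skew obvious for n = 3.
small = modulo_counts(7, 3)
print(dict(small))

# Assumed max of 2^31 - 1 and a 5000-element array, as in the discussion above.
print(worst_case_ratio(2**31 - 1, 5000))
```

With the toy generator one residue is noticeably rarer than the others; with a 2^31 range and a few thousand elements the ratio sits within a hair of 1, which is the sense in which the skew is real but tiny.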
- shadowmatter
It's the probability that 5000 times in a row, you hear some other song -- that is, one of the 4999 other songs. Calculating, we get:
(4999/5000)^5000 = 0.3678.
Almost forgot -- I thought this number looked familiar. Note that as your number of songs approaches infinity, this number approaches 1/e (approx 0.3678, as seen above). Furthermore, this bound is being approached from below:
(1/2)^2 = 0.25
(2/3)^3 = 0.296
(3/4)^4 = 0.316
So even if you keep ripping or downloading more songs, you're not going to decrease the chance of this phenomenon. Note how likely it is to occur regardless of how many songs you have -- which explains why everyone has probably "experienced" it.
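If you'd rather check those numbers than take my word for it, here is a small sketch. It assumes shuffle mode picks each play independently and uniformly (the same model used for the 5000-song figure), and `miss_probability` is just a name I made up.

```python
import math

def miss_probability(num_songs, num_plays=None):
    """Chance a given song is never picked in num_plays independent uniform picks."""
    if num_plays is None:
        num_plays = num_songs   # default: as many plays as there are songs
    return ((num_songs - 1) / num_songs) ** num_plays

for n in (2, 3, 4, 5000, 1_000_000):
    print(n, round(miss_probability(n), 4))
print("1/e", round(1 / math.e, 4))
```

The values climb toward 1/e as n grows and never pass it, so a bigger library doesn't push the chance back down.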
See, probability really is beautiful
- shadowmatter
First, there is the possibility that Apple screwed up the shuffling algorithm -- although not entirely likely. If you ask an introductory programmer to write some code to shuffle an array, you'll most likely get something like this:
    for i in range(array_length):
        j = random() % array_length  # random() is assumed to return a random integer
        temp = array[i]
        array[i] = array[j]
        array[j] = temp
This code does NOT produce all permutations with equal probability! Instead, you must use the following code:
    for i in range(array_length):
        j = i + (random() % (array_length - i))  # only pick from positions i and later
        temp = array[i]
        array[i] = array[j]
        array[j] = temp
This was cribbed from c2 -- see the full article text here for a more informative discussion.
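For a three-element array you can see the difference without any randomness at all, by enumerating every sequence of choices each loop could make. This is only a sketch: the function names are mine, and the `choices`/`offsets` arguments stand in for the values `random() % ...` would have produced.

```python
from collections import Counter
from itertools import product

def naive_shuffle(items, choices):
    """First loop: swap position i with any position j taken from choices."""
    a = list(items)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)

def fisher_yates(items, offsets):
    """Second loop: swap position i with position i + offset, where offset < n - i."""
    a = list(items)
    for i, offset in enumerate(offsets):
        j = i + offset
        a[i], a[j] = a[j], a[i]
    return tuple(a)

n = 3
items = range(n)

# Every equally likely run of the first loop: n choices at each of n steps (27 runs).
naive_counts = Counter(naive_shuffle(items, c) for c in product(range(n), repeat=n))

# Every equally likely run of the second loop: n - i choices at step i (6 runs).
fy_counts = Counter(
    fisher_yates(items, o) for o in product(*(range(n - i) for i in range(n)))
)

print(sorted(naive_counts.items()))
print(sorted(fy_counts.items()))
```

The first loop has 27 equally likely runs to spread over 6 permutations, and 27 isn't divisible by 6, so some permutations have to come up more often than others. The second loop has exactly 6 runs, one per permutation.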
Second, I see a lot of people saying "I have a 20GB iPod -- and I swear sometimes it just NEVER plays this one song." Okay, let's assume that a 20GB iPod holds 5000 mp3 files. What's the probability that you play 5000 songs in shuffle mode, and never hear a particular song?
It's the probability that 5000 times in a row, you hear some other song -- that is, one of the 4999 other songs. Calculating, we get:
(4999/5000)^5000 = 0.3678.
So we have a roughly 37% probability of this happening -- which is not a negligible amount! This will further be compounded by two things: First, you have no way of recalling exactly how long it has been since you heard a particular song -- if your favorite song was played 1000 songs earlier, it probably feels like 2000. If it feels like 2000, it's probably 4000. Because it's a favorite song, your mind will exaggerate the amount. It's like if you crave nicotine, it can feel like days since you've had a cigarette when it's only been hours. Second, you probably have a lot of songs you would call a "favorite" -- with each having a roughly 37% chance of not being played over the course of 5000 plays, your mind will probably register that at least one of them is "feeling neglected."
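To make the "lots of favorites" point concrete, here is a sketch that computes the chance that at least one of your favorites goes unheard, using inclusion-exclusion. It keeps the same assumption of independent, uniform picks; the function name and the choice of 10 favorites are made up for illustration.

```python
from math import comb

def prob_some_favorite_unheard(num_songs, num_plays, num_favorites):
    """P(at least one favorite is never picked), by inclusion-exclusion."""
    total = 0.0
    for j in range(1, num_favorites + 1):
        # Probability that one specific set of j songs is avoided on every play.
        avoid_all_j = ((num_songs - j) / num_songs) ** num_plays
        total += (-1) ** (j + 1) * comb(num_favorites, j) * avoid_all_j
    return total

print(prob_some_favorite_unheard(5000, 5000, 1))    # a single favorite
print(prob_some_favorite_unheard(5000, 5000, 10))   # ten favorites
```

With only a handful of favorites it becomes close to certain that one of them sits out a 5000-play stretch.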
Probability is a strange and beautiful thing. Don't expect your average audiophile to understand it. (And I'm not claiming to understand it either, beyond a very cursory level.)
- shadowmatter
Wonder why they're so easy for Wookies? See here.
- sm
Wouldn't it be more productive to study ways to combat spam? From simple Bayesian techniques to graph theoretic methods? That would teach you a lot of theory and principles you could apply to other courses as well. Right now, it just sounds like they're just doing this for attention...
- sm
Joe Somebody goes into Best Buy, is told by the sales rep "just plug it in, it will self configure, and you'll be done," goes home, and does just that. It isn't in the sellers' interest to tell him about all the precautions one should take and what to watch out for, because that doesn't make a good sales pitch. But "Ahh, it's so easy to set up, anyone could do it" does.
So we can't even assume that Joe Somebody is aware that users outside his apartment, house, or network, can use his network. His neighbor's TV remote doesn't turn on his TV; his neighbor's garage door opener doesn't open his garage. Why should he assume that his neighbor's laptop can access/reach his wireless connection?
Is it his responsibility to go home and Google for all the malicious things that can happen to his wireless connection? Do you sit at home and wonder "Gee, I wonder how my neighbors can use my toaster without me knowing and put me in a legal quagmire?"
And it won't do any good to tell him to RTFM. Nobody does that anymore when it just seems to "work."
Never underestimate the ignorance of Joe Somebody. Joe Somebody might just be a straight up newbie who has more important things to tend to. Joe Somebody today might work too many hours and have too many gadgets to proactively learn how all of them work, and the 'risks' associated with each. If you want every Joe Somebody to be aware, I'd put the onus on the sellers of the device, or the manufacturers (like, a big freaking sticker on the box might help).
- shadowmatter
It's the first chapter... If you go to Amazon.com, look up the book, then click the link "look inside this book" you can actually read about it ;)
- shadowmatter
On the other side of the coin, Longhorn will have been releas.... Oh wait, never mind. It's a one-sided coin.
- sm
Blake Ross, in his blog, had some insightful commentary that I didn't see mentioned here on Slashdot:
Google's interest in Firefox shouldn't be a surprise to anyone. At the end of the day, 90+% of Google's users are accessing its service through the browser created and controlled by its largest competitor. Would you feel comfortable if customers had to walk through your competitor's shop to get to your own? This is really what Firefox is all about from a strategic standpoint, and this is what "it's just a browser!" naysayers are missing: he who owns the window to the web owns the web. When there's one porthole on the ship, everyone has to look through it. Firefox seeks to add more portholes to make sure people really understand what's going on outside.
If they're planning an entire OS to make codifying and searching your data easier, I can't see that happening anytime in the short term. After all, a while back there was a shoot-out of desktop search tools, and the Google Desktop Search wasn't top-ranked (yet).
- shadowmatter
As a fun aside, I found this RoShamBo (a.k.a. Rock, Paper, Scissors) Programming Competition entry that guesses what action is optimal based on Lempel-Ziv data compression. As the author explains, "there exists a duality between data compression and gambling. The basic idea is that if you have a sequence of data which you can compress well then the data must be predictable in some sense."
Anyway, try it out. In the long run, it kicks my butt. I try to make 'random' decisions, but still go below .500 -- which is interesting, because that implies that perhaps subconsciously we're always applying patterns...
- sm
All that observation power, and it hasn't found any WMDs? Fix that, and you'll get your funding.
Best wishes,
The White House
DHTs work like this: Every node on the network has a 160 bit identifier. Given a key, through the DHT we can find the node whose identifier is 'closest' to the key. In Kademlia, the closeness of a node is quantified by treating its identifier XORed with the key as an unsigned integer. The node with the smallest such integer is the closest, and is therefore responsible for the key.
If you look at their readme file, they're just using the hash of the file kept in the .torrent as the key (extracted from the tracker URL, in the .torrent). So say you have a .torrent whose tracker you would like to eliminate. Just choose your node identifier, when you join the network, as either equal to the hash in the URL or close to it (such as by simply flipping one of the lower-order bits). That way, you will with near-certainty be the closest node to that hash, and thus be designated the tracker for that torrent. Now just ignore all requests from clients.
It can easily be done.
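Here is a rough sketch of why that works, using the XOR closeness rule described above. It is a toy: it brute-forces the closest node over a flat list instead of doing a real Kademlia lookup with routing tables, the node names and torrent bytes are invented, and it assumes (as I do above) that a node is free to pick its own identifier.

```python
import hashlib

def sha1_int(data: bytes) -> int:
    """160-bit SHA-1 digest interpreted as an unsigned integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def xor_distance(node_id: int, target: int) -> int:
    """Closeness as described above: the XOR of the two IDs, read as an unsigned integer."""
    return node_id ^ target

def closest_node(node_ids, target):
    """The node responsible for target is the one with the smallest XOR distance."""
    return min(node_ids, key=lambda node_id: xor_distance(node_id, target))

# Hypothetical torrent key and a thousand honest nodes with hash-derived IDs.
torrent_key = sha1_int(b"hypothetical torrent info")
honest_nodes = [sha1_int(f"node-{i}".encode()) for i in range(1000)]

# The attacker copies the key and flips its lowest-order bit.
attacker_id = torrent_key ^ 1

winner = closest_node(honest_nodes + [attacker_id], torrent_key)
print(winner == attacker_id, xor_distance(attacker_id, torrent_key))
```

An honest node would need an ID identical to the key to get any closer than a distance of 1, so the attacker wins the lookup every time in this model.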
- sm
"The Night is Large" by Martin Gardner is a diamond in the rough. I haven't heard it mentioned in any geek circles, but it's definitely for people who like to know a little bit about everything, or simply muse about it -- qualities I find pretty much everywhere in geeks.
The book is a collection of Martin Gardner's essays from Scientific American and the New Yorker spanning the following topics: Physical Science, Social Science, Pseudoscience, Mathematics, The Arts, Philosophy, and Religion.
Now I know some geeks may turn their noses up at quite a few of those topics -- but Gardner hardly makes any of them boring. He has such insight, such sharp focus on every topic -- no matter how wide-ranging it may be -- he can make you feel downright ignorant. But this only compels you to read more. You feel like you're reading from a true renaissance man.
That, and the cover art is great. Search for it on Amazon.
- sm
Necessary tool?
You mean like...
1. Media Player?
2. Movie Maker?
3. Windows Messenger?
4. Internet Explorer?
Because, we all know that if -- after the computer is booted for the first time -- the user can't sit down, edit his home movies, download trailers from the Internet and play them, all while blabbing to his friends about Natalie Portman's hot grits, then Microsoft has failed to provide what is "necessary," eh?
- sm
My favorite math pickup line is "Are your parents retarded? 'Cuz you look pretty special."
If you tell them that's a math pickup line, their first reaction will be to correct you: "No, there's no mention of math in that anywhere."
If you don't tell them it's a math pickup line, their first reaction will be to slap you.
Hence, I consider it a math pickup line.
- sm
Doh, it didn't even occur to me that he was thinking of the cross product -- I was thinking of simple, straightforward matrix multiplication.
My bad (and good call).
- sm
In fact, though, this is the irony of the annual "American Children Falling Behind in math!" freakout -- the stories are always phrased in terms of "The US placed xth out of y countries!" with no notion of error bars, relative size of margins or any other of the statistical basics that are necessary to make the slightest sense of the results.
There is one statistical measure that gives it credence, however:
Repeatability.
The fact is, we're never in the top 20. This has been seen in study after study after study, each conducted by a different group. Don't you think, that after the nth time, we should come to realize that maybe -- just maybe -- we really aren't in the top 20, as opposed to living in denial from lack of error bars?
- sm
Uhh... You can't multiply those two vectors together. The number of columns in the first operand has to equal the number of rows in the second. Back to school for you, my friend.
For an excellent article on the impact of rejecting foreign students on American academia, see here. It ran in Newsweek just last week.
(Oh, and take the title with a grain of salt.)
- shadowmatter
From TFA:
He said known adversaries, including "intelligence services, military organizations and non-state actors," are researching information attacks against the United States.
So anyone who isn't Alec Baldwin?
- sm
When it comes to P2P combining anonymity and speed, you typically can't have your cake and eat it too.
I'm slowly writing my own P2P application and have investigated common approaches, as well as come up with my own. The best one I know of is a variation of mixes: When you join the network, and wish to download a file, establish an application-level tunnel through several peers to download the data. The peer on the other end of the tunnel is the one doing the downloading for you, and the data is forwarded up the tunnel. The *AA sees him as the downloader, but knows he is only downloading on behalf of somebody else -- so is he culpable? Assume he is not. The only way to find you is to find your immediate neighbor along the tunnel (the first remote hop away from you). Assume that the RIAA controls this node -- then you're screwed, right?
Not quite. Assume that the tunneling policy is this: When you get a request for a tunnel to include you, with probability p you terminate the tunnel and are the downloading node (i.e., form one end of the tunnel). With probability 1-p you let the tunnel pass through you (i.e., you are an intermediate node of the tunnel) and forward the same tunneling request. The expected length of a tunnel is then 1/p, but unfortunately there is no upper bound (although probabilities of longer tunnels quickly diminish). Applying this to the situation above, if the *AA node is your first remote hop along the tunnel, he can't determine whether you are the one who is downloading the data and who created the tunnel! With probability 1-p you simply forwarded a tunnel request from another remote node. So you're guilty with probability p :)
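A quick simulation of that policy, as a sketch only: each peer that receives the request ends the tunnel with probability p and otherwise forwards it. The function names, the trial count, and the example p = 0.25 are all my own choices for illustration.

```python
import random

def tunnel_length(p, rng):
    """Number of peers the request visits: each ends the tunnel with probability p."""
    hops = 1
    while rng.random() >= p:    # with probability 1 - p the peer forwards the request
        hops += 1
    return hops

def average_tunnel_length(p, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected tunnel length."""
    rng = random.Random(seed)
    return sum(tunnel_length(p, rng) for _ in range(trials)) / trials

def prob_longer_than(p, k):
    """Exact chance a tunnel has more than k peers: the first k all forwarded."""
    return (1 - p) ** k

print(average_tunnel_length(0.25))   # should land near 1/p = 4
print(prob_longer_than(0.25, 20))    # long tunnels are possible but rare
```

The average comes out near 1/p, and the tail shrinks geometrically, which is the "no upper bound, but quickly diminishing" behavior.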
This, however, introduces its own problems. If a peer sees you join the network, and you immediately request a tunnel through him, it's probably you requesting the file, as opposed to someone tunneling through you. That, and if any one of the links along your tunnel fails, you have to create a new tunnel. Simple statistics tells you that the likelihood of any node failing along a multi-hop tunnel is greater than the probability of a single node failing, so you can devote a lot of overhead to simply creating the tunnels. That, and it doesn't really help that the median session time of a user on a P2P network is under two minutes -- if you tunnel through a recently joined node, chances are he'll drop and you'll have to tunnel again.
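The "simple statistics" here is a one-liner if you assume peers fail independently, which real networks won't exactly do. The 10% per-node failure figure below is invented purely for illustration.

```python
def tunnel_failure_probability(node_failure, hops):
    """Chance at least one of `hops` peers drops, assuming independent failures."""
    return 1 - (1 - node_failure) ** hops

# Hypothetical 10% chance that any single peer drops during the download.
for hops in (1, 2, 4, 8):
    print(hops, round(tunnel_failure_probability(0.10, hops), 4))
```

Every extra hop multiplies the chance that the whole tunnel survives by another factor of (1 - node_failure), so longer tunnels buy anonymity at the cost of more rebuilds.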
So it's not really practical. But it's fun to think about.
- shadowmatter
The audio link of the last minute can be found here.
That is, until I bring it down with a Slashdotting.
- shadowmatter