"The problem here is that the mean abs dev for a larger dataset can't be composed out of the mean abs devs of its smaller subsets. E.g., having the mean abs devs for a bunch of days is useless for computing it for the month."
No, you're right. Having given it more thought, I saw that it wouldn't work because your mean can change. Too bad, though.
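To make that concrete, here is a rough Python sketch. The readings are invented and `mean_abs_dev` is just an illustrative helper name; the only point is that each subset measures deviations from its own mean, so the per-day figures say nothing about the spread of the combined data.

```python
from statistics import fmean

def mean_abs_dev(values):
    """Mean absolute deviation around the mean (reads `values` twice)."""
    m = fmean(values)
    return fmean(abs(v - m) for v in values)

# Hypothetical readings, invented for illustration only.
day1 = [10.0, 12.0]   # mean 11
day2 = [20.0, 22.0]   # mean 21

per_day = [mean_abs_dev(day1), mean_abs_dev(day2)]
combined = mean_abs_dev(day1 + day2)   # mean of all four values is 16

print(per_day)    # [1.0, 1.0]
print(combined)   # 5.0
```

Both days have a deviation of 1.0, yet the combined figure is 5.0, because the mean moved.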
"GP said 'border' there are no laws or courts at international borders."
Yes, there are. Read some of the Supreme Court decisions about precisely what laws there are at those borders.
There are no "normal" U.S. laws outside U.S. borders... that is true. But there ARE U.S. laws AT the border. And by damn, they enforce them too.
You were correct. I had it wrong the first time.
It's not bad logic.
Compare it to a new design for a physical safe. It's claimed to be too strong to crack. Now, it may be that eventually somebody will find a weakness, but until they do, the safe is still holding strong.
There's nothing at all wrong with that. It's just a matter of what assumptions you make about what the phrase means.
It's true that the NSA might have cracked it, but it's pointless to argue about that because it's likely that nobody else will ever know... unless of course the audits find something. But the point is: so far they haven't.
"Doing two passes means I cannot update a previous answer in response to small deltas of information. If I have 2 petabytes of data and 20 megs of updates come in, I have to go back to the well and touch all 2 petabytes, I can't just update my sufficient statistics. The incompressibility of this statistic was what I was referring to with that comment. It rules out many scenarios."
I can understand why that would be annoying. But have you considered data management schemes? For example: let's say you have a known, dated dataset. You get your overall sum and number of data points, and from that calculate a mean and go back to get a sum of absolute deviations, and calculate your MAD. But then you store those values: overall sum, # data points, sum of absolute deviations. (The sums might well be very large numbers, but some languages will do arbitrary-precision math without a hiccup.)
In that case, when you add more data, you only have to go through the new data to calculate a new overall sum, number of points, and sum of absolute deviations again. From those and the old values you can calculate a new mean and new MAD.
Just a thought. But if your datasets are as large as you suggest, and getting larger, then it may be worth your while.
"Your name will go on a watch list and goons the world over will give you grief for years to come."
And if that's true, then tens of thousands of businessmen with perfectly legitimate encrypted data, who don't want to (and have no reason to) show it to the government, would get put on lists and have their travels complicated or curtailed for years to come.
That's not just intrusive, it's asinine. The government should not be interfering with business like that. It hurts everybody.
Yes, I understand that... I left out the mean. That has to be calculated first, so it is actually a "two-pass" operation.
Still very easy to do, though.
Oh, stuff it opposite your hypotenuse.
"Mean absolute deviation requires you to sum over a bunch of absolute values of differences to a number you don't know a priori."
You are right... I did not subtract the mean, so what you say is correct: it is a "two-pass" operation. But it's still so easy to do, I wonder why that bothers you. As long as you already have the data, you don't need "as much space" as the original data on top of the original data... you just re-use the original data.
Using Taleb's example (daily temperature):
First calculate the mean.
Then, you can use exactly the same algorithm I mentioned above, except that you subtract the mean from each value rather than one value from the previous value.
Yes, it's two passes (because you have to calculate the mean first), but it's dirt simple and as for storage space, the program only uses a couple of integers; there is no need to store any large data sets aside from the original.
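The two passes described above might look something like this in Python. It is only a sketch: `read_values` is a hypothetical callable standing in for "re-read the original data", and the temperatures are invented.

```python
def mean_abs_dev_two_pass(read_values):
    """Two-pass mean absolute deviation.

    `read_values` is a hypothetical zero-argument callable that returns a
    fresh iterator over the stored data each time it is called, so the
    data is re-read rather than copied.
    """
    # Pass 1: a running sum and count give the mean.
    total = 0.0
    count = 0
    for v in read_values():
        total += v
        count += 1
    if count == 0:
        raise ValueError("no data")
    mean = total / count

    # Pass 2: accumulate absolute deviations from that mean.
    abs_total = 0.0
    for v in read_values():
        abs_total += abs(v - mean)
    return abs_total / count

# Hypothetical daily temperatures, invented for illustration.
temps = [20.0, 22.0, 19.0, 23.0, 21.0]
result = mean_abs_dev_two_pass(lambda: iter(temps))
print(result)
```

Apart from the data itself, the function holds only a handful of numbers (sum, count, mean, and the running absolute total).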
"Only if they have probable cause to compel you to supply the password. Have you ever used Truecrypt in disk mode? You have to enter the volume password first thing after the BIOS."
Nope. Check out the recent court cases, and past Supreme Court cases. Probable cause is NOT sufficient to compel you to turn over your password. Only a court can do that, and in order to do that legally, the court has to have a great deal more evidence than mere probable cause. In fact they have to pretty much know in advance that the drive contains material that proves you broke the law.
Forcing someone to give up their password raises 5th Amendment questions. Pretty much the only time they can do that is if they ALREADY KNOW beyond reasonable doubt that something illegal is there, because in that case you would not be incriminating yourself; you are already "incriminated".
"And when the spooks turn it back on, the key gets copied into RAM again because that's part of the bootup process, and necessary if the system is to read the disk and finish booting."
No, it isn't. Have you ever actually used TrueCrypt?
When the program quits normally (or after a configurable time period), the key is GONE. It may linger in RAM for a very brief period but then it's gone. TrueCrypt stores the key only in RAM, so when a machine is shut down, again the key is GONE. If your machine is on sleep or hibernate, the RAM might be preserved, but otherwise no. GP said "turn it off". Turn it off and the key is GONE.
Booting up has zero effect on this; the key is not stored anywhere on disk (unless YOU stored it somewhere on purpose, which would be dumb).
"I wouldn't be claiming this until the audit is completed."
Why not? Nobody else has cracked it, so unless and until the audit is completed, it is indeed "holding strong".
"It is, like the median, a very robust method, not readily influenced by outliers. The median is wickedly robust, with a breakdown point at 50%, meaning that you can throw a huge amount of junk data at it and it still doesn't care. The arithmetic mean and the standard deviation are both junk, often worse than the too-often-assumed-normal data thrown at it."
That depends entirely on what you are trying to show. None of them are junk for all purposes; all of them are junk for the wrong purposes.
For example, if you're talking about salaries of employees of a corporation, the mean might not mean much: the CEO makes 30 times as much as everyone else, and other managers 20 times more, lower managers 10 times more... so the mean is thrown way off. The median is much more meaningful.
On the other hand, even the mode can be useful sometimes. Suppose the corporation has only 3 pay grades: employees grade A, managers grade B, owner and CEO grade C. In that case the mode might actually tell you something interesting. That's not the best example, but it is an example.
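A quick way to see the salary point is with Python's `statistics` module. The payroll below is entirely made up; it just follows the 10x / 20x / 30x ratios mentioned above.

```python
from statistics import mean, median, mode

# Hypothetical payroll in thousands per year; every figure is invented.
staff = [50] * 6          # six regular employees
lower_manager = [500]     # 10 times a regular salary
manager = [1000]          # 20 times
ceo = [1500]              # 30 times
salaries = staff + lower_manager + manager + ceo

print(mean(salaries))     # pulled far above what most people earn
print(median(salaries))   # 50
print(mode(salaries))     # 50

# Pay grades as labels: the mode is simply the most common grade.
grades = ["A"] * 6 + ["B"] * 2 + ["C"]
print(mode(grades))       # A
```

Here the mean lands above 360 even though two thirds of the staff earn 50, while the median and mode both stay at 50.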
"On the other hand, you also need to use 2-pass algorithms to compute Mean Absolute Deviation"
Since when?
Pseudo-code:
1) Start with S = 0 and I = 0.
2) For each data point starting with the second:
3) Add 1 to I. S = S + absolute value of (this data point - previous data point).
4) When all the data points are collected, MAD = S / I
Where's the difficulty?
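For reference, a literal Python translation of those four steps is sketched below, next to the usual definition. As conceded elsewhere in this thread, the steps leave out the mean, so what they compute is the mean absolute difference between successive points, which is a different number. The function names and the data are illustrative assumptions.

```python
from statistics import fmean

def mean_abs_successive_diff(values):
    """Literal translation of the four steps; s and i play the roles of S and I."""
    s = 0.0
    i = 0
    previous = None
    for v in values:
        if previous is not None:   # start with the second data point
            i += 1
            s += abs(v - previous)
        previous = v
    if i == 0:
        raise ValueError("need at least two data points")
    return s / i

def mean_abs_dev(values):
    """Usual definition: average distance from the mean (needs the mean first)."""
    m = fmean(values)
    return fmean(abs(v - m) for v in values)

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # invented for illustration
one_pass = mean_abs_successive_diff(data)
two_pass = mean_abs_dev(data)
print(one_pass)   # 1.0
print(two_pass)   # 1.2
```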
"For the following few lines I will consider that this is true (while I think that it is false from the table you gave, but I don't think that it is changing anything on the model of the game we are debating on, correct me if I am wrong) :"
Pardon me. You are correct: I did not describe the game properly. Let me start from the beginning.
We agree that there are 8 possible combinations that can occur in 3 coin flips: HHH, HHT, etc.
The way the game is played is that player A chooses one of those combinations, and player B chooses one of those combinations (presumably a different one; there would be no point in choosing the same one). Then a coin is flipped until one of those combinations appears. The winner is the person whose chosen combination appears first (at which point the game stops).
It is easy to show that as long as B chooses after A, the probabilities are non-transitive. I.e., no matter which combination is chosen by A, player B can always choose a combination that has a higher probability of appearing first. Therefore you have a truly non-transitive situation. There is no single choice A can make that does not allow B to choose another with a higher probability.
Again, if B is knowledgeable and chooses so as to maximize his probability of winning, the minimum probability that he will win is 2/3, so it is not at all difficult to demonstrate this non-transitivity in the real world. For example, it is easy to show that in any sequence of coin flips, there is a probability of 7/8 that THH will appear before HHH. But if player A chose THH, it is again easy to show that player B has a 2/3 probability of winning by choosing TTH. And round it goes.
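These figures can be checked exactly. The sketch below uses Conway's "leading number" rule for this kind of pattern race, which is a technique I am bringing in here, not something from the discussion above; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def leading_number(x, y):
    """Add 2**(k-1) for every k where the last k flips of x equal the first k flips of y."""
    n = len(x)
    return sum(2 ** (k - 1) for k in range(1, n + 1) if x[-k:] == y[:k])

def prob_b_first(a, b):
    """Probability that pattern b appears before pattern a (fair coin, a != b)."""
    b_odds = leading_number(a, a) - leading_number(a, b)
    a_odds = leading_number(b, b) - leading_number(b, a)
    return Fraction(b_odds, b_odds + a_odds)

patterns = ["".join(p) for p in product("HT", repeat=3)]

print(prob_b_first("HHH", "THH"))   # 7/8
print(prob_b_first("THH", "TTH"))   # 2/3

# For each choice by A, find B's best reply and its probability of winning.
best_reply = {}
for a in patterns:
    b = max((p for p in patterns if p != a), key=lambda p: prob_b_first(a, p))
    best_reply[a] = (b, prob_b_first(a, b))
    print(a, "->", b, best_reply[a][1])
```

The smallest of B's best-reply probabilities comes out at 2/3, matching the minimum stated above.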
I thought you were trying to say that such non-transitivities are a fiction.
But instead (I think) you are saying that a new notation should be found for them?
Clarification: you are supposed to write down your predictions first, of course. That's why they're called "predictions". THEN you flip the coin 3 times.
"Maybe you need to refresh the definition of a comparison operator: partially ordered set. In short: you cannot have an ordering comparison relation without transitivity."
Maybe you need to brush up on your inequalities. I assure you that in the real world some inequalities (such as preferences) are not transitive. I have already explained one situation in which that is true, and you can prove it for yourself.
Flip a coin 3 times. Write down the result: HHH, THT, etc. There are 8 combinations. The game is played this way: you write down your prediction. I write down mine. NO MATTER WHICH prediction you make (if of course you make it first), I can choose another combination that has at least a 2/3 probability of occurring before yours. This is non-transitive, because (if you label the combinations with letters), it means A > B > C > D > E > F > G > H > A.
In fact, here is a chart of the probabilities of one combination coming up before another.
You need not believe me; you can flip a coin a few thousand times and prove it for yourself.
Then, if you still don't believe... I have a game I'd like to play with you. $5 a round.
The upshot is this: no matter what set theory may be telling you, non-transitivity of inequalities DOES occur in real world situations, and if you refuse to believe that, you'd better not be a gambler.
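If flipping a real coin a few thousand times sounds tedious, a simulation does the same job. This is a minimal sketch that assumes the keep-flipping-until-one-pattern-appears rule and three-flip patterns; `first_to_appear` is an illustrative name and the seed is arbitrary.

```python
import random

def first_to_appear(a, b, rng):
    """Flip a fair coin until three-flip pattern a or b shows up; return the winner."""
    window = ""
    while True:
        # Keep only the three most recent flips.
        window = (window + rng.choice("HT"))[-3:]
        if window == a:
            return a
        if window == b:
            return b

rng = random.Random(12345)   # fixed seed so runs are repeatable
rounds = 20_000
wins = sum(first_to_appear("HHH", "THH", rng) == "THH" for _ in range(rounds))
share = wins / rounds
print(share)   # should land near 7/8
```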
"For example, let's say there's normally 3 taxing locales for sake of argument."
Part of the problem is that there aren't. There are something like 13,000 different taxing districts in the U.S., each with its own tax rate and regulations.
"I think that they should not say that transitivity is broken but rather that the food attractiveness function can be changed by some events, thus reordering the elements in the set Foods."
But they didn't say that, because that would have been incorrect.
Indeed, transitivity does not always hold for inequalities, and this is a known mathematical fact. The article doesn't try to change that (as you suggest), but rather acknowledges that this phenomenon, in the context of preferences, can also be rational.
Make no mistake: many situations involving inequality do not display the property of transitivity. This is not a problem with the math, nor does it mean anything is "broken". There is nothing at all mathematically wrong with this. For years I have known of a game that is played by flipping a coin 3 times. You and someone else list your predictions of the result. Due to non-transitivity of inequalities, no matter what combination you choose, I can choose one that has a better chance of occurring first. In other words, A > B > C > A (although there are actually 8 combinations, not just three).
Nothing is broken; the math is just fine.
"So start a business that doesn't take a credit card and passes the 85% savings on to the consumer, and you'll have people lining up to buy from you. Oh, the problem is all the liars inflating the level of rent seeking from the credit industry, and that the convenience is worth the price charged."
It's very simple. The typical small business is charged 2.9% + $0.30 per transaction. Sure, the rate varies slightly but that's about it. If you have a thriving business and you do a lot of $$ in transactions, you might get that down to 1.9% + $0.30.
It's not a trivial amount. For a $20 purchase that's almost $0.90 and it goes up from there.
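The arithmetic, as a small Python sketch using the two rates quoted above (`card_fee` is just an illustrative helper, not any processor's actual formula):

```python
from decimal import Decimal

CENT = Decimal("0.01")

def card_fee(amount, rate=Decimal("0.029"), fixed=Decimal("0.30")):
    """Percentage fee plus fixed fee, rounded to the cent."""
    return (amount * rate + fixed).quantize(CENT)

sale = Decimal("20.00")
standard = card_fee(sale)                           # 2.9% + $0.30
discounted = card_fee(sale, rate=Decimal("0.019"))  # 1.9% + $0.30

print(standard)     # 0.88
print(discounted)   # 0.68
```

On a $20 sale the fixed 30 cents matters more than the percentage: $0.88 works out to 4.4% of the purchase.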
"I don't know if you've noticed, but 'the economy' when taken as a widescale, averaged thing is doing just fine, and has been since at least 2010."
I would have to take issue with that. 0% prime interest plus QE plus huge deficits plus inflation plus low employment doesn't equal "fine".
"Honestly, I've always been amazed at how the Copyright Gods balk at the mere idea of 3D printers, but don't seem to even notice 3D scanning, which is a much more important and useful tool to the everyday copyright violator."
Uh... well, I'd hardly call them "gods". Trolls, more like. Otherwise, I agree with you.
"He starts by condemning browsers and proxies that help people browse the internet anonymously. Then he jumps to saying that anonymous browsing leads to trading drugs, weapons, and pornography. Then he commends the USA NSA for spying on Americans but is concerned that now that they have been caught Americans might do something about it."
Seems to me there was something else I heard about that was anonymous, and can be traded for all kinds of illegal things.
Oh, yeah. I remember now: cash.
"I've been using David3DScanner since long before 3D printing was so much as a meme..."
I agree. It's not so much of a "next step" as it is a necessary beginning step. 3D printers will never see a huge part of their potential without first having devices that will do 3D modeling of existing items.