Couldn't agree more. Just consider the jitter from your hand. If I shine a laser pointer at a wall 20 ft away the dot jitters about plus or minus half an inch. At 1000 yards that's roughly plus or minus 6 feet. Pretty much impossible to hold it steady enough so that the pilot would even notice.
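For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the jitter is purely angular, so the dot's wobble grows linearly with distance; the function name `scaled_jitter` is just something made up for this illustration.

```python
import math

def scaled_jitter(jitter, near_distance, far_distance):
    """Scale a spot's wobble linearly with distance (small-angle assumption)."""
    return jitter * far_distance / near_distance

near_ft = 20.0             # distance to the wall, in feet
far_ft = 1000 * 3.0        # 1000 yards expressed in feet
jitter_ft = 0.5 / 12.0     # half an inch expressed in feet

far_jitter_ft = scaled_jitter(jitter_ft, near_ft, far_ft)
# Angular wobble implied by the wall measurement, in milliradians
angle_mrad = math.atan2(jitter_ft, near_ft) * 1000

print(f"wobble at 1000 yards: +/- {far_jitter_ft:.2f} ft")
print(f"angular wobble: about {angle_mrad:.2f} mrad")
```

The distance ratio is 3000 ft / 20 ft = 150, so half an inch becomes 75 inches, a bit over 6 feet.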
Not sure this work really talks about RIAA. I don't think the RIAA estimates were done from self-report surveys, but they're still just made-up numbers. It seems to be the rule in anything related to cyber-foo that you make up loss estimates, and nobody questions them so long as a) they're big and b) bigger than last year's numbers and c) you use them to claim a "growing crisis."
I think we might have a difference in understanding of what "outlier" means. An outlier isn't a data point that is shown to be incorrect; it's a data point that is numerically distant from the rest of the points in a set. The difficulty with this data set is that it's not just the extraordinarily high values that are incorrect, but that the statistically average values are under suspicion as well. There might very well be one large company that actually did lose $30 million due to a security breach, and 100 small companies that reported losing $25,000 when they actually lost something closer to $2000. The problem is that the incorrect values aren't outliers; there's a whole bunch of them, so they don't look statistically different from the rest of the data.
No, I think we're on the same page as to what constitutes an outlier. The point the paper makes is that for some surveys 75% of the average comes from an outlier or two. This is exactly the case with the 2007 ID theft survey they mention in the intro: the answers from 2 people (in a survey of over 4000) made a 3x difference in the average (and were found to be fabricated). It's quite possible that some of the non-outlier answers were fabricated also, but they don't have the same influence on the estimate.
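To see how two answers out of 4000 can move the average that much, here is a toy sketch. The dollar figures are invented for illustration and are not the survey's actual responses; only the "2 out of roughly 4000" shape comes from the comment above.

```python
from statistics import mean

# Hypothetical data: 3998 respondents report a $100 loss,
# and 2 respondents report a $400,000 loss each.
typical = [100.0] * 3998
extreme = [400_000.0] * 2
responses = typical + extreme

mean_all = mean(responses)        # average with the two extreme answers
mean_trimmed = mean(typical)      # average without them
outlier_share = sum(extreme) / sum(responses)  # fraction of the total from 2 answers

print(f"mean with outliers:    ${mean_all:,.2f}")
print(f"mean without outliers: ${mean_trimmed:,.2f}")
print(f"ratio: {mean_all / mean_trimmed:.2f}x")
print(f"share of total from 2 answers: {outlier_share:.0%}")
```

With these made-up numbers the two extreme answers roughly triple the average and account for about two thirds of the total, which is why checking whether those two answers are real matters more than checking any of the other 3998.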
Can't you just exclude the outliers from the analysis?
It depends on whether the outlier data is correct. If you're surveying wealth and some guy claims to be worth $50 billion, you need to figure out if he's telling the truth or not. Outliers have a huge effect on the average; that's the point of the sex-survey example. The average number of partners reported by men is 5x higher than reported by women. But if you throw out the outliers among the men the averages are almost the same.
The point of the paper is that in cyber-crime surveys they never even examine outlier results carefully.
It's well enough established that men claim to have more female sexual partners in sex surveys than women claim male partners, a discrepancy that can't be explained by sampling error alone.
That can be explained by a few women I know. They can take on three men at a time. So unless you correct the survey for them, the numbers won't match.
No, it can't. Suppose one woman sleeps with 100 guys. One woman increased her count by 100, and 100 guys increased their count by 1 each. The average number of heterosexual sex-partners that men and women have had is the same. Do you need me to draw you a diagram?
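In place of the diagram, a small sketch of the counting argument. It assumes equal numbers of men and women (the averages only match exactly under that assumption), and the population size, partnership count, and helper name `partner_counts` are all made up for the illustration.

```python
import random

def partner_counts(num_men, num_women, partnerships):
    """Tally partners per person from a collection of (man, woman) pairs."""
    men = [0] * num_men
    women = [0] * num_women
    for m, w in partnerships:
        men[m] += 1      # every partnership adds one to a man's count...
        women[w] += 1    # ...and exactly one to a woman's count
    return men, women

rng = random.Random(0)
num_men = num_women = 1000   # assumption: equal population sizes

# One woman (index 0) has 100 distinct male partners.
pairs = {(m, 0) for m in range(100)}
# Fill in other random partnerships until there are 2000 in total.
while len(pairs) < 2000:
    pairs.add((rng.randrange(num_men), rng.randrange(num_women)))

men, women = partner_counts(num_men, num_women, pairs)
mean_men = sum(men) / num_men
mean_women = sum(women) / num_women

print(f"max partners for any woman: {max(women)}")
print(f"mean for men: {mean_men}, mean for women: {mean_women}")
```

However the partnerships are distributed, both totals equal the number of pairs, so the two means have to agree. A gap in the reported averages therefore has to come from the reporting, not from a few very active individuals.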
Cause of Gawker and Rockyou leaks: compromised servers.
Total accts compromised because of security pros: >= 32 million + 1.2 million = 33.2 million.
Total accts compromised because of users: X.
Before we launch yet another round of blame-the-user, don't we need to show that X is greater than 33.2 million?
This topic of passwords keeps coming up. Different people keep piping in with "the REAL problem with passwords is..." and the solution is PKI/OpenID/keepass/1password/phone auth/securID, etc.
My impression is that we are making no progress whatever. We can't even agree on what the main problems are (keylogging, users forgetting, phishing, brute-forcing, etc.). With 100 slashdotters posting you get 100 different offered solutions. So my guess is that 5 years from now, and probably 10, we'll be stuck exactly where we are today.
Anyone disagree?
The second part of the article is more interesting than the scheme they talk about:
"Florencio and Herley found that the sites that had the most stringent password requirements were those where the users generally had no ability to shop around--sites like the U.S. Social Security Administration, the National Weather Service, and the webmail systems for several large universities. For these systems, the organizations have no monetary incentive to balance usability with security, or to find some other way of protecting user accounts."
That doesn't mean *nobody* pays the cost of the fraud. We all pay those costs, indirectly.
But isn't that the point? Isn't it rational of users to shirk individual effort that reduces collective harm? For sure, Wells Fargo passes the cost to its customers. But that happens whether an individual user makes security effort or not. So might as well not.
It can't be a bell curve, since the number can't be less than zero. It can't be approximately a bell curve either, since it definitely isn't symmetric.
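A quick sketch of what a non-negative, lopsided distribution looks like in numbers. The log-normal shape and its parameters are purely an assumption picked for illustration; nothing here says the survey data actually follows it.

```python
import random
from statistics import mean, median

rng = random.Random(1)

# Hypothetical non-negative quantities drawn from a log-normal distribution
# (an assumed stand-in for a long-tailed, non-negative variable).
samples = [rng.lognormvariate(0.0, 1.5) for _ in range(10_000)]

sample_mean = mean(samples)
sample_median = median(samples)

print(f"min:    {min(samples):.4f}")   # never below zero
print(f"median: {sample_median:.2f}")
print(f"mean:   {sample_mean:.2f}")    # pulled up by the long right tail
```

In a symmetric bell curve the mean and median coincide. Here the long right tail drags the mean well above the median, which is the asymmetry the comment is pointing at.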