Why Anonymized Data Isn't
Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
I've pretty much given up any hope of being anonymous. It's just going to get exponentially more difficult as time goes on.
I had my credit card stolen once. It was stolen from the CC company. How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am? How is a credit card company supposed to function without a worldwide network which authorizes transactions.
If someone wants to find me they'll find me.
If someone wants to use my identity to frame me for a crime then they're just going to encounter a mountain of evidence from numerous sources which contradict their fabrication.
"My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.
Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it; much of it seems to come from an empty and unsatisfying life that leads to an attempt to live vicariously through some sort of idol which is perceived to be successful, in that sense that "most men lead lives of quiet desperation". However stupid and useless it may be, I can't deny that many do consider it newsworthy and much of "the news" includes such elements.
It is a miracle that curiosity survives formal education. - Einstein
Because everyone knows that EVERYONE in DC lies.
"This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Potential nitpick, but here goes.
The summary (not surprisingly for a /. summary) omits a couple of details that give the reader a rather partial picture.
For one, Paul Ohm is an Assistant Professor of law, and although the summary makes it sounds like the linked article would be from a technical perspective, (mostly) it is not.
A quote like:
"Data can either be useful or perfectly anonymous but never both."
needs a bit of background about the qualification of the person making that claim. Why? Simply because it sounds like a rather technical remark. If some computer science researcher made this claim, I would tend to take it more on the face value, otherwise I would take it with a grain of salt.
Now obviously this statement was not meant to be taken quite literally because the notion of "useful" is not precise. I can get reasonably useful information like "most of the people in my country like to buy branded stuff" or "most people who rent videos of actor X regularly, also rent the videos of actor Y regularly" without needing the underlying data to contain *any* personally identifiable information. The fact that extra data is store is a different thing.
I personally believe that instead of claiming that some researcher has argued X, it can be more informative to actually say what kind of researcher it is who made a claim. Not because only researchers in a certain area can be trusted, but because a little bit of background puts the claims in right perspective.
Yes you are. I always put put 90210. Phone number 867-5309. If anyone tries to find me, they're at least going to have that song stuck in their head and recall with disgust the shows they watched in the early 90's. Hopefully that will demoralize them enough to give up.
That Paradox ignores the year. Add that in and it starts to become harder.
'Sensible' is a curse word.