Slashdot Mirror


Why Anonymized Data Isn't

Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
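
A minimal sketch of the kind of measurement behind the 87 percent figure: count how many records have a (ZIP code, birthdate, sex) combination that no other record shares. The records and the `unique_fraction` helper below are invented for illustration and are not taken from Sweeney's study or Ohm's paper.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birthdate, sex). Invented for illustration only.
records = [
    ("02139", "1965-07-31", "F"),
    ("02139", "1965-07-31", "F"),  # shares all three fields with the row above
    ("02139", "1971-02-14", "M"),
    ("20500", "1970-01-01", "M"),
    ("90210", "1979-01-01", "F"),
]


def unique_fraction(rows):
    """Return the fraction of rows whose field combination appears exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


print(unique_fraction(records))
```

Any row counted as unique here could be tied back to one person by anyone holding a second dataset with the same three fields plus a name, which is the linkage the summary describes.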

22 of 280 comments

  1. Damn voyeurism is all it is by Ethanol-fueled · · Score: 4, Insightful

    For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.

    ...And this is the first thing that the author(s) thought of regarding data-mining? Okay, but how would this happen? Why go through all the trouble to gather all that data when you could just hire a P.I. or know (or bribe) a law-enforcement official or an ISP employee? It reminds me of a conversation I had with a guy who bragged that he could get anybody's info because a very good friend of his worked at the DMV. There were a couple semi-profile firings at the State Department because some employees snooped through celebrities' records for no reason other than voyeurism..er..curiosity.

    Those types, the ones with the direct access to the info, are the weakest link. They're only human. "Hey, Bob, there's this guy I really hate. Look up his IP logs and tell me what you see!"

    It all boils down to voyeurism. People would rather bring others down than bring their own lives up. It's the nature of the beast! Pathetic.

    1. Re:Damn voyeurism is all it is by causality · · Score: 5, Insightful

      But the voyeurism slant isn't newsworthy.

      Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it; much of it seems to come from an empty and unsatisfying life that leads to an attempt to live vicariously through some sort of idol which is perceived to be successful, in the sense that "most men lead lives of quiet desperation". However stupid and useless it may be, I can't deny that many do consider it newsworthy and much of "the news" includes such elements.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    2. Re:Damn voyeurism is all it is by causality · · Score: 4, Insightful

      Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)

      I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.

      I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.

      That's the thing that I also think people don't understand. With good reason, I am not satisfied merely that someone probably wouldn't want to abuse my information. I am satisfied only when I know that they cannot do so.

      I think the solution is to have the concept of "intellectual property" work both ways. Obviously your private information has value, otherwise advertisers and other companies wouldn't go to such great lengths to obtain and use it. The problem is that they obtain it without your consent and without directly compensating you. For example, if I don't actively block web bugs, cookies, HTTP "ping", analytics tools, and other similar attempts, then that data will be gathered whether or not I like it.

      The reason why I actively go out of my way to prevent companies from gathering data on me is simple. No one asked me if I wanted to be data-mined. I refuse to honor agreements in which I did not participate. Why anyone else would do so is a mystery to me.

      So make each individual's private data their personal property. They can set whatever value they like, and if that value is more than a company thinks it is worth, the company is free to decline the sale. Most importantly, any attempt to just take that data will be theft, and anyone who does this can be prosecuted in a criminal court. I mean, think about it: why is it "marketing" when a company helps itself to my information against my will and "piracy" or "industrial espionage" if I helped myself to THEIR zeroes and ones against their will?

      --
      It is a miracle that curiosity survives formal education. - Einstein
  2. Paul Ohm? by Yvan256 · · Score: 4, Funny

    Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both."

    Great, another Ohm's law to learn.

    1. Re:Paul Ohm? by natehoy · · Score: 4, Informative

      Nonsense, it could be an extension of the current Law:

      "In electrical circuits, Ohm's law states that the current through a conductor between two points is directly proportional to the potential difference or voltage across the two points, and inversely proportional to the resistance between them. In data anonymity, the law states that the general usefulness of any set of data that originally contained personally-identifiable information is inversely proportional to the degree of anonymity applied to said data."

      See, one simple law to memorize, and now data analysts learn just a teensy bit about electricity and EEs learn just a teensy bit about data anonymization.

      --
      "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
    2. Re:Paul Ohm? by Beardo+the+Bearded · · Score: 4, Informative

      Okay, let's take a road. The speed at which traffic can travel depends on the quality of the surface, gradient, camber, zoning, etc. Let's call this the "road conditions", with a lower number being better roads.

      The number of cars that want to get through that road is a primary unit, which we can refer to as the "volume of traffic".

      The third major criterion is the speed at which the traffic actually flows. This is the "actual flow" of traffic -- in other words, the "influence of other cars" on the traffic congestion.

      In other words:
      volume = influence of traffic * road conditions

      or:
      V = IR

      --

      ---
      ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
  3. Duh. by SatanicPuppy · · Score: 3, Informative

    Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

    I mean, seriously. They don't need to know. Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    1. Re:Duh. by ColdWetDog · · Score: 4, Funny

      I just put "No" under sex. I like to tell the truth. Not sure how it helps on the ID end though.

      --
      Faster! Faster! Faster would be better!
    2. Re:Duh. by garcia · · Score: 3, Insightful

      Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

      I use 1/1/1979 (it's closer to my real age) and 90210 instead. I get a lot of crosseyed looks and many times the cashier (or whatever human I'm dealing with) will end up entering in a local zip code instead but people are no longer arguing w/me about what I choose to provide them when pressured for information (I always politely reply, "no thanks," when asked for that type of information but will give them false shit when they ask again and whine that they'll be fired).

      Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

      Because the majority of people have absolutely no problems handing over any and all information they're prompted for up to and including their e-mail address, phone number or even SSN! Because most people don't even blink, those of us that don't feel like it should be anyone's business (like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway) are looked at like assholes when we refuse to provide information that no one really needs anyway.

    3. Re:Duh. by Anonymous Coward · · Score: 4, Funny

      I put "please!" and it doesn't seem to help either.

    4. Re:Duh. by interkin3tic · · Score: 5, Funny

      Yes you are. I always put 90210. Phone number 867-5309. If anyone tries to find me, they're at least going to have that song stuck in their head and recall with disgust the shows they watched in the early 90's. Hopefully that will demoralize them enough to give up.

    5. Re:Duh. by compro01 · · Score: 3, Funny

      I would think 90210 is a more common choice for zip code. It's probably the most densely populated area on the planet according to dataminers.

      --
      upon the advice of my lawyer, i have no sig at this time
    6. Re:Duh. by plague3106 · · Score: 4, Funny

      I once gave a GameStop employee my zip as 12345. He said, "It's ok if you don't want to give it." My reply was that no, I am from Schenectady, NY.

    7. Re:Duh. by causality · · Score: 3, Funny

      And you wonder why you never get laid when you go to a bar.

      Usually it's better to wait until you leave the bar.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    8. Re:Duh. by RabidMoose · · Score: 3, Funny

      The only bar I go to is the one my parents built in their basement while I was away at college.

      I never pay for drinks, I know the password for the Wi-fi, and it never closes.

      Problem is, the only girl who ever shows up is my sister.

    9. Re:Duh. by Planesdragon · · Score: 3, Funny

      And after that, it's to keep a list of everyone who has entered the bar for the history of its operation. Much easier to identify "troublemakers" when you have a list of people who like to have fun once in a while.

      You DO know that in many states, a bartender is legally responsible for anything you do while drunk from the moment you take a drink until you're finally sober, right?

  4. Only three bits? by Yvan256 · · Score: 4, Funny

    [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.

    Holy hell forget about that anonymized data crap, I want to learn how she can compress that much data into three bits!

  5. Mission Impossible by im_thatoneguy · · Score: 5, Insightful

    I've pretty much given up any hope of being anonymous. It's just going to get exponentially more difficult as time goes on.

    I had my credit card stolen once. It was stolen from the CC company. How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am? How is a credit card company supposed to function without a worldwide network which authorizes transactions?

    If someone wants to find me they'll find me.

    If someone wants to use my identity to frame me for a crime then they're just going to encounter a mountain of evidence from numerous sources which contradict their fabrication.

    "My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on a breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the Starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

  6. Re:20500 by natehoy · · Score: 5, Insightful

    Because everyone knows that EVERYONE in DC lies.

    --
    "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
  7. Couple of things.. by hansraj · · Score: 5, Insightful

    Potential nitpick, but here goes.

    The summary (not surprisingly for a /. summary) omits a couple of details that give the reader a rather partial picture.

    For one, Paul Ohm is an Assistant Professor of law, and although the summary makes it sound like the linked article would be from a technical perspective, (mostly) it is not.

    A quote like:

    "Data can either be useful or perfectly anonymous but never both."

    needs a bit of background about the qualification of the person making that claim. Why? Simply because it sounds like a rather technical remark. If some computer science researcher made this claim, I would tend to take it more at face value, otherwise I would take it with a grain of salt.

    Now obviously this statement was not meant to be taken quite literally because the notion of "useful" is not precise. I can get reasonably useful information like "most of the people in my country like to buy branded stuff" or "most people who rent videos of actor X regularly, also rent the videos of actor Y regularly" without needing the underlying data to contain *any* personally identifiable information. The fact that extra data is stored is a different thing.

    I personally believe that instead of claiming that some researcher has argued X, it can be more informative to actually say what kind of researcher it is who made a claim. Not because only researchers in a certain area can be trusted, but because a little bit of background puts the claims in the right perspective.

  8. Anonymous can be useful.. by EasyTarget · · Score: 4, Insightful

    Data can either be useful or perfectly anonymous but never both

    What a load of bolaks....

    Supposing you have a list of -just- birth dates for every citizen at the census. You have -only- been given one piece of data per person, the date, nothing more. Just a huge list of dates, sorted chronologically.
    1) The data has been totally anonymised.
    2) You can do all kinds of meaningful analysis on the age demographics of the population. And make policy decisions based on that.

    Fully anonymous data producing useful results.
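
    A rough sketch of the analysis described above, assuming the only input is a bare list of birth dates. The dates, the census day, and the `age_on` and `age_bands` helpers are all made up for illustration.

```python
from collections import Counter
from datetime import date


def age_on(birth, today):
    """Whole years elapsed between birth and today."""
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_bands(birth_dates, today, width=10):
    """Count people per age band, keyed by the lower bound of each band."""
    bands = Counter()
    for birth in birth_dates:
        lower = (age_on(birth, today) // width) * width
        bands[lower] += 1
    return dict(sorted(bands.items()))


# Hypothetical input: nothing but dates, with no names or addresses attached.
birth_dates = [
    date(1940, 3, 2),
    date(1970, 1, 1),
    date(1979, 1, 1),
    date(1985, 6, 15),
    date(2001, 12, 31),
]
census_day = date(2009, 9, 1)

print(age_bands(birth_dates, census_day))
```

    The output is a per-decade headcount, which is the sort of aggregate a demographic policy decision would rest on.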

    --
    "Oops, I always forget the purpose of competition is to divide people into winners and losers." - Hobbes
  9. Re:Three things? Really? by Daniel_Staal · · Score: 5, Informative

    That Paradox ignores the year. Add that in and it starts to become harder.
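
    A quick sketch of the difference, assuming birthdays are spread uniformly (a simplification) and, for the with-year case, an assumed 80-year span of possible birth years. `collision_probability` is an illustrative helper, not anything from the article.

```python
def collision_probability(people, slots):
    """Chance that at least two of `people` share one of `slots` equally likely values."""
    if people > slots:
        return 1.0
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


# Classic paradox: day and month only, 23 people in the room.
day_only = collision_probability(23, 365)

# Same 23 people, but the full birthdate must match (assumed 80 possible years).
with_year = collision_probability(23, 365 * 80)

print(f"day and month only: {day_only:.3f}")
print(f"with year included: {with_year:.4f}")
```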

    --
    'Sensible' is a curse word.